[GitHub] [arrow] jonkeane commented on pull request #9615: ARROW-3316: [R] Multi-threaded conversion from R data.frame to Arrow table / record batch

GitBox Thu, 06 May 2021 11:41:50 -0700


jonkeane commented on pull request #9615:
URL: https://github.com/apache/arrow/pull/9615#issuecomment-833767798



   OK, I finally got these benchmarks re-run and this report put together.
   
   TL;DR:
   
   For multi-core operation:
   * Dict types are massively faster
   * Smaller improvements are seen on most other types (of the types we have all-one-type benchmark fixtures for): integers and floats
   * Strings are either the same or _slightly_ slower
   * The naturalistic datasets we have are a mixture:
     * nyctaxi is faster (especially on the first iteration)
     * fannie and chicago traffic take slightly longer (possibly because they contain more strings?)
     
   For single-core operation:
   Most datasets/types perform very similarly across the branches. Dicts are the only ones that stand out with a decent speedup, but it is nowhere near what we see on the 8-core test.
   
   Here's a zip* of the report
   
[parallel-data-conversion.html.zip](https://github.com/apache/arrow/files/6436928/parallel-data-conversion.html.zip)
   
   * Zipped to get around GitHub's file-extension restrictions.


