jonkeane commented on pull request #9615:
URL: https://github.com/apache/arrow/pull/9615#issuecomment-833767798
OK, I finally got these benchmarks re-run and this report put together.
TL;DR:
For multi-core operation:
* Dict types are massively faster
* Smaller improvements are seen on most other types for which we have
all-one-type benchmark fixtures (integers, floats)
* Strings are either the same as or _slightly_ slower
* The naturalistic datasets we have are a mixture:
  * nyctaxi is faster (especially on the first iteration)
  * fannie + chicago traffic are slightly slower (possibly because of more
    strings?)
For single-core operation:
Most datasets/types have very similar performance across the branches. Dicts
are the only ones that stand out as seeing a decent speedup, but it is nowhere
near what we see on the 8-core test.
Here's a zip* of the report
[parallel-data-conversion.html.zip](https://github.com/apache/arrow/files/6436928/parallel-data-conversion.html.zip)
* – to get around GH file-extension restrictions