jaylmiller commented on PR #5292:
URL:
https://github.com/apache/arrow-datafusion/pull/5292#issuecomment-1468327766
Coding-wise, everything is finished and the code is ready for review. In terms
of benchmark results, though, I'm not 100% confident yet.
The sort micro-benchmarks look pretty good: significant improvements in the
cases where row encoding is actually used, and minor regressions, mostly within
error bars, in the cases without row encoding. More experienced contributors
will know better how significant these regressions actually are (results
reposted below):
```
group                                                   main-sort                          rows-sort
-----                                                   ---------                          ---------
sort f64                                                1.00  10.8±0.23ms       ? ?/sec    1.04  11.2±0.93ms       ? ?/sec
sort f64 preserve partitioning                          1.00  4.0±0.27ms        ? ?/sec    1.04  4.1±0.28ms        ? ?/sec
sort i64                                                1.00  9.5±0.55ms        ? ?/sec    1.09  10.3±0.74ms       ? ?/sec
sort i64 preserve partitioning                          1.00  3.3±0.10ms        ? ?/sec    1.06  3.5±0.13ms        ? ?/sec
sort mixed tuple                                        1.28  28.3±3.35ms       ? ?/sec    1.00  22.2±1.60ms       ? ?/sec
sort mixed tuple preserve partitioning                  1.00  3.6±0.17ms        ? ?/sec    1.15  4.1±1.09ms        ? ?/sec
sort mixed utf8 dictionary tuple                        2.84  52.7±8.27ms       ? ?/sec    1.00  18.6±1.29ms       ? ?/sec
sort mixed utf8 dictionary tuple preserve partitioning  1.02  4.2±0.92ms        ? ?/sec    1.00  4.1±0.55ms        ? ?/sec
sort utf8 dictionary                                    1.00  3.7±0.21ms        ? ?/sec    1.04  3.9±0.33ms        ? ?/sec
sort utf8 dictionary preserve partitioning              1.00  1487.2±1444.67µs  ? ?/sec    1.01  1502.8±315.79µs   ? ?/sec
sort utf8 dictionary tuple                              3.26  57.0±11.35ms      ? ?/sec    1.00  17.5±2.08ms       ? ?/sec
sort utf8 dictionary tuple preserve partitioning        1.13  4.1±1.08ms        ? ?/sec    1.00  3.6±0.52ms        ? ?/sec
sort utf8 high cardinality                              1.01  28.0±3.70ms       ? ?/sec    1.00  27.6±3.81ms       ? ?/sec
sort utf8 high cardinality preserve partitioning        1.00  11.1±1.48ms       ? ?/sec    1.21  13.5±3.38ms       ? ?/sec
sort utf8 low cardinality                               1.00  15.3±5.08ms       ? ?/sec    1.10  16.9±6.20ms       ? ?/sec
sort utf8 low cardinality preserve partitioning         1.03  8.1±2.21ms        ? ?/sec    1.00  7.8±1.75ms        ? ?/sec
sort utf8 tuple                                         1.96  56.8±8.36ms       ? ?/sec    1.00  29.0±4.82ms       ? ?/sec
sort utf8 tuple preserve partitioning                   1.02  6.7±0.95ms        ? ?/sec    1.00  6.5±0.46ms        ? ?/sec
```
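For anyone wanting a rough way to read "within error bars" from the table above, here is a minimal Python sketch. It treats each `mean±spread` figure as a symmetric interval around the mean, which is an assumption about how the benchmark tool reports its numbers, and flags a pair as inconclusive when the two intervals overlap. The helper names `intervals_overlap` and `summarize` are hypothetical, and only a handful of rows are copied in by hand. This is a quick sanity check, not a proper significance test.

```python
# Rough check: do the mean ± spread intervals of two benchmark runs overlap?
# Assumption: the value after "±" is a symmetric spread around the mean.
# Figures below are copied by hand from the table (all in milliseconds).

def intervals_overlap(mean_a, spread_a, mean_b, spread_b):
    """Return True if [mean_a ± spread_a] and [mean_b ± spread_b] overlap."""
    return abs(mean_a - mean_b) <= spread_a + spread_b

# (benchmark, main-sort mean, main-sort spread, rows-sort mean, rows-sort spread)
results = [
    ("sort f64", 10.8, 0.23, 11.2, 0.93),
    ("sort i64", 9.5, 0.55, 10.3, 0.74),
    ("sort i64 preserve partitioning", 3.3, 0.10, 3.5, 0.13),
    ("sort utf8 dictionary tuple", 57.0, 11.35, 17.5, 2.08),
]

def summarize(rows):
    """Map each benchmark name to True when the two intervals overlap."""
    return {
        name: intervals_overlap(m_main, s_main, m_rows, s_rows)
        for name, m_main, s_main, m_rows, s_rows in rows
    }

if __name__ == "__main__":
    for name, overlap in summarize(results).items():
        verdict = "within error bars" if overlap else "clear difference"
        print(f"{name}: {verdict}")
```

With these rows, the three single-column cases come out as overlapping, while `sort utf8 dictionary tuple` does not, which matches the reading that the regressions are small and the row-encoding wins are large.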
In summary, I'd like to get an opinion on these micro-benchmark results.
Ideally, we can then also run the e2e bench comparisons (#5561) on `tpch` and
`parquet` to get a bit more data on whether this change is worth merging.