tustvold commented on PR #4524:
URL: https://github.com/apache/arrow-rs/pull/4524#issuecomment-1644214328

> I do wonder if fixed width formats would be useful for multi-column equality comparisons

Do we have any benchmarks that group by multiple primitive columns? Whilst I appreciate that optimising for benchmarks is a form of observability bias, it can be a useful way to focus our efforts where they will have the most impact.

> using native types for single columns should be significantly faster than any row format

My initial experiments show:

```
--------------------
Benchmark tpch_mem.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃     main ┃ specialize-primitive-group-values ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 385.21ms │                          387.08ms │     no change │
│ QQuery 2     │ 102.04ms │                           84.20ms │ +1.21x faster │
│ QQuery 3     │ 104.50ms │                          103.45ms │     no change │
│ QQuery 4     │  69.37ms │                           71.72ms │     no change │
│ QQuery 5     │ 234.38ms │                          236.49ms │     no change │
│ QQuery 6     │  28.62ms │                           28.05ms │     no change │
│ QQuery 7     │ 550.32ms │                          583.28ms │  1.06x slower │
│ QQuery 8     │ 160.42ms │                          152.87ms │     no change │
│ QQuery 9     │ 348.10ms │                          345.23ms │     no change │
│ QQuery 10    │ 201.04ms │                          204.01ms │     no change │
│ QQuery 11    │  95.55ms │                           94.25ms │     no change │
│ QQuery 12    │ 112.20ms │                          110.78ms │     no change │
│ QQuery 13    │ 199.70ms │                          156.56ms │ +1.28x faster │
│ QQuery 14    │  30.56ms │                           32.08ms │     no change │
│ QQuery 15    │  34.63ms │                           31.02ms │ +1.12x faster │
│ QQuery 16    │ 104.27ms │                          105.36ms │     no change │
│ QQuery 17    │ 582.36ms │                          483.59ms │ +1.20x faster │
│ QQuery 18    │ 994.82ms │                          906.17ms │ +1.10x faster │
│ QQuery 19    │ 112.26ms │                          109.24ms │     no change │
│ QQuery 20    │ 204.93ms │                          218.78ms │  1.07x slower │
│ QQuery 21    │ 695.60ms │                          687.18ms │     no change │
│ QQuery 22    │  55.44ms │                           55.00ms │     no change │
└──────────────┴──────────┴───────────────────────────────────┴───────────────┘
```

> there's around 30% performance gain compared to the main branch

I have not been able to reproduce these results using EBAY-KYLIN-4003-5:

```
Comparing main and EBAY-KYLIN-4003-5
--------------------
Benchmark tpch_mem.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃     main ┃ EBAY-KYLIN-4003-5 ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 385.21ms │          379.92ms │     no change │
│ QQuery 2     │ 102.04ms │          103.16ms │     no change │
│ QQuery 3     │ 104.50ms │          110.41ms │  1.06x slower │
│ QQuery 4     │  69.37ms │           74.71ms │  1.08x slower │
│ QQuery 5     │ 234.38ms │          249.58ms │  1.06x slower │
│ QQuery 6     │  28.62ms │           28.92ms │     no change │
│ QQuery 7     │ 550.32ms │          589.97ms │  1.07x slower │
│ QQuery 8     │ 160.42ms │          157.83ms │     no change │
│ QQuery 9     │ 348.10ms │          364.37ms │     no change │
│ QQuery 10    │ 201.04ms │          204.18ms │     no change │
│ QQuery 11    │  95.55ms │          103.85ms │  1.09x slower │
│ QQuery 12    │ 112.20ms │          113.52ms │     no change │
│ QQuery 13    │ 199.70ms │          178.66ms │ +1.12x faster │
│ QQuery 14    │  30.56ms │           31.45ms │     no change │
│ QQuery 15    │  34.63ms │           34.25ms │     no change │
│ QQuery 16    │ 104.27ms │          106.39ms │     no change │
│ QQuery 17    │ 582.36ms │          520.88ms │ +1.12x faster │
│ QQuery 18    │ 994.82ms │         1030.34ms │     no change │
│ QQuery 19    │ 112.26ms │          113.70ms │     no change │
│ QQuery 20    │ 204.93ms │          193.52ms │ +1.06x faster │
│ QQuery 21    │ 695.60ms │          670.74ms │     no change │
│ QQuery 22    │  55.44ms │           55.78ms │     no change │
└──────────────┴──────────┴───────────────────┴───────────────┘
```
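
For readers checking the numbers, the `Change` column in the tables above can be reproduced from the two timing columns. The sketch below is illustrative only: the function name `describe_change` and the 5% "no change" threshold are assumptions inferred from the figures shown, not taken from the benchmark tooling itself.

```python
def describe_change(baseline_ms: float, candidate_ms: float,
                    threshold: float = 0.05) -> str:
    """Label a pair of timings in the style of the Change column.

    `threshold` is an assumed cutoff (5%) below which a difference is
    reported as "no change"; it is consistent with the rows above but
    is not confirmed by the source.
    """
    # Relative change in runtime; positive means the candidate is slower.
    change = (candidate_ms - baseline_ms) / baseline_ms
    if abs(change) <= threshold:
        return "no change"
    if change < 0:
        # Speedup is expressed as baseline time over candidate time.
        return f"+{baseline_ms / candidate_ms:.2f}x faster"
    # Slowdown is expressed as candidate time over baseline time.
    return f"{candidate_ms / baseline_ms:.2f}x slower"


# A few (query, main, branch) timings copied from the first table.
rows = [
    ("QQuery 1", 385.21, 387.08),
    ("QQuery 2", 102.04, 84.20),
    ("QQuery 7", 550.32, 583.28),
    ("QQuery 13", 199.70, 156.56),
]

for query, main_ms, branch_ms in rows:
    print(f"{query:<10} {describe_change(main_ms, branch_ms)}")
```

Under this assumed threshold, a difference such as QQuery 14 (30.56ms against 32.08ms, just under 5%) is reported as noise, which is why most rows read "no change" even though the raw timings differ.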
