gruuya commented on PR #7180: URL: https://github.com/apache/arrow-datafusion/pull/7180#issuecomment-1668573152
Ok, I went back and re-visited [a change](https://github.com/apache/arrow-datafusion/pull/7180#issuecomment-1664209600) I made previously, and it looks like it was definitely contributing to consistent systemic regression I initially observed. I also think I found a better way to solve that issue (basically convert all the batches from a single spill into individual streams) that I pushed now. This still results in the optimized memory profiles depicted in the PR description. Here are the new sorting benchmarks (10 iterations, for fetch in `None, Some(1), Some(10), Some(100), Some(1000), Some(8000), Some(10000)`): <details><summary>sorting benchmarks</summary> ```text ┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Query ┃ main ┃ top-k-eager-sorting ┃ Change ┃ ┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ Qsort utf8 │ 37157.19ms │ 37730.09ms │ no change │ │ fetch None │ │ │ │ │ Qsort int │ 48290.35ms │ 49269.24ms │ no change │ │ fetch None │ │ │ │ │ Qsort │ 38306.38ms │ 38645.93ms │ no change │ │ decimal │ │ │ │ │ fetch None │ │ │ │ │ Qsort │ 53457.21ms │ 54205.26ms │ no change │ │ integer │ │ │ │ │ tuple fetch │ │ │ │ │ None │ │ │ │ │ Qsort utf8 │ 38104.45ms │ 38006.01ms │ no change │ │ tuple fetch │ │ │ │ │ None │ │ │ │ │ Qsort mixed │ 43931.69ms │ 43924.41ms │ no change │ │ tuple fetch │ │ │ │ │ None │ │ │ │ │ Qsort utf8 │ 1869.92ms │ 1899.74ms │ no change │ │ fetch │ │ │ │ │ Some(1) │ │ │ │ │ Qsort int │ 2011.87ms │ 1821.10ms │ +1.10x faster │ │ fetch │ │ │ │ │ Some(1) │ │ │ │ │ Qsort │ 1851.76ms │ 1760.68ms │ no change │ │ decimal │ │ │ │ │ fetch │ │ │ │ │ Some(1) │ │ │ │ │ Qsort │ 1914.27ms │ 1953.97ms │ no change │ │ integer │ │ │ │ │ tuple fetch │ │ │ │ │ Some(1) │ │ │ │ │ Qsort utf8 │ 2138.20ms │ 2417.24ms │ 1.13x slower │ │ tuple fetch │ │ │ │ │ Some(1) │ │ │ │ │ Qsort mixed │ 1999.94ms │ 2128.49ms │ 1.06x slower │ │ tuple fetch │ │ │ │ │ Some(1) │ │ │ │ │ Qsort utf8 │ 1904.48ms │ 1906.43ms │ no change │ │ fetch │ │ │ │ │ Some(10) │ │ │ │ │ Qsort int │ 2025.38ms │ 1833.70ms │ +1.10x faster │ │ fetch │ │ │ │ │ Some(10) │ │ │ │ │ Qsort │ 1837.30ms │ 1787.43ms │ no change │ │ decimal │ │ │ │ │ fetch │ │ │ │ │ Some(10) │ │ │ │ │ Qsort │ 1905.57ms │ 1962.79ms │ no change │ │ integer │ │ │ │ │ tuple fetch │ │ │ │ │ Some(10) │ │ │ │ │ Qsort utf8 │ 2125.98ms │ 2436.01ms │ 1.15x slower │ │ tuple fetch │ │ │ │ │ Some(10) │ │ │ │ │ Qsort mixed │ 2010.12ms │ 2144.80ms │ 1.07x slower │ │ tuple fetch │ │ │ │ │ Some(10) │ │ │ │ │ Qsort utf8 │ 1910.96ms │ 1961.26ms │ no change │ │ fetch │ │ │ │ │ Some(100) │ │ │ │ │ Qsort int │ 2033.24ms │ 1869.98ms │ +1.09x faster │ │ fetch │ │ │ │ │ Some(100) │ │ │ │ │ Qsort │ 1845.95ms │ 1782.47ms │ no change │ │ decimal │ │ │ │ │ fetch │ │ │ │ │ Some(100) │ │ │ │ │ Qsort │ 1962.15ms │ 1986.78ms │ no change │ │ integer │ │ │ │ │ tuple fetch │ │ │ │ │ Some(100) │ │ │ │ │ Qsort utf8 │ 2157.68ms │ 2467.28ms │ 1.14x slower │ │ tuple fetch │ │ │ │ │ Some(100) │ │ │ │ │ Qsort mixed │ 2046.18ms │ 2199.65ms │ 1.08x slower │ │ tuple fetch │ │ │ │ │ Some(100) │ │ │ │ │ Qsort utf8 │ 1983.56ms │ 2035.27ms │ no change │ │ fetch │ │ │ │ │ Some(1000) │ │ │ │ │ Qsort int │ 2151.34ms │ 1995.38ms │ +1.08x faster │ │ fetch │ │ │ │ │ Some(1000) │ │ │ │ │ Qsort │ 1899.04ms │ 1855.03ms │ no change │ │ decimal │ │ │ │ │ fetch │ │ │ │ │ Some(1000) │ │ │ │ │ Qsort │ 2103.93ms │ 2243.71ms │ 1.07x slower │ │ integer │ │ │ │ │ tuple fetch │ │ │ │ │ Some(1000) │ │ │ │ │ Qsort utf8 │ 2274.01ms │ 2662.95ms │ 1.17x slower │ │ tuple fetch │ │ │ │ │ Some(1000) │ │ │ │ │ Qsort mixed │ 2224.79ms │ 2587.77ms │ 1.16x slower │ │ tuple fetch │ │ │ │ │ Some(1000) │ │ │ │ │ Qsort utf8 │ 2176.51ms │ 2745.07ms │ 1.26x slower │ │ fetch │ │ │ │ │ Some(8000) │ │ │ │ │ Qsort int │ 2300.06ms │ 2695.76ms │ 1.17x slower │ │ fetch │ │ │ │ │ Some(8000) │ │ │ │ │ Qsort │ 1990.73ms │ 2343.61ms │ 1.18x slower │ │ decimal │ │ │ │ │ fetch │ │ │ │ │ Some(8000) │ │ │ │ │ Qsort │ 2758.47ms │ 4018.36ms │ 1.46x slower │ │ integer │ │ │ │ │ tuple fetch │ │ │ │ │ Some(8000) │ │ │ │ │ Qsort utf8 │ 2903.87ms │ 4070.53ms │ 1.40x slower │ │ tuple fetch │ │ │ │ │ Some(8000) │ │ │ │ │ Qsort mixed │ 3540.07ms │ 5640.86ms │ 1.59x slower │ │ tuple fetch │ │ │ │ │ Some(8000) │ │ │ │ │ Qsort utf8 │ 2255.55ms │ 2181.42ms │ no change │ │ fetch │ │ │ │ │ Some(10000) │ │ │ │ │ Qsort int │ 2380.78ms │ 2367.10ms │ no change │ │ fetch │ │ │ │ │ Some(10000) │ │ │ │ │ Qsort │ 2108.30ms │ 2022.66ms │ no change │ │ decimal │ │ │ │ │ fetch │ │ │ │ │ Some(10000) │ │ │ │ │ Qsort │ 2817.34ms │ 2758.43ms │ no change │ │ integer │ │ │ │ │ tuple fetch │ │ │ │ │ Some(10000) │ │ │ │ │ Qsort utf8 │ 2625.46ms │ 2564.71ms │ no change │ │ tuple fetch │ │ │ │ │ Some(10000) │ │ │ │ │ Qsort mixed │ 3457.03ms │ 3418.22ms │ no change │ │ tuple fetch │ │ │ │ │ Some(10000) │ │ │ │ └──────────────┴────────────┴─────────────────────┴───────────────┘ ``` </details> Note that the base cases now show no changes compared to main, which is what I expect. There are a couple of speedups, but still the overall direction is regression-y. This is especially pronounced for K = 8000 (good idea for including that value @yjshen). It could be that there are some further micro-optimizations waiting that would push the runtimes for this branch to merge-acceptable territory while keeping the memory usage as is, though I doubt it. I'll give it some more thought and report back if I come up with something. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
