HippoBaro commented on PR #9831: URL: https://github.com/apache/arrow-rs/pull/9831#issuecomment-4385881175
Thank you @etseidl and @alamb for pushing back!

The regression was fairly straightforward: the compact level representation added extra branching and dispatch on a hot path. At first I thought this was the price to pay for speeding up the `Uniform` and `Absent` level representations. The overhead was particularly expensive for list columns, because each non-empty list row called back into child level generation, reaching `write_leaf` for primitive children.

It turns out I had overlooked a good opportunity to batch writes there as well. We now batch consecutive non-empty list rows into a single child level write, then walk the appended repetition levels backwards to mark list-row boundaries.

On my laptop, the benchmarks show low single-digit percentage improvements for the `list_primitive` cases, but your mileage may vary.
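To illustrate the batching idea, here is a minimal sketch, not the code from the PR. It assumes a non-nullable list of non-nullable primitives (so the maximum definition and repetition levels are both 1), and the names `Levels`, `write_child`, and `write_list_levels` are hypothetical stand-ins rather than arrow-rs APIs.

```rust
/// Hypothetical, simplified level buffers for a single leaf column.
#[derive(Default, Debug)]
struct Levels {
    def: Vec<i16>,
    rep: Vec<i16>,
}

/// Stand-in for the child level writer: appends one entry per child value,
/// each marked as "continues the current list" (rep level 1, def level 1).
fn write_child(levels: &mut Levels, range: std::ops::Range<usize>) {
    let n = range.len();
    levels.def.resize(levels.def.len() + n, 1);
    levels.rep.resize(levels.rep.len() + n, 1);
}

/// Writes levels for a list column described by Arrow-style `offsets`,
/// issuing one child write per run of consecutive non-empty rows.
fn write_list_levels(offsets: &[i32], levels: &mut Levels) {
    let rows = offsets.len().saturating_sub(1);
    let mut row = 0;
    while row < rows {
        if offsets[row] == offsets[row + 1] {
            // Empty list: a single entry that starts a new row (rep 0)
            // and has no element defined (def 0).
            levels.def.push(0);
            levels.rep.push(0);
            row += 1;
            continue;
        }

        // Extend the run across all consecutive non-empty rows.
        let run_start = row;
        while row < rows && offsets[row] < offsets[row + 1] {
            row += 1;
        }

        // One batched child write for the whole run.
        write_child(levels, offsets[run_start] as usize..offsets[row] as usize);

        // Walk the appended repetition levels backwards, marking the first
        // element of each row in the run as a row boundary (rep 0).
        let mut pos = levels.rep.len();
        for r in (run_start..row).rev() {
            pos -= (offsets[r + 1] - offsets[r]) as usize;
            levels.rep[pos] = 0;
        }
    }
}
```

With offsets `[0, 2, 2, 3, 6]` (rows of length 2, 0, 1, and 3), the last two rows form one run and share a single `write_child` call. In this flat case each element yields exactly one level, so the backward walk is plain offset arithmetic; with nested children the number of levels per element varies, which is presumably why the real implementation inspects the appended levels instead.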
