mapleFU opened a new issue, #10079: URL: https://github.com/apache/arrow-rs/issues/10079
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

Previously I wrote https://github.com/apache/arrow-rs/pull/10037, which optimizes list writing when the list is the last nesting level. However, for types like `list<struct<a: int, b: f32, c: list<...>>>`, the writes are still not optimized. We should think of an algorithm to optimize this case.

**Describe the solution you'd like**

Here I'd like to introduce a "batch" algorithm, which is a bit more complex. Its purpose is to batch the `write` calls and the rep-level back-filling.

1. Get the list's own max_rep_level, as `list_max_rep_level`.
2. When writing [start, end) for a child:
   1. If the child's max_rep_level is equal to the parent's `list_max_rep_level + 1`, do as in https://github.com/apache/arrow-rs/pull/10037, which sets the rep-levels at the offsets; this is O(list-length) calls.
   2. Otherwise, it is larger than `list_max_rep_level + 1`. Our target is then to find the starts of the current lists and mark them.

For 2.2, we have the list lengths, so in one batch we can find the positions in the child where the list lengths match the lengths being written. For example, for `[[2], [3], [4, 5], [6, 7, 8]]` the lengths are `1, 1, 2, 3`; we should find where the rep levels of the child list reach `1, 1, 2, 3`, and mark each list start with the level.

This algorithm doesn't reduce the amount of work; it only reduces the number of write calls.

**Describe alternatives you've considered**

This algorithm introduces a backward scan in every write. Maybe we can record the `[lengths]`, or find another way to get the lengths in the writer.

**Additional context**

no
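To make step 2.2 more concrete, here is a minimal sketch of one possible reading of it, not the arrow-rs implementation. It assumes the child has already been written for the whole [start, end) range in a single call, so that every element of the list begins with a rep level `<= list_rep_level`, and that the list lengths are counted in elements. The function name `mark_list_starts`, its signature, and the forward (rather than backward) scan are all hypothetical choices for illustration; empty and null lists are left out.

```rust
/// Hypothetical helper: back-fill the rep level at the start of every list
/// after the child levels were produced by one batched write.
///
/// Assumptions (not taken from arrow-rs):
/// - `rep_levels` holds the child's levels for the whole written range;
/// - an entry with a level `<= list_rep_level` starts a new list element,
///   while a larger level continues a deeper nested list;
/// - `lengths[i]` is the number of elements in the i-th list.
///
/// Returns the number of list starts that were marked.
fn mark_list_starts(rep_levels: &mut [i16], list_rep_level: i16, lengths: &[usize]) -> usize {
    let mut pos = 0; // cursor into `rep_levels`
    let mut marked = 0;
    for &len in lengths {
        if len == 0 {
            // Empty lists need their own level entry in a real writer;
            // that case is outside the scope of this sketch.
            continue;
        }
        // Advance to the entry that starts the first element of this list.
        while pos < rep_levels.len() && rep_levels[pos] > list_rep_level {
            pos += 1;
        }
        if pos == rep_levels.len() {
            break;
        }
        // Mark the list start: it repeats at the level above this list.
        rep_levels[pos] = list_rep_level - 1;
        marked += 1;
        pos += 1;
        // Walk past the remaining `len - 1` element starts of this list.
        let mut remaining = len - 1;
        while remaining > 0 && pos < rep_levels.len() {
            if rep_levels[pos] <= list_rep_level {
                remaining -= 1;
            }
            pos += 1;
        }
    }
    marked
}
```

For instance, with `list_rep_level = 1` and two lists of lengths `[1, 2]` whose elements are `[10, 11]`, `[12]` and `[13, 14, 15]`, a batched child write would produce the rep levels `[1, 2, 1, 1, 2, 2]` under the assumptions above. Calling `mark_list_starts` on them rewrites the entries at indices 0 and 2, the first element of each list, to level 0. The scan still touches every level entry, which matches the note that the work is not reduced, only the number of write calls.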
