mapleFU opened a new issue, #10079:
URL: https://github.com/apache/arrow-rs/issues/10079

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   Previously I wrote https://github.com/apache/arrow-rs/pull/10037 . It 
optimizes list writing when the list is the last nesting level. However, for 
types like `list<struct<a: int, b: f32, c: list<...>>>`, the writes are not 
optimized. We should come up with an algorithm to optimize this case as well.
   
   **Describe the solution you'd like**
   
   I'd like to introduce a "batch" algorithm, which is a bit more complex. 
Its purpose is to batch the `write` calls and the rep-level back-filling.
   
   1. Get the list's own max_rep_level, as `list_max_rep_level`.
   2. When writing [start, end) for the child:
     1. If the child's max_rep_level is equal to the parent's 
`list_max_rep_level + 1`, do as in 
https://github.com/apache/arrow-rs/pull/10037 , which sets rep-levels at the 
list offsets; this is an O(list-length) operation.
     2. Otherwise, it is larger than `list_max_rep_level + 1`. Then our goal 
is to find the start of each list in the current write and mark it.
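
   A minimal sketch of case 2.1, under stated assumptions: the child is a leaf, every list in the batch is non-null and non-empty, and `offsets` has the usual Arrow shape (one more entry than the number of lists). The function name `leaf_list_rep_levels` and the parameter `list_rep_level` are hypothetical and not part of arrow-rs; list starts are marked with `list_rep_level - 1` purely for illustration.

```rust
/// Hypothetical helper (not an arrow-rs API): build the rep levels for a
/// batch of lists whose child is a leaf. Assumes every list is non-null
/// and non-empty.
fn leaf_list_rep_levels(offsets: &[usize], list_rep_level: i16) -> Vec<i16> {
    let (Some(&start), Some(&end)) = (offsets.first(), offsets.last()) else {
        return Vec::new();
    };
    // One bulk fill for the whole child range [start, end).
    let mut rep_levels = vec![list_rep_level; end - start];
    // Back-fill: the first value of each list starts a new list.
    for w in offsets.windows(2) {
        if w[1] > w[0] {
            rep_levels[w[0] - start] = list_rep_level - 1;
        }
    }
    rep_levels
}
```

   With offsets `[0, 1, 2, 4, 7]` (lengths `1, 1, 2, 3`) and `list_rep_level = 1`, this yields `[0, 0, 0, 1, 0, 1, 1]`.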
   
   For 2.2, we have the list lengths, so in one batch we can find where the 
number of child elements reaches each written list length. For example, for 
`[[2], [3], [4, 5], [6, 7, 8]]` the lengths are `1, 1, 2, 3`; we should find 
where the child's rep levels reach `1, 1, 2, 3` elements and set the rep level 
at each list start.
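
   A rough sketch of the marking step for case 2.2, assuming the child range has already been written in one call and that every list is non-null and non-empty. It scans forward rather than backward, which is a simplification of mine. `mark_list_starts`, `list_rep_level` and `lengths` are hypothetical names, not arrow-rs APIs. An entry with a rep level greater than `list_rep_level` is treated as belonging to a deeper nested value and skipped; any other entry counts as one element of the current list.

```rust
/// Hypothetical helper (not an arrow-rs API): after a batched child write,
/// mark the first element of each list. Assumes every length is > 0.
fn mark_list_starts(rep_levels: &mut [i16], list_rep_level: i16, lengths: &[usize]) {
    let mut lengths = lengths.iter().copied();
    let mut remaining = 0usize; // elements left in the current list
    for level in rep_levels.iter_mut() {
        if *level > list_rep_level {
            continue; // continuation of a deeper nested value
        }
        if remaining == 0 {
            // This entry is the first element of the next list.
            match lengths.next() {
                Some(len) => {
                    debug_assert!(len > 0, "sketch assumes non-empty lists");
                    remaining = len;
                    *level = list_rep_level - 1;
                }
                None => break,
            }
        }
        remaining -= 1;
    }
}
```

   For a `list<list<int>>` holding `[[a, b]]` and `[[c], [d, e]]`, the child write leaves `[1, 2, 1, 1, 2]`; calling the helper with `list_rep_level = 1` and lengths `[1, 2]` turns it into `[0, 2, 0, 1, 2]`.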
   
   This algorithm does not reduce the total work; it only reduces the number 
of `write` calls.
   
   **Describe alternatives you've considered**
   
   This algorithm introduces a backward scan in every write. Maybe we can 
record the `[lengths]`, or some way to get the lengths, in the writer.
   
   **Additional context**
   
   no
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
