AntoinePrv commented on issue #47112:
URL: https://github.com/apache/arrow/issues/47112#issuecomment-3118916095

   I'm working on this.
   
   > This issue is to refactor the RleDecoder into a parser+decoder. The parser 
would be a pure event-driven facility to decompose a RLE stream into its 
individual tokens.
   
   So while this is not so hard to (re)implement with the existing code, the 
main difficulty lies in inverting control over the decoding.
   
   Today, callers call `RleDecoder::GetBatch` (and the like) to pull 
`n_values` regardless of how they are encoded.
   This logic goes up the inheritance tree, all the way to the `TypedDecoder`.
   
   With an event-driven API, callers would pull either type of run, but would 
need some bookkeeping to manage cases where the `n_values` they are asked to 
pull falls in the middle of a run. This logic needs to be repeated for each 
caller.
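
   To make the bookkeeping concrete, here is a rough sketch of what a caller might have to carry around. Everything in it is hypothetical: `Run` stands in for whatever event the parser would emit, and `BatchReader` is not an existing Arrow class. It only illustrates keeping an offset into the current run so that a pull of `n` values can stop and resume mid-run.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical parser event: one run of the stream (not the Arrow API).
struct Run {
  bool is_repeated;               // true: repeated-value run, false: literal run
  int32_t value;                  // the repeated value (used when is_repeated)
  int64_t count;                  // number of repetitions (used when is_repeated)
  std::vector<int32_t> literals;  // the literal values (used when !is_repeated)

  int64_t length() const {
    return is_repeated ? count : static_cast<int64_t>(literals.size());
  }
};

// Hypothetical caller that still has to serve "give me n values" requests.
// For simplicity the runs are already materialized in a vector; a real
// caller would receive them one at a time from the parser.
class BatchReader {
 public:
  explicit BatchReader(std::vector<Run> runs) : runs_(std::move(runs)) {}

  // Writes up to n values into out and returns how many were written.
  int64_t GetBatch(int32_t* out, int64_t n) {
    int64_t written = 0;
    while (written < n && run_index_ < runs_.size()) {
      const Run& run = runs_[run_index_];
      // The bookkeeping: how much of the current run is still unconsumed.
      const int64_t remaining = run.length() - offset_in_run_;
      const int64_t take = std::min(remaining, n - written);
      if (run.is_repeated) {
        std::fill_n(out + written, take, run.value);
      } else {
        std::copy_n(run.literals.begin() + offset_in_run_, take, out + written);
      }
      written += take;
      offset_in_run_ += take;
      if (offset_in_run_ == run.length()) {
        // Run fully consumed: move on to the next one.
        ++run_index_;
        offset_in_run_ = 0;
      }
    }
    return written;
  }

 private:
  std::vector<Run> runs_;
  std::size_t run_index_ = 0;   // which run we are in
  int64_t offset_in_run_ = 0;   // how far into that run we already are
};
```

   The `run_index_` / `offset_in_run_` pair is the state that each caller would otherwise need to reimplement, which is why it may be worth providing it once alongside the parser.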
   
   > However, once we have a RLE parser, we can then avoid the decoder in some 
situations.
   
   Since a large part of the work will go into plugging the events into the 
callers, maybe we should take this more into consideration when designing this 
API.


