AntoinePrv commented on issue #47112: URL: https://github.com/apache/arrow/issues/47112#issuecomment-3118916095
I'm working on this.

> This issue is to refactor the RleDecoder into a parser+decoder. The parser would be a pure event-driven facility to decompose a RLE stream into its individual tokens.

While this is not so hard to (re)implement with the existing code, the main difficulty lies in inverting control over the decoding. Today, callers call `RleDecoder::GetBatch` (and the like) to pull `n_values` regardless of how they are encoded. This logic goes up the inheritance tree to the `TypedDecoder`. With an event-driven API, callers would pull either type of run, but would need some bookkeeping to handle cases where the `n_values` they are asked to pull ends in the middle of a run. This logic needs to be repeated for each caller.

> However, once we have a RLE parser, we can then avoid the decoder in some situations.

Since a large part of the work will go into plugging the events into the callers, maybe we should take this more into consideration when designing this API.
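
To make the bookkeeping concern concrete, here is a rough sketch of what a caller might have to carry around to keep a `GetBatch`-style interface on top of run events. Everything in it is hypothetical and only for illustration: `RepeatedRun`, `LiteralRun`, `FakeRunParser` and `BatchPuller` are made-up names, not existing Arrow classes, and the "parser" just hands out pre-tokenized runs instead of reading a real RLE stream.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical event types a parser could emit (not the Arrow API).
struct RepeatedRun {
  int32_t value;      // the single repeated value
  std::size_t count;  // number of repetitions
};
struct LiteralRun {
  std::vector<int32_t> values;  // assumed to be already unpacked
};
using Run = std::variant<RepeatedRun, LiteralRun>;

// Stand-in for the parser: yields pre-tokenized runs one at a time.
class FakeRunParser {
 public:
  explicit FakeRunParser(std::vector<Run> runs) : runs_(std::move(runs)) {}

  // Returns nullptr once every run has been handed out.
  const Run* Next() { return pos_ < runs_.size() ? &runs_[pos_++] : nullptr; }

 private:
  std::vector<Run> runs_;
  std::size_t pos_ = 0;
};

// Caller-side bookkeeping: remembers the run in progress and how much of it
// was already consumed, so a batch boundary can fall in the middle of a run.
class BatchPuller {
 public:
  explicit BatchPuller(FakeRunParser& parser) : parser_(parser) {}

  // Appends up to n values to *out and returns how many were appended.
  std::size_t GetBatch(std::size_t n, std::vector<int32_t>* out) {
    std::size_t produced = 0;
    while (produced < n) {
      if (current_ == nullptr || offset_ == Length(*current_)) {
        current_ = parser_.Next();
        offset_ = 0;
        if (current_ == nullptr) break;  // stream exhausted
        continue;                        // re-check: the new run may be empty
      }
      assert(offset_ < Length(*current_));
      const std::size_t take =
          std::min(n - produced, Length(*current_) - offset_);
      if (const auto* rep = std::get_if<RepeatedRun>(current_)) {
        out->insert(out->end(), take, rep->value);
      } else {
        const auto& lit = std::get<LiteralRun>(*current_);
        const auto first =
            lit.values.begin() + static_cast<std::ptrdiff_t>(offset_);
        out->insert(out->end(), first,
                    first + static_cast<std::ptrdiff_t>(take));
      }
      offset_ += take;
      produced += take;
    }
    return produced;
  }

 private:
  static std::size_t Length(const Run& run) {
    if (const auto* rep = std::get_if<RepeatedRun>(&run)) return rep->count;
    return std::get<LiteralRun>(run).values.size();
  }

  FakeRunParser& parser_;
  const Run* current_ = nullptr;  // run currently being consumed, if any
  std::size_t offset_ = 0;        // values already taken from *current_
};
```

The `current_`/`offset_` pair is the state that would otherwise have to be duplicated in every caller. On the other hand, a caller that sees a `RepeatedRun` directly could presumably handle it without materializing each value, which is where skipping the decoder might pay off.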