alamb commented on PR #8159: URL: https://github.com/apache/arrow-rs/pull/8159#issuecomment-3201455557
Status report: Rewriting the async decoder to use the push decoder went well (though this is not overly surprising, given that the push decoder state machine was mostly modeled on the async record batch reader). I found a few items to address, but no show stoppers.

Pretty much all the tests pass, except:
1. A few that are unit tests of the old APIs
2. `async_reader_with_next_row_groups` (this is doable, it just needs another hook into the push decoder)

Things to do:
1. Rewrite the inner async reader tests to not use the inner reader state (move to IO) -- no ArrowReaderBuilder
4. Implement `async_reader_with_next_row_groups`

I also found a few things that would be very nice to fix in the push decoder in general:
1. Box the `ParquetDecoderState` inner state of the decoder (to make moving it around faster)
3. Remove the `file_len = 0` from the push decoder builder (the async reader does not know the length of the file, and it does not need to)

I updated the description of https://github.com/apache/arrow-rs/pull/7997 to reflect these items and will work on them now.
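To illustrate why boxing the decoder's inner state could make it cheaper to move, here is a minimal sketch. The types `LargeState`, `InlineDecoderState`, and `BoxedDecoderState` are hypothetical stand-ins, not the actual arrow-rs definitions. The sketch only compares the in-memory size of an enum that stores a large state inline with one that stores it behind a `Box`; a move copies at most that many bytes.

```rust
use std::mem::size_of;

// Hypothetical stand-in for a large decoder state; NOT the real arrow-rs type.
#[allow(dead_code)]
struct LargeState {
    buffers: [u64; 64], // 512 bytes stored inline
    rows_remaining: usize,
}

// The enum is as large as its largest variant, so every move of this
// enum may copy the full inline state.
#[allow(dead_code)]
enum InlineDecoderState {
    Reading(LargeState),
    Finished,
}

// With the state boxed, the enum holds only a pointer to heap data,
// so moving the enum copies a pointer-sized value.
#[allow(dead_code)]
enum BoxedDecoderState {
    Reading(Box<LargeState>),
    Finished,
}

fn inline_size() -> usize {
    size_of::<InlineDecoderState>()
}

fn boxed_size() -> usize {
    size_of::<BoxedDecoderState>()
}

fn main() {
    println!("inline enum: {} bytes", inline_size());
    println!("boxed enum:  {} bytes", boxed_size());
}
```

The trade-off is one heap allocation when the state is created, in exchange for cheaper moves each time the state machine transitions. Whether this is a net win for the push decoder would need to be measured.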
