alamb commented on issue #8668:
URL: https://github.com/apache/arrow-rs/issues/8668#issuecomment-3427135463

   Thank you @friendlymatthew and @adriangb
   
   For this ticket, I would suggest starting by writing a test / use case
and then figuring out what API you need.
   
   For example, try to write a Parquet reader that:
   1. takes an `ObjectStore` instance as input
   2. tries to prefetch pages for the next row group while it is decoding the
current row group.
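
A minimal sketch of that overlap, assuming plain threads and a rendezvous channel rather than any real arrow-rs or `object_store` API. `fetch_row_group` and `decode_row_group` are hypothetical stand-ins for the object store read and the Parquet decode:

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for an object store read: returns the bytes of one row group.
fn fetch_row_group(index: usize) -> Vec<u8> {
    // Pretend every row group is four bytes, each holding the row group index.
    vec![index as u8; 4]
}

/// Hypothetical stand-in for decoding: sums the bytes of a row group.
fn decode_row_group(bytes: &[u8]) -> u64 {
    bytes.iter().map(|b| *b as u64).sum()
}

/// Decodes `num_row_groups` row groups in order, fetching the next row group
/// on a background thread while the current one is being decoded.
fn read_with_prefetch(num_row_groups: usize) -> Vec<u64> {
    // Capacity 0 makes `send` block until the decoder is ready to receive,
    // so the fetcher holds at most one row group that has not been handed over.
    let (tx, rx) = mpsc::sync_channel::<Vec<u8>>(0);

    let fetcher = thread::spawn(move || {
        for index in 0..num_row_groups {
            // Stop early if the decoding side has gone away.
            if tx.send(fetch_row_group(index)).is_err() {
                break;
            }
        }
        // `tx` is dropped here, which ends the `rx.iter()` loop below.
    });

    let decoded: Vec<u64> = rx.iter().map(|bytes| decode_row_group(&bytes)).collect();
    fetcher.join().expect("fetcher thread panicked");
    decoded
}
```

Calling `read_with_prefetch(3)` decodes three toy row groups in order. A real reader would replace the two stand-in functions with async object store range requests and the actual decoder.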
   
   I am not sure exactly how this would work -- some ideas:
   1. You could implement a peek API and start a separate task for prefetch
   2. Maybe we could have a function like `PushDecoder::take_next_row_group()`
that would return a new `PushDecoder` for evaluating only a single row group,
which could then be processed in parallel?
   
   The more I write this, the more I like the
`PushDecoder::take_next_row_group` idea...
   
   Something like:
   ```rust
   let decoder = ParquetPushDecoder::new(...);
   // start decoding row groups in parallel:
   let rg1_decoder = decoder.take_next_row_group();
   // now fetch / decode data from rg1 in a thread
   task::run(... rg1_decoder ...);
   let rg2_decoder = decoder.take_next_row_group();
   // read data from rg2 ...
   ```
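
To make the shape of this idea concrete, here is a self-contained toy model. It is only a sketch: `ToyPushDecoder` and `ToyRowGroupDecoder` are hypothetical types invented for illustration, not the arrow-rs API, and "decoding" is just summing integers. It assumes `take_next_row_group` takes `&mut self` and returns an owned per-row-group decoder that can be moved to another thread:

```rust
use std::collections::VecDeque;
use std::thread;

/// Hypothetical decoder for a single row group. It owns its data,
/// so it can be moved to another thread.
struct ToyRowGroupDecoder {
    index: usize,
    values: Vec<i64>,
}

impl ToyRowGroupDecoder {
    /// Stand-in for real decoding: returns the row group index and the sum of its values.
    fn decode(self) -> (usize, i64) {
        (self.index, self.values.iter().sum())
    }
}

/// Hypothetical file-level decoder that hands out one row group decoder at a time.
struct ToyPushDecoder {
    remaining: VecDeque<(usize, Vec<i64>)>,
}

impl ToyPushDecoder {
    fn new(row_groups: Vec<Vec<i64>>) -> Self {
        Self {
            remaining: row_groups.into_iter().enumerate().collect(),
        }
    }

    /// Detaches the next row group as an independent decoder,
    /// or returns `None` once every row group has been taken.
    fn take_next_row_group(&mut self) -> Option<ToyRowGroupDecoder> {
        self.remaining
            .pop_front()
            .map(|(index, values)| ToyRowGroupDecoder { index, values })
    }
}

/// Decodes each row group on its own thread and returns results in row group order.
fn decode_in_parallel(row_groups: Vec<Vec<i64>>) -> Vec<(usize, i64)> {
    let mut decoder = ToyPushDecoder::new(row_groups);
    let mut handles = Vec::new();
    while let Some(rg_decoder) = decoder.take_next_row_group() {
        handles.push(thread::spawn(move || rg_decoder.decode()));
    }
    // Joining in spawn order keeps the results in row group order.
    handles
        .into_iter()
        .map(|handle| handle.join().expect("decode thread panicked"))
        .collect()
}
```

The property this illustrates is that each per-row-group decoder borrows nothing from its parent, so the parent can keep handing out further row groups while earlier ones are still being decoded.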


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
