alamb opened a new issue, #9929:
URL: https://github.com/apache/arrow-datafusion/issues/9929

   ### Is your feature request related to a problem or challenge?
   
   We are building and testing a specialized index for data stored in Parquet 
that can tell us which row offsets are needed from the Parquet file based on 
additional information.
   
   Currently the parquet-rs parquet reader allows specifying this type of 
information via 
[`ArrowReaderBuilder::with_row_selection`](https://docs.rs/parquet/latest/parquet/arrow/arrow_reader/struct.ArrowReaderBuilder.html#method.with_row_selection)
   
   However, the DataFusion 
[`ParquetExec`](https://docs.rs/datafusion/latest/datafusion/datasource/physical_plan/parquet/struct.ParquetExec.html)
 has no way to pass this information down. It does build its own 
   
   
   
   ### Describe the solution you'd like
   
   What I would like is a way to provide something like a 
[`RowSelection`](https://docs.rs/parquet/latest/parquet/arrow/arrow_reader/struct.RowSelection.html)
 for each row group 
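
To make the request concrete, here is a minimal sketch of how an external index's output (sorted row offsets within one row group) could be turned into the alternating skip/select runs that a `RowSelection` is built from. The `Run` enum and `offsets_to_runs` function are hypothetical stand-ins written against the standard library only; they are not part of parquet-rs or DataFusion, and the sketch assumes the offsets are strictly increasing.

```rust
/// Hypothetical stand-in for parquet's `RowSelector`:
/// a run of consecutive rows to either skip or select.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Run {
    Skip(usize),
    Select(usize),
}

/// Convert row offsets (relative to the start of one row group) into
/// alternating skip/select runs.
///
/// Assumption: `offsets` is sorted and contains no duplicates.
pub fn offsets_to_runs(offsets: &[usize]) -> Vec<Run> {
    let mut runs = Vec::new();
    let mut next = 0; // first row not yet covered by any run
    let mut i = 0;
    while i < offsets.len() {
        let start = offsets[i];
        let mut end = start + 1;
        i += 1;
        // Extend the current run over consecutive offsets.
        while i < offsets.len() && offsets[i] == end {
            end += 1;
            i += 1;
        }
        // Skip the gap (if any) between the previous run and this one.
        if start > next {
            runs.push(Run::Skip(start - next));
        }
        runs.push(Run::Select(end - start));
        next = end;
    }
    runs
}
```

For example, offsets `[2, 3, 4, 7]` become skip 2, select 3, skip 2, select 1. With the real crate, runs like these would presumably be expressed with `RowSelector::skip` and `RowSelector::select`.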
   
   
   
   
   
   ### Describe alternatives you've considered
   
   Here is one possible API:
   
   ```rust
   let parquet_selection = ParquetSelection::new()
     // * rows 100-250 from row group 1
     .select(1, RowSelection::from(vec![
       RowSelector::skip(100),
       RowSelector::select(150)
     ]))
     // * rows 50-100 and 200-300 in row group 2
     .select(2, RowSelection::from(vec![
       RowSelector::skip(50),
       RowSelector::select(50),
       RowSelector::skip(100),
       RowSelector::select(100),
     ]));
   
   let parquet_exec = ParquetExec::new(...)
     .with_selection(parquet_selection);
   ```
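
As a rough illustration of the semantics intended by the proposed API, the sketch below models a per-row-group selection with the standard library only and resolves each row group's skip/select runs into half-open row ranges. `SelectionSketch`, `Selector`, and `ranges` are hypothetical names for illustration; nothing here exists in DataFusion or parquet-rs, and what should happen for row groups with no entry is left open (the sketch just returns `None`).

```rust
use std::collections::BTreeMap;
use std::ops::Range;

/// Hypothetical stand-in for parquet's `RowSelector`.
#[derive(Debug, Clone, Copy)]
pub enum Selector {
    Skip(usize),
    Select(usize),
}

/// Hypothetical model of the proposed `ParquetSelection`:
/// one list of skip/select runs per row group index.
#[derive(Debug, Default)]
pub struct SelectionSketch {
    per_row_group: BTreeMap<usize, Vec<Selector>>,
}

impl SelectionSketch {
    pub fn new() -> Self {
        Self::default()
    }

    /// Builder-style registration of the runs for one row group.
    pub fn select(mut self, row_group: usize, runs: Vec<Selector>) -> Self {
        self.per_row_group.insert(row_group, runs);
        self
    }

    /// Resolve the runs for `row_group` into half-open row ranges,
    /// relative to the start of that row group.
    /// Returns `None` if no selection was registered for it.
    pub fn ranges(&self, row_group: usize) -> Option<Vec<Range<usize>>> {
        let runs = self.per_row_group.get(&row_group)?;
        let mut out = Vec::new();
        let mut pos = 0; // current row position within the row group
        for run in runs {
            match *run {
                Selector::Skip(n) => pos += n,
                Selector::Select(n) => {
                    out.push(pos..pos + n);
                    pos += n;
                }
            }
        }
        Some(out)
    }
}
```

Registering the same runs as in the example above, row group 1 resolves to the range `100..250` and row group 2 to `50..100` and `200..300`, matching the comments in the proposed API.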
   
   
   ### Additional context
   
   _No response_

