Ted-Jiang opened a new pull request, #1977:
URL: https://github.com/apache/arrow-rs/pull/1977

   # Which issue does this PR close?
   
   
   Closes #1976.
   
   # Rationale for this change
   Partially supports #1792.
   
   If we use the page index to compute row ranges as shown below, we get `row_ranges`:
   ``` rust
           // filter `x < 11`
           let filter =
               |page: &PageIndex<i32>| page.max.as_ref().map(|&x| x < 11).unwrap_or(false);
   
           let mask = index.indexes.iter().map(filter).collect::<Vec<_>>();
   
           let row_ranges = compute_row_ranges(&mask, locations, total_rows).unwrap();
   ```
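
To make the mask-to-ranges step concrete, here is a minimal sketch of how a per-page mask could be turned into row ranges. It is not the `compute_row_ranges` implementation from this PR: `PageStart` and `row_ranges_from_mask` are hypothetical names, ranges are assumed to be half-open, and adjacent selected pages are merged into one range.

```rust
use std::ops::Range;

/// Hypothetical stand-in for a page's entry in the offset index.
/// Only the index of the page's first row is needed for this sketch.
#[derive(Debug, Clone, Copy)]
struct PageStart {
    first_row_index: usize,
}

/// Turns a per-page selection mask into sorted, merged, half-open row ranges.
/// Page `i` spans from its own first row up to the first row of page `i + 1`,
/// or up to `total_rows` for the last page.
fn row_ranges_from_mask(
    mask: &[bool],
    pages: &[PageStart],
    total_rows: usize,
) -> Result<Vec<Range<usize>>, String> {
    if mask.len() != pages.len() {
        return Err(format!(
            "mask has {} entries but there are {} pages",
            mask.len(),
            pages.len()
        ));
    }

    let mut ranges: Vec<Range<usize>> = Vec::new();
    for (i, (&selected, page)) in mask.iter().zip(pages).enumerate() {
        if !selected {
            continue;
        }
        let start = page.first_row_index;
        let end = pages
            .get(i + 1)
            .map(|next| next.first_row_index)
            .unwrap_or(total_rows);

        // Extend the previous range when this page starts exactly where it ended.
        if let Some(last) = ranges.last_mut() {
            if last.end == start {
                last.end = end;
                continue;
            }
        }
        ranges.push(start..end);
    }
    Ok(ranges)
}
```

With six pages starting at rows 0, 20, 40, 60, 80, and 100, a mask that selects only the second, third, and fourth pages would yield the single range `20..80` under these assumptions.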
   We can then pass `row_ranges` to the new API to read the Parquet file (DataFusion reads files this way today, but without `row_ranges`):
   ``` rust
   fn get_record_reader_by_columns_and_row_ranges(
       &mut self,
       mask: ProjectionMask,
       row_ranges: &RowRanges,
       batch_size: usize,
   ) -> Result<ParquetRecordBatchReader> {
   ```
   
   # What changes are included in this PR?
   
   
   One example: suppose we read col1 and col2 and, after applying the filter, we need to read `row_ranges` [20, 80].
   _For col1:_
     we need all data from page1, page2, and page3.
   _For col2:_
     after this PR, we will **filter out** page2 and keep page0 and page1:
        page1: all of its data is needed
        page0: only part of its row range is needed (rows still need to be aligned, **TODO**)
   ```
    * rows   col1   col2   col3
    *      ┌──────┬──────┬──────┐
    *   0  │  p0  │      │      │
    *      ╞══════╡  p0  │  p0  │
    *  20  │ p1(X)│------│------│
    *      ╞══════╪══════╡      │
    *  40  │ p2(X)│      │------│
    *      ╞══════╡ p1(X)╞══════╡
    *  60  │ p3(X)│      │------│
    *      ╞══════╪══════╡      │
    *  80  │  p4  │      │  p1  │
    *      ╞══════╡  p2  │      │
    * 100  │  p5  │      │      │
    *      └──────┴──────┴──────┘
    * 
   ```
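
The per-column page selection in the diagram can be sketched as follows. This is only an illustration of the idea, not code from this PR: `PageUse` and `classify_pages` are hypothetical names, the wanted range [20, 80] is treated as the half-open range `20..80`, and the total row count of 120 used in the usage note is an assumption read off the diagram.

```rust
use std::ops::Range;

/// How a single page relates to the wanted row range.
#[derive(Debug, PartialEq)]
enum PageUse {
    /// No wanted rows fall in this page, so it can be filtered out.
    Skip,
    /// Every row in the page is wanted.
    Full,
    /// Only the given rows are wanted, so the page needs row alignment.
    Partial(Range<usize>),
}

/// Classifies each page of one column against a half-open wanted row range.
/// `page_starts` holds the first row index of each page, in ascending order.
fn classify_pages(
    page_starts: &[usize],
    total_rows: usize,
    wanted: &Range<usize>,
) -> Vec<PageUse> {
    page_starts
        .iter()
        .enumerate()
        .map(|(i, &start)| {
            // A page ends where the next one starts, or at the end of the column.
            let end = page_starts.get(i + 1).copied().unwrap_or(total_rows);
            let lo = start.max(wanted.start);
            let hi = end.min(wanted.end);
            if lo >= hi {
                PageUse::Skip
            } else if lo == start && hi == end {
                PageUse::Full
            } else {
                PageUse::Partial(lo..hi)
            }
        })
        .collect()
}
```

For col2, with pages starting at rows 0, 40, and 80, `classify_pages(&[0, 40, 80], 120, &(20..80))` gives `Partial(20..40)` for page0, `Full` for page1, and `Skip` for page2, which matches the description above.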
   
   # Are there any user-facing changes?
   
   
   

