Ted-Jiang opened a new pull request, #1977:
URL: https://github.com/apache/arrow-rs/pull/1977
# Which issue does this PR close?
Closes #1976.
# Rationale for this change
Partial support for #1792.
If we use the page index to compute row ranges, as below, we get `row_ranges`:
``` rust
// filter `x < 11`
let filter =
    |page: &PageIndex<i32>| page.max.as_ref().map(|&x| x < 11).unwrap_or(false);
let mask = index.indexes.iter().map(filter).collect::<Vec<_>>();
let row_ranges = compute_row_ranges(&mask, locations, total_rows).unwrap();
```
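To make the mask-to-ranges step concrete, here is a minimal, self-contained sketch. It is not the PR's `compute_row_ranges`: the helper name `row_ranges_from_mask`, the plain `first_rows` slice standing in for the page locations, and the half-open `Range<usize>` representation are all assumptions made for illustration.

``` rust
use std::ops::Range;

/// Hypothetical helper (not the PR's implementation): turn a per-page
/// keep/skip mask into half-open row ranges.
/// `first_rows[i]` is the index of the first row of page `i`.
fn row_ranges_from_mask(
    mask: &[bool],
    first_rows: &[usize],
    total_rows: usize,
) -> Vec<Range<usize>> {
    assert_eq!(mask.len(), first_rows.len());
    let mut ranges: Vec<Range<usize>> = Vec::new();
    for (i, &keep) in mask.iter().enumerate() {
        if !keep {
            continue;
        }
        let start = first_rows[i];
        // A page ends where the next one starts; the last page ends at `total_rows`.
        let end = first_rows.get(i + 1).copied().unwrap_or(total_rows);
        // Merge with the previous range when the two pages are adjacent.
        if let Some(last) = ranges.last_mut() {
            if last.end == start {
                last.end = end;
                continue;
            }
        }
        ranges.push(start..end);
    }
    ranges
}
```

For six pages of 20 rows each, where only pages 1 to 3 pass the filter, `row_ranges_from_mask(&[false, true, true, true, false, false], &[0, 20, 40, 60, 80, 100], 120)` yields the single merged range `20..80`.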
We can then pass `row_ranges` to the new API to read the Parquet file (DataFusion reads files this way, but without `row_ranges`):
``` rust
fn get_record_reader_by_columns_and_row_ranges(
    &mut self,
    mask: ProjectionMask,
    row_ranges: &RowRanges,
    batch_size: usize,
) -> Result<ParquetRecordBatchReader> {
```
# What changes are included in this PR?
One example: suppose we read col1 and col2 and, after applying the filter, need to read `row_ranges[20, 80]`.
_For col1:_
we need all the data from page1, page2 and page3.
_For col2:_
after this PR, we will **filter** out page2 and keep page0 and page1:
page1: we need all of its data;
page0: we need only part of its row range (row alignment is still a **TODO**).
```
 * rows   col1   col2   col3
 *      ┌──────┬──────┬──────┐
 *   0  │  p0  │      │      │
 *      ╞══════╡  p0  │  p0  │
 *  20  │ p1(X)│------│------│
 *      ╞══════╪══════╡      │
 *  40  │ p2(X)│      │------│
 *      ╞══════╡ p1(X)╞══════╡
 *  60  │ p3(X)│      │------│
 *      ╞══════╪══════╡      │
 *  80  │  p4  │      │  p1  │
 *      ╞══════╡  p2  │      │
 * 100  │  p5  │      │      │
 *      └──────┴──────┴──────┘
 *
```
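The per-column page selection in the diagram can be sketched as follows. This is an illustration only, not the code in this PR: `PageSelection` and `select_pages` are hypothetical names, `[20, 80]` is treated as the half-open range `20..80`, and the page boundaries and the total of 120 rows are read off the diagram.

``` rust
use std::ops::Range;

/// Hypothetical result type: which page to read and which of its rows are needed.
#[derive(Debug, PartialEq)]
struct PageSelection {
    page: usize,
    rows: Range<usize>,
    /// False when only part of the page falls inside the requested range,
    /// i.e. the case that still needs row alignment.
    whole_page: bool,
}

/// Hypothetical helper: keep every page of one column that overlaps `wanted`.
/// `first_rows[i]` is the index of the first row of page `i`.
fn select_pages(
    first_rows: &[usize],
    total_rows: usize,
    wanted: &Range<usize>,
) -> Vec<PageSelection> {
    let mut out = Vec::new();
    for (page, &start) in first_rows.iter().enumerate() {
        // A page ends where the next one starts; the last page ends at `total_rows`.
        let end = first_rows.get(page + 1).copied().unwrap_or(total_rows);
        // Intersect the page's rows with the requested rows.
        let lo = start.max(wanted.start);
        let hi = end.min(wanted.end);
        if lo < hi {
            out.push(PageSelection {
                page,
                rows: lo..hi,
                whole_page: lo == start && hi == end,
            });
        }
    }
    out
}
```

With col2's pages starting at rows 0, 40 and 80, `select_pages(&[0, 40, 80], 120, &(20..80))` keeps page0 with rows `20..40` (partial) and page1 with rows `40..80` (whole), and drops page2, matching the description above.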
# Are there any user-facing changes?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]