[
https://issues.apache.org/jira/browse/HUDI-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Danny Chen closed HUDI-3639.
----------------------------
Resolution: Fixed
Fixed via master branch: 5d196fe61757987af29b38e1b5cf38d7ca001924
> [Incremental] Add Proper Incremental Records Filtering support into Hudi's
> custom RDD
> -------------------------------------------------------------------------------------
>
> Key: HUDI-3639
> URL: https://issues.apache.org/jira/browse/HUDI-3639
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Alexey Kudinkin
> Priority: Critical
> Labels: pull-request-available
> Fix For: 0.14.0
>
>
> Currently, Hudi's `MergeOnReadIncrementalRelation` relies solely on
> `ParquetFileReader` for record-level filtering of the records that fall
> outside the timeline span being queried.
> As a side effect, Hudi has to disable the use of
> [VectorizedParquetReader|https://jaceklaskowski.gitbooks.io/mastering-spark-sql/content/spark-sql-vectorized-parquet-reader.html]
> (since using it would prevent records from being filtered by the reader).
>
> Instead, we should make sure that proper record-level filtering is performed
> within the returned RDD, rather than relying entirely on the file reader.
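
A minimal sketch of what record-level filtering inside the returned iterator could look like, written in plain Java with no Spark or Hudi dependencies. It is not the change that landed in master. The class `IncrementalFilterSketch`, the `Row` type, and all helper names are hypothetical. It assumes that each record carries its commit instant (as in the `_hoodie_commit_time` metadata field), that instants are fixed-width timestamp strings that compare lexicographically, and that the queried span is exclusive at the start and inclusive at the end.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class IncrementalFilterSketch {

    /** Hypothetical stand-in for a record that carries its commit instant. */
    static final class Row {
        final String commitTime; // assumed to mirror the _hoodie_commit_time field
        final String key;

        Row(String commitTime, String key) {
            this.commitTime = commitTime;
            this.key = key;
        }
    }

    /** True if the instant lies in (beginExclusive, endInclusive]. The bounds are an assumption. */
    static boolean inSpan(String commitTime, String beginExclusive, String endInclusive) {
        return commitTime.compareTo(beginExclusive) > 0
            && commitTime.compareTo(endInclusive) <= 0;
    }

    /** Wraps a source iterator and lazily drops rows outside the span, independent of the file reader. */
    static Iterator<Row> filterToSpan(Iterator<Row> source, String begin, String end) {
        return new Iterator<Row>() {
            private Row pending = advance();

            // Pulls from the source until a row inside the span is found, or the source is exhausted.
            private Row advance() {
                while (source.hasNext()) {
                    Row candidate = source.next();
                    if (inSpan(candidate.commitTime, begin, end)) {
                        return candidate;
                    }
                }
                return null;
            }

            @Override
            public boolean hasNext() {
                return pending != null;
            }

            @Override
            public Row next() {
                if (pending == null) {
                    throw new NoSuchElementException();
                }
                Row result = pending;
                pending = advance();
                return result;
            }
        };
    }

    /** Convenience helper: collects the keys of all rows that survive the filter. */
    static List<String> keysInSpan(List<Row> rows, String begin, String end) {
        List<String> keys = new ArrayList<>();
        Iterator<Row> it = filterToSpan(rows.iterator(), begin, end);
        while (it.hasNext()) {
            keys.add(it.next().key);
        }
        return keys;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row("20220101000000", "a"),
            new Row("20220102000000", "b"),
            new Row("20220103000000", "c"),
            new Row("20220104000000", "d"));
        // Prints [b, c]: "a" equals the exclusive start, "d" is past the inclusive end.
        System.out.println(keysInSpan(rows, "20220101000000", "20220103000000"));
    }
}
```

Because the filter wraps whatever iterator the reader produces, it does not matter whether the underlying reader is row-based or vectorized, which is the property the issue asks for.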