[ https://issues.apache.org/jira/browse/HUDI-6317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nicholas Jiang updated HUDI-6317:
---------------------------------
Description: At present, the default value of
`read.streaming.skip_clustering` is false, so a streaming read can pick up the
file slices that clustering writes to replace existing ones. For example, when
clustering rewrites the data of day T-1, a streaming read may read that T-1
data a second time and emit duplicates. Therefore, streaming read should skip
clustering instants in all cases to avoid reading the rewritten file slices.
The same applies to `read.streaming.skip_compaction`. (was: At present, the
default value of read.streaming.skip_clustering is false, which could cause the
situation that streaming reading reads the replaced file slices of clustering,
so that streaming reading may read T-1 day data when clustering the data of T-1
day to cause duplicated data. Therefore streaming read should skip clustering
instants for all cases to avoid reading the replaced file slices. The same to
`read.streaming.skip_compaction`.)
> Streaming read should skip compaction and clustering instants to avoid
> duplicates
> ---------------------------------------------------------------------------------
>
> Key: HUDI-6317
> URL: https://issues.apache.org/jira/browse/HUDI-6317
> Project: Apache Hudi
> Issue Type: Bug
> Components: flink
> Reporter: Nicholas Jiang
> Assignee: Nicholas Jiang
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.14.0
>
>
> At present, the default value of `read.streaming.skip_clustering` is false,
> so a streaming read can pick up the file slices that clustering writes to
> replace existing ones. For example, when clustering rewrites the data of day
> T-1, a streaming read may read that T-1 data a second time and emit
> duplicates. Therefore, streaming read should skip clustering instants in all
> cases to avoid reading the rewritten file slices. The same applies to
> `read.streaming.skip_compaction`.
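To make the duplicate scenario concrete, here is a minimal Python sketch under simplifying assumptions. The `Instant` type, the operation labels and `stream_read` are hypothetical and do not mirror Hudi's timeline or the Flink streaming source. Each instant lists the record keys contained in the file slices it writes; a clustering or compaction instant only rewrites keys that earlier instants already wrote, so emitting it again yields duplicates unless the reader skips it.

```python
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical, simplified model of a table timeline; NOT Hudi's real classes.
@dataclass(frozen=True)
class Instant:
    ts: str                  # instant time, used only for ordering
    operation: str           # simplified labels: "write", "clustering", "compaction"
    records: Tuple[str, ...] # record keys in the file slices this instant writes


def stream_read(timeline: List[Instant], skip_clustering: bool,
                skip_compaction: bool) -> List[str]:
    """Emit records instant by instant, optionally skipping instants that
    only rewrite data which earlier instants already produced."""
    emitted: List[str] = []
    for instant in sorted(timeline, key=lambda i: i.ts):
        if instant.operation == "clustering" and skip_clustering:
            continue  # clustering only reorganizes existing records
        if instant.operation == "compaction" and skip_compaction:
            continue  # compaction only merges existing records into new base files
        emitted.extend(instant.records)
    return emitted


if __name__ == "__main__":
    timeline = [
        Instant("001", "write", ("k1", "k2")),       # day T-1 ingestion
        Instant("002", "write", ("k3",)),            # day T ingestion
        Instant("003", "clustering", ("k1", "k2")),  # clustering rewrites day T-1 data
    ]
    print(stream_read(timeline, skip_clustering=False, skip_compaction=False))
    print(stream_read(timeline, skip_clustering=True, skip_compaction=True))
```

With skipping disabled, the keys rewritten by clustering (`k1`, `k2`) are emitted twice; with both flags enabled, each key appears once. In this model, skipping is safe because clustering and compaction introduce no new records and only rewrite existing ones.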
--
This message was sent by Atlassian Jira
(v8.20.10#820010)