[
https://issues.apache.org/jira/browse/HUDI-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Raymond Xu updated HUDI-3213:
-----------------------------
Status: In Progress (was: Open)
> compaction should not change the commit time
> --------------------------------------------
>
> Key: HUDI-3213
> URL: https://issues.apache.org/jira/browse/HUDI-3213
> Project: Apache Hudi
> Issue Type: Bug
> Components: spark, writer-core
> Reporter: Yann Byron
> Assignee: Yann Byron
> Priority: Critical
> Labels: hudi-on-call, pull-request-available, sev:critical
> Fix For: 0.10.1
>
> Original Estimate: 2h
> Remaining Estimate: 2h
>
> In `TestMORDataSource.testCount`, after the sixth operation finishes (two
> records are inserted and `compaction` runs), `hudiIncDF6.count()` returns
> 152. That total is made up of 150 records that have just gone through
> `compaction` (100 records updated in the second and third operations and 50
> records updated in the fifth) plus the 2 records inserted in the sixth.
> The correct answer is 2; the 150 compacted records should not be counted.
> The cause is that `compaction` changes the commit time of records that
> were updated later and stored in log files.
> {code:java}
> val hudiIncDF6 = spark.read.format("org.apache.hudi")
>   .option(DataSourceReadOptions.QUERY_TYPE.key,
>     DataSourceReadOptions.QUERY_TYPE_INCREMENTAL_OPT_VAL)
>   .option(DataSourceReadOptions.BEGIN_INSTANTTIME.key, commit5Time)
>   .option(DataSourceReadOptions.END_INSTANTTIME.key, commit6Time)
>   .load(basePath)
> // compaction updated 150 rows + inserted 2 new rows
> assertEquals(152, hudiIncDF6.count())
> {code}
>
>
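The counting argument in the description can be reproduced with a small toy model that uses neither Spark nor Hudi. This is only a sketch under stated assumptions: `Rec`, `incrementalCount`, and the instant values `"003"`, `"005"`, `"006"` are hypothetical; the incremental query is assumed to select records whose commit time is after the begin instant and no later than the end instant; the 100 earlier updates are all given the third operation's time for simplicity; and the rewritten commit time is assumed to fall inside the queried range (taken here as `commit6Time`).

```scala
// Toy model only: no Spark or Hudi involved. All names are hypothetical.
object IncrementalCountSketch {
  // A record together with the commit time stored in its metadata.
  final case class Rec(key: Int, commitTime: String)

  // Assumed incremental-query semantics: begin < commitTime <= end.
  // Instant times are modelled as sortable strings.
  def incrementalCount(recs: Seq[Rec], begin: String, end: String): Int =
    recs.count { r =>
      r.commitTime.compareTo(begin) > 0 && r.commitTime.compareTo(end) <= 0
    }

  val commit3Time = "003"
  val commit5Time = "005"
  val commit6Time = "006"

  // 100 records last updated in the second/third operations (simplified to "003").
  val updatedEarlier: Seq[Rec] = (1 to 100).map(k => Rec(k, commit3Time))
  // 50 records updated in the fifth operation.
  val updatedInFifth: Seq[Rec] = (101 to 150).map(k => Rec(k, commit5Time))
  // 2 records inserted in the sixth operation.
  val insertedInSixth: Seq[Rec] = (151 to 152).map(k => Rec(k, commit6Time))

  // Expected behaviour: compaction leaves each record's commit time unchanged.
  val preserved: Seq[Rec] = updatedEarlier ++ updatedInFifth ++ insertedInSixth

  // Reported behaviour: the 150 compacted records receive a new commit time
  // (assumed here to be commit6Time, i.e. inside the queried range).
  val restamped: Seq[Rec] =
    (updatedEarlier ++ updatedInFifth).map(_.copy(commitTime = commit6Time)) ++
      insertedInSixth

  def main(args: Array[String]): Unit = {
    println(incrementalCount(preserved, commit5Time, commit6Time)) // prints 2
    println(incrementalCount(restamped, commit5Time, commit6Time)) // prints 152
  }
}
```

Under these assumptions the preserved case matches the count the description calls correct (2), and the restamped case matches the count the test currently asserts (152 = 100 + 50 + 2).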
--
This message was sent by Atlassian Jira
(v8.20.1#820001)