Yuwei Xiao created HUDI-3194:
--------------------------------
Summary: Fix invisible writes (commits) during compaction
(HoodieParquetRealtimeInputFormat)
Key: HUDI-3194
URL: https://issues.apache.org/jira/browse/HUDI-3194
Project: Apache Hudi
Issue Type: Bug
Reporter: Yuwei Xiao
Suppose a compaction (with instant A) is in progress. All writes related to the
compaction (i.e., writes that touch the file groups under compaction) end up
with timestamp A.
With the current `HoodieParquetRealtimeInputFormat` implementation, even after
these writes complete, their records remain invisible until the compaction
completes.
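To make the mechanism concrete, here is a minimal Python model of the visibility rule described above. It is a simplified sketch, not Hudi code: `FileSlice`, `visible_keys_naive`, and the instant strings "001"/"002"/"003" are hypothetical stand-ins. A file slice is identified by its base instant and holds log blocks tagged with the delta commit that wrote them; the reader keeps only slices whose base instant is completed, so a slice based on the pending compaction instant is dropped together with its log blocks.

```python
from dataclasses import dataclass, field


@dataclass
class FileSlice:
    # Hypothetical, simplified file slice within a single file group.
    base_instant: str                                # instant the slice is based on
    base_keys: list = field(default_factory=list)    # record keys in the base file
    log_blocks: list = field(default_factory=list)   # (delta_commit, keys) pairs


def visible_keys_naive(slices, completed):
    """Return keys from the latest slice whose base instant is completed.

    A slice based on a pending compaction instant is skipped entirely, so
    its log blocks stay hidden even after their own delta commits complete.
    """
    candidates = [s for s in slices if s.base_instant in completed]
    if not candidates:
        return []
    latest = max(candidates, key=lambda s: s.base_instant)
    keys = list(latest.base_keys)
    for commit, block_keys in latest.log_blocks:
        if commit in completed:
            keys.extend(block_keys)
    return keys


# Commit "001" writes 200 records; compaction "002" is scheduled (pending);
# delta commit "003" then writes 200 more records into log files of the
# slice based on "002".
slices = [
    FileSlice("001", base_keys=[f"k{i}" for i in range(200)]),
    FileSlice("002", log_blocks=[("003", [f"k{i}" for i in range(200, 400)])]),
]
completed = {"001", "003"}  # the compaction "002" has not completed yet
print(len(visible_keys_naive(slices, completed)))  # 200, not 400
```

Instant strings compare lexicographically here, which mirrors fixed-width timestamps but is only an assumption of the model.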
The following pseudocode reproduces the issue:
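```
write 200 records and complete    # first commit completes
schedule compaction               # compaction instant A is now pending
write 200 records and complete    # writes to file groups under compaction get timestamp A
read the table via the Hive path  # expected 400 records, but only 200 are returned
```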
Note that the Spark read path is correct and covers these corner cases during
compaction; the Hive path (and likewise Presto) does not.
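One way to cover this corner case, sketched below in a simplified and hypothetical Python model, is to treat a slice whose base instant is a pending compaction as a continuation of the previous slice: since no compacted base file exists yet, its log blocks are appended to the preceding slice before reading. This illustrates the idea only; it is not the Spark or Hive reader code, and `FileSlice`, `visible_keys_merged`, and the instant strings are made-up names.

```python
from dataclasses import dataclass, field


@dataclass
class FileSlice:
    # Hypothetical, simplified file slice within a single file group.
    base_instant: str
    base_keys: list = field(default_factory=list)
    log_blocks: list = field(default_factory=list)  # (delta_commit, keys) pairs


def visible_keys_merged(slices, completed, pending_compactions):
    """Fold pending-compaction slices into their predecessor, then read."""
    ordered = sorted(slices, key=lambda s: s.base_instant)
    merged = []
    for s in ordered:
        if s.base_instant in pending_compactions and merged:
            # No compacted base file yet: keep reading the previous slice
            # and append this slice's log blocks to it.
            prev = merged[-1]
            merged[-1] = FileSlice(prev.base_instant, prev.base_keys,
                                   prev.log_blocks + s.log_blocks)
        else:
            merged.append(s)
    candidates = [s for s in merged if s.base_instant in completed]
    if not candidates:
        return []
    latest = candidates[-1]  # merged is ordered by base instant
    keys = list(latest.base_keys)
    for commit, block_keys in latest.log_blocks:
        if commit in completed:  # only completed delta commits are visible
            keys.extend(block_keys)
    return keys


# Same scenario as the reproduction: "001" completed, compaction "002"
# pending, delta commit "003" completed with 200 records in log files.
slices = [
    FileSlice("001", base_keys=[f"k{i}" for i in range(200)]),
    FileSlice("002", log_blocks=[("003", [f"k{i}" for i in range(200, 400)])]),
]
completed = {"001", "003"}
pending = {"002"}
print(len(visible_keys_merged(slices, completed, pending)))  # 400
```

Log blocks from delta commits that have not completed are still filtered out, so merging does not expose in-flight writes.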
--
This message was sent by Atlassian Jira
(v8.20.1#820001)