Yuwei Xiao created HUDI-3194:
--------------------------------

             Summary: Fix invisible writes (commits) during compaction (HoodieParquetRealtimeInputFormat)
                 Key: HUDI-3194
                 URL: https://issues.apache.org/jira/browse/HUDI-3194
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: Yuwei Xiao


Suppose a compaction (with instant A) is in progress. All writes related to the compaction (i.e., writes that touch the file groups under compaction) will end up with timestamp A.
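
As a rough illustration of why these writes carry timestamp A, the sketch below models a merge-on-read file group in plain Python. It assumes, as a simplification, that scheduling compaction at instant A opens a new file slice keyed by A and that later writes append to the latest slice. `FileGroup`, `FileSlice`, `schedule_compaction`, and `append_log` are hypothetical names, not Hudi's API.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified model of a merge-on-read file group.
# Illustrative only; these are not Hudi's classes or methods.


@dataclass
class FileSlice:
    base_instant: str                        # instant time this slice is keyed by
    records: list = field(default_factory=list)


@dataclass
class FileGroup:
    file_id: str
    slices: list = field(default_factory=list)

    def schedule_compaction(self, instant: str) -> None:
        # Assumption: scheduling compaction at `instant` opens a new slice
        # keyed by that instant; the previous slice is the compaction input.
        self.slices.append(FileSlice(base_instant=instant))

    def append_log(self, records) -> FileSlice:
        # Assumption: new writes always go to the latest slice, so once
        # compaction A is scheduled they are tagged with A.
        latest = self.slices[-1]
        latest.records.extend(records)
        return latest


fg = FileGroup("fg-1", [FileSlice("001", list(range(200)))])  # first write
fg.schedule_compaction("002")                     # compaction instant A = "002"
landed = fg.append_log(range(200, 400))           # second write touching fg-1
print(landed.base_instant)                        # -> 002
```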

With the current `HoodieParquetRealtimeInputFormat` implementation, even after these writes complete, their records remain invisible until the compaction completes.
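
The visibility problem can be sketched with a simplified, hypothetical model. `read_naive` assumes a reader that only accepts the latest file slice whose base instant is completed, so the slice keyed by A is dropped while compaction A is pending. `read_merged` shows one possible alternative that folds the pending slice's records into the preceding completed slice. Neither function is Hudi's code, and the model assumes every record in slice A comes from a completed write.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified model for illustration; not Hudi's reader logic.


@dataclass
class FileSlice:
    base_instant: str                        # instant time this slice is keyed by
    records: list = field(default_factory=list)


def read_naive(slices, completed):
    # Take only the latest slice whose base instant is completed. A slice
    # keyed by a still-pending compaction instant is skipped with its records.
    visible = [s for s in slices if s.base_instant in completed]
    return list(visible[-1].records) if visible else []


def read_merged(slices, completed):
    # A completed slice replaces everything before it; a slice keyed by a
    # pending compaction instant is folded into the preceding result.
    result = []
    for s in slices:
        if s.base_instant in completed:
            result = list(s.records)
        else:
            result.extend(s.records)
    return result


slices = [
    FileSlice("001", list(range(200))),       # base file from the first write
    FileSlice("002", list(range(200, 400))),  # writes after scheduling A = "002"
]
completed = {"001"}                           # compaction "002" is still pending
print(len(read_naive(slices, completed)))     # -> 200
print(len(read_merged(slices, completed)))    # -> 400
```

In this model, the naive reader reproduces the symptom (only 200 records), while merging keeps the records written after compaction A was scheduled visible before A completes.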

The following pseudocode reproduces the issue:

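```
write 200 records and complete
scheduleCompaction              // compaction instant A is now pending
write 200 records and complete  // touches file groups under compaction A
read the table                  // returns only 200 records; the second batch is missing
```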

Note that the Spark read path is correct and handles these corner cases during compaction, but the Hive path (and likewise Presto) is wrong.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)