sivabalan narayanan created HUDI-1763:
-----------------------------------------

             Summary: DefaultHoodieRecordPayload does not honor ordering value 
when records within multiple log files are merged
                 Key: HUDI-1763
                 URL: https://issues.apache.org/jira/browse/HUDI-1763
             Project: Apache Hudi
          Issue Type: Bug
          Components: Writer Core
    Affects Versions: 0.8.0
            Reporter: sivabalan narayanan


When HoodieRecordPayloads are created from log files for MOR tables, the 
payloads are created without any orderingVal (even if one was specified when 
the data was written). As a result, the precombine function can return any 
payload, irrespective of its orderingVal.

Attaching a sample script to reproduce the issue.

In this example, for key "key1", the first insert has ts=1000. We then update 
with ts=2000, and then update again with ts=500. Ideally, a snapshot query of 
the table after the last update should return key1 with ts=2000 (since our 
ordering field is ts). However, it returns the entry with ts=1000, because 
when reading the logs it ignores ts=2000 and only picks up ts=500.
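
The scenario above can be modelled with a small standalone sketch. This is 
not Hudi code: Payload, pre_combine, merge_log_records and snapshot_ts are 
hypothetical names, and the merge rules are a simplified assumption (the 
log-side precombine compares the payload's ordering value, the merge with the 
base file compares the ts stored in the record data). It only illustrates how 
dropping the ordering value on the log side can produce the result described.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Payload:
    """Hypothetical stand-in for a record payload (not a Hudi class)."""
    key: str
    ts: int            # ordering field as stored in the record data
    ordering_val: int  # ordering value carried by the payload; 0 = not set


def pre_combine(newer: Payload, older: Payload) -> Payload:
    """Assumed rule: keep the older payload only if it orders strictly higher."""
    return older if older.ordering_val > newer.ordering_val else newer


def merge_log_records(key: str, log_ts: List[int],
                      keep_ordering_val: bool) -> Optional[Payload]:
    """Fold log records in write order into a single payload."""
    merged: Optional[Payload] = None
    for ts in log_ts:
        # keep_ordering_val=False models the reported behaviour: the payload
        # is rebuilt from the log without its ordering value.
        payload = Payload(key, ts, ts if keep_ordering_val else 0)
        merged = payload if merged is None else pre_combine(payload, merged)
    return merged


def snapshot_ts(base_ts: int, log_ts: List[int], keep_ordering_val: bool) -> int:
    """Merge the log result with the base record, comparing the stored ts."""
    merged = merge_log_records("key1", log_ts, keep_ordering_val)
    if merged is None or base_ts > merged.ts:
        return base_ts  # base record orders higher, so it is kept
    return merged.ts


if __name__ == "__main__":
    # Base file holds ts=1000; log files hold the updates ts=2000, then ts=500.
    print("ordering value dropped:", snapshot_ts(1000, [2000, 500], False))
    print("ordering value kept:   ", snapshot_ts(1000, [2000, 500], True))
```

Under these assumptions, the run without ordering values lets ts=500 replace 
ts=2000 among the log records and then lose to the base record, so the 
snapshot shows ts=1000. The run with ordering values yields ts=2000.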

Also, AFAIU, the same flow is used during compaction, in which case we might 
lose the data permanently.

 

More info: https://github.com/apache/hudi/issues/2756



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
