[
https://issues.apache.org/jira/browse/HUDI-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pratyaksh Sharma updated HUDI-667:
----------------------------------
Comment: was deleted
(was: Ok, let me try to fix this in my current PR.
[https://github.com/apache/incubator-hudi/pull/1150])
> HoodieTestDataGenerator does not delete keys correctly
> ------------------------------------------------------
>
> Key: HUDI-667
> URL: https://issues.apache.org/jira/browse/HUDI-667
> Project: Apache Hudi (incubating)
> Issue Type: Bug
> Reporter: Prashant Wason
> Priority: Minor
> Labels: pull-request-available
> Original Estimate: 1h
> Time Spent: 20m
> Remaining Estimate: 40m
>
> HoodieTestDataGenerator is used to generate sample data for unit tests. It
> allows generating HoodieRecords for insert/update/delete. It maintains the
> record keys in a HashMap:
> private final Map<Integer, KeyPartition> existingKeys;
> There are two issues in the implementation:
> # Delete from existingKeys uses KeyPartition rather than Integer keys
> # Inserting records after deletes is not correctly handled
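
The first issue can be illustrated with a minimal sketch. It relies only on the documented behaviour of java.util.Map.remove(Object), which looks an entry up by key. The KeyPartition class and its fields below are a hypothetical stand-in, not the actual Hudi class, and the method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class RemoveByValueSketch {
    // Hypothetical stand-in for the KeyPartition type held in existingKeys.
    static final class KeyPartition {
        final String key;
        final String partitionPath;

        KeyPartition(String key, String partitionPath) {
            this.key = key;
            this.partitionPath = partitionPath;
        }
    }

    // Passing the value to remove(): compiles because remove takes Object,
    // but no Integer key ever equals a KeyPartition, so nothing is removed.
    static int sizeAfterRemoveByValue() {
        Map<Integer, KeyPartition> existingKeys = new HashMap<>();
        KeyPartition kp = new KeyPartition("key-1", "2020/01/01");
        existingKeys.put(0, kp);
        existingKeys.remove(kp);
        return existingKeys.size();
    }

    // Passing the Integer index to remove(): the entry is actually deleted.
    static int sizeAfterRemoveByIndex() {
        Map<Integer, KeyPartition> existingKeys = new HashMap<>();
        KeyPartition kp = new KeyPartition("key-1", "2020/01/01");
        existingKeys.put(0, kp);
        existingKeys.remove(0);
        return existingKeys.size();
    }

    public static void main(String[] args) {
        System.out.println("remove by value leaves size " + sizeAfterRemoveByValue());
        System.out.println("remove by index leaves size " + sizeAfterRemoveByIndex());
    }
}
```

Because remove(Object) accepts any argument, the compiler gives no warning here, which is probably why this kind of mistake goes unnoticed in tests.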
> The implementation uses the Integer key so that values can be looked up
> randomly. Assume three values were inserted, then the HashMap will hold:
> 0 -> KeyPartition1
> 1 -> KeyPartition2
> 2 -> KeyPartition3
> Now if we delete KeyPartition2 (generate a random record for deletion), the
> HashMap will be:
> 0 -> KeyPartition1
> 2 -> KeyPartition3
>
> Now if we issue an insertBatch(), the insert is
> existingKeys.put(existingKeys.size(), KeyPartition4). Since size() is now 2,
> this overwrites the KeyPartition3 entry at key 2 rather than adding a new
> entry to the map.
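
The scenario described above can be replayed with a plain HashMap. This is a sketch only: Strings stand in for KeyPartition values, "KeyPartition4" is an illustrative name for the newly inserted entry, and the deleteAt helper is one possible way to keep the indices dense, not necessarily the fix adopted for this issue.

```java
import java.util.HashMap;
import java.util.Map;

class SizeAsIndexSketch {
    // Replays the sequence from the description: three inserts, one delete
    // in the middle, then an insert that uses size() as the next index.
    static Map<Integer, String> replayScenario() {
        Map<Integer, String> existingKeys = new HashMap<>();
        existingKeys.put(0, "KeyPartition1");
        existingKeys.put(1, "KeyPartition2");
        existingKeys.put(2, "KeyPartition3");
        existingKeys.remove(1); // leaves indices 0 and 2
        // size() is 2, and index 2 is still occupied by KeyPartition3.
        existingKeys.put(existingKeys.size(), "KeyPartition4");
        return existingKeys;
    }

    // Hypothetical remedy: move the last entry into the vacated slot so the
    // indices always cover 0..size()-1. Assumes index is a valid dense index.
    static void deleteAt(Map<Integer, String> keys, int index) {
        int last = keys.size() - 1;
        String lastValue = keys.remove(last);
        if (index != last) {
            keys.put(index, lastValue);
        }
    }

    // Same sequence as replayScenario(), but deleting through deleteAt().
    static Map<Integer, String> replayWithDenseDelete() {
        Map<Integer, String> existingKeys = new HashMap<>();
        existingKeys.put(0, "KeyPartition1");
        existingKeys.put(1, "KeyPartition2");
        existingKeys.put(2, "KeyPartition3");
        deleteAt(existingKeys, 1); // KeyPartition3 moves to index 1
        existingKeys.put(existingKeys.size(), "KeyPartition4");
        return existingKeys;
    }

    public static void main(String[] args) {
        System.out.println(replayScenario());
        System.out.println(replayWithDenseDelete());
    }
}
```

In the first replay KeyPartition3 is silently lost and the map ends with two entries. With the dense delete it ends with three, and a random lookup by index in the range 0..size()-1 always finds an entry.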