[ https://issues.apache.org/jira/browse/HUDI-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17057648#comment-17057648 ]
Pratyaksh Sharma commented on HUDI-667:
---------------------------------------

So this is a corner case where only the key (where key = map.size()) can get overwritten in case of deletes, right? [~pwason]

> HoodieTestDataGenerator does not delete keys correctly
> ------------------------------------------------------
>
>                 Key: HUDI-667
>                 URL: https://issues.apache.org/jira/browse/HUDI-667
>             Project: Apache Hudi (incubating)
>          Issue Type: Bug
>            Reporter: Prashant Wason
>            Priority: Minor
>              Labels: pull-request-available
>   Original Estimate: 1h
>          Time Spent: 20m
>  Remaining Estimate: 40m
>
> HoodieTestDataGenerator is used to generate sample data for unit-tests. It
> allows generating HoodieRecords for insert/update/delete. It maintains the
> record keys in a HashMap.
> private final Map<Integer, KeyPartition> existingKeys;
> There are two issues in the implementation:
> # Delete from existingKeys uses KeyPartition rather than Integer keys
> # Inserting records after deletes is not correctly handled
> The implementation uses the Integer key so that values can be looked up
> randomly. Assume three values were inserted, then the HashMap will hold:
> 0 -> KeyPartition1
> 1 -> KeyPartition2
> 2 -> KeyPartition3
> Now if we delete KeyPartition2 (generate a random record for deletion), the
> HashMap will be:
> 0 -> KeyPartition1
> 2 -> KeyPartition3
>
> Now if we issue an insertBatch() then the insert is
> existingKeys.put(existingKeys.size(), KeyPartition3) which will overwrite the
> KeyPartition3 already in the map rather than actually inserting a new entry
> in the map.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
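
The two issues described in the quoted report can be reproduced with a plain HashMap. The sketch below is only an illustration under stated assumptions: KeyIndexSketch and its methods are hypothetical names, String values stand in for KeyPartition, and the compacting delete is one possible remedy, not necessarily what the actual patch does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; String values stand in for Hudi's KeyPartition.
final class KeyIndexSketch {

  // Fills the map the way the report describes: index -> key, index = size().
  static Map<Integer, String> threeKeys() {
    Map<Integer, String> existingKeys = new HashMap<>();
    existingKeys.put(existingKeys.size(), "KeyPartition1"); // index 0
    existingKeys.put(existingKeys.size(), "KeyPartition2"); // index 1
    existingKeys.put(existingKeys.size(), "KeyPartition3"); // index 2
    return existingKeys;
  }

  // Issue 1: removing by value on a Map<Integer, ...> compiles
  // (Map.remove takes Object) but matches no key, so nothing is deleted.
  static int sizeAfterRemoveByValue() {
    Map<Integer, String> existingKeys = threeKeys();
    existingKeys.remove("KeyPartition2");
    return existingKeys.size();
  }

  // Issue 2: deleting index 1 leaves a gap, so size() == 2 points at an
  // occupied slot and the next insert overwrites KeyPartition3.
  static Map<Integer, String> buggyInsertAfterDelete() {
    Map<Integer, String> existingKeys = threeKeys();
    existingKeys.remove(1);
    existingKeys.put(existingKeys.size(), "KeyPartition4");
    return existingKeys;
  }

  // One possible remedy (an assumption, not the actual fix): move the last
  // entry into the freed slot so the indices always stay 0..size()-1.
  static void compactingRemove(Map<Integer, String> keys, int index) {
    int last = keys.size() - 1;
    String moved = keys.remove(last);
    if (index != last) {
      keys.put(index, moved);
    }
  }

  static Map<Integer, String> fixedInsertAfterDelete() {
    Map<Integer, String> existingKeys = threeKeys();
    compactingRemove(existingKeys, 1);
    existingKeys.put(existingKeys.size(), "KeyPartition4");
    return existingKeys;
  }

  public static void main(String[] args) {
    System.out.println("remove by value leaves size: " + sizeAfterRemoveByValue());
    System.out.println("buggy: " + buggyInsertAfterDelete());
    System.out.println("compacting: " + fixedInsertAfterDelete());
  }
}
```

With the compacting delete, existingKeys.size() is always the next free index, and random lookup by an Integer in [0, size()) keeps working after deletes.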