[ 
https://issues.apache.org/jira/browse/HUDI-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17057648#comment-17057648
 ] 

Pratyaksh Sharma commented on HUDI-667:
---------------------------------------

So this is a corner case where only the entry whose key equals map.size() can 
get overwritten when inserting after deletes, right? [~pwason]

> HoodieTestDataGenerator does not delete keys correctly
> ------------------------------------------------------
>
>                 Key: HUDI-667
>                 URL: https://issues.apache.org/jira/browse/HUDI-667
>             Project: Apache Hudi (incubating)
>          Issue Type: Bug
>            Reporter: Prashant Wason
>            Priority: Minor
>              Labels: pull-request-available
>   Original Estimate: 1h
>          Time Spent: 20m
>  Remaining Estimate: 40m
>
> HoodieTestDataGenerator is used to generate sample data for unit-tests. It 
> allows generating HoodieRecords for insert/update/delete. It maintains the 
> record keys in a HashMap.
> private final Map<Integer, KeyPartition> existingKeys;
> There are two issues in the implementation:
>  # Delete from existingKeys uses KeyPartition rather than Integer keys
>  # Inserting records after deletes is not correctly handled
> The implementation uses the Integer key so that values can be looked up 
> randomly. Assume three values were inserted, then the HashMap will hold:
> 0 -> KeyPartition1
> 1 -> KeyPartition2
> 2 -> KeyPartition3
> Now if we delete KeyPartition2 (a random record picked for deletion), the 
> HashMap will be:
> 0 -> KeyPartition1
> 2 -> KeyPartition3
>  
> Now if we issue an insertBatch(), the insert is 
> existingKeys.put(existingKeys.size(), KeyPartition4). Since 
> existingKeys.size() is now 2, this overwrites the KeyPartition3 already 
> stored under key 2 rather than inserting a new entry in the map.
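
The overwrite described above can be reproduced with a plain HashMap. The following is only a minimal sketch, not the HoodieTestDataGenerator code: the class name ExistingKeysSketch, the helper deleteAt, and the use of String values as a stand-in for KeyPartition are all assumptions made for illustration. The second method shows one possible way to keep the Integer keys dense (move the last entry into the freed slot on delete); it is not necessarily what the actual patch does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-alone sketch; String stands in for KeyPartition.
public class ExistingKeysSketch {

    // Builds the three-entry map from the issue description: 0, 1, 2.
    static Map<Integer, String> threeEntries() {
        Map<Integer, String> existingKeys = new HashMap<>();
        existingKeys.put(existingKeys.size(), "KeyPartition1"); // key 0
        existingKeys.put(existingKeys.size(), "KeyPartition2"); // key 1
        existingKeys.put(existingKeys.size(), "KeyPartition3"); // key 2
        return existingKeys;
    }

    // Reported pattern: delete by Integer key, then insert at index size().
    static Map<Integer, String> buggyInsertAfterDelete() {
        Map<Integer, String> existingKeys = threeEntries();
        existingKeys.remove(1); // leaves keys 0 and 2, so size() == 2
        // size() == 2, so this replaces the value stored under key 2.
        existingKeys.put(existingKeys.size(), "KeyPartition4");
        return existingKeys;
    }

    // Assumed helper: fill the freed slot with the last entry so that
    // the keys always stay in the range 0 .. size() - 1.
    static void deleteAt(Map<Integer, String> keys, int index) {
        int last = keys.size() - 1;
        String lastValue = keys.remove(last);
        if (index != last) {
            keys.put(index, lastValue); // overwrites the deleted entry
        }
    }

    // Same scenario, using the compacting delete.
    static Map<Integer, String> compactingInsertAfterDelete() {
        Map<Integer, String> existingKeys = threeEntries();
        deleteAt(existingKeys, 1);
        existingKeys.put(existingKeys.size(), "KeyPartition4"); // key 2 is free
        return existingKeys;
    }

    public static void main(String[] args) {
        System.out.println(buggyInsertAfterDelete());
        System.out.println(compactingInsertAfterDelete());
    }
}
```

In the first method the map ends up with two entries and KeyPartition3 is lost; in the second it ends up with three entries and a random lookup by an index in 0 .. size() - 1 always hits an existing key.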


