smdahmed commented on issue #859: Hudi upsert after a delete in partition will 
cause valid records inserted to disappear.
URL: https://github.com/apache/incubator-hudi/issues/859#issuecomment-527134270
 
 
   Thanks again Vinoth. I think I know why you are not seeing the issue. If you 
could kindly include a partition in your setup, you should see the issue too. 
As I suggested in the initial report, this issue reproduces only when a 
partitioned Hive table is involved. 
   
   Steps: (Let the schema be: id, name, team)
   
   1. Insert data into a certain partition (e.g. p1) -> (1, kabeer, hudi | 2, 
vinoth, hudi)
   2. Delete record (1, kabeer, hudi)
   3. Upsert a new record: (3, balaji, hudi)
   
   Please treat the team column as the partition column, and kindly ensure that 
all 3 records land in the same Hive table partition path, 
<base_path_of_table>/team. 
   
   Now, when you query the table, you should expect to see the records vinoth 
and balaji. But you will only see balaji. 
   -----------------------------------------------
   Some additional information based on your setup:
   I have repeated the above test with the id column as the partition column, 
i.e. each record lands in its own partition based on its ID, so kabeer lands in 
partition 1, vinoth in 2 and balaji in 3. 
   Since the record that gets deleted is kabeer, I am successfully able to see 
vinoth and balaji. So I am now convinced that, within a given partition of a 
table, an upsert that follows a delete causes all previous records to vanish. 
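
   To pin down the expected result of the steps above, here is a minimal 
sketch in plain Python. It is a toy in-memory model of a partitioned table, 
not Hudi code: the PartitionedTable class and run_scenario function are 
hypothetical names made up for illustration. It only encodes what a correct 
insert -> delete -> upsert sequence should return under both partitioning 
choices (team and id).

```python
from collections import defaultdict


class PartitionedTable:
    """Toy stand-in for a partitioned table (hypothetical, not the Hudi API)."""

    def __init__(self, partition_field):
        self.partition_field = partition_field
        # partition value -> {record id -> record}
        self.partitions = defaultdict(dict)

    def upsert(self, records):
        # Insert new records or overwrite existing ones with the same id.
        for rec in records:
            self.partitions[rec[self.partition_field]][rec["id"]] = rec

    def delete(self, records):
        # Remove only the given records; the rest of the partition is untouched.
        for rec in records:
            self.partitions[rec[self.partition_field]].pop(rec["id"], None)

    def names(self):
        # Sorted names of all records across all partitions.
        return sorted(
            rec["name"]
            for part in self.partitions.values()
            for rec in part.values()
        )


def run_scenario(partition_field):
    """Replay the three steps from the report with the given partition column."""
    kabeer = {"id": 1, "name": "kabeer", "team": "hudi"}
    vinoth = {"id": 2, "name": "vinoth", "team": "hudi"}
    balaji = {"id": 3, "name": "balaji", "team": "hudi"}

    table = PartitionedTable(partition_field)
    table.upsert([kabeer, vinoth])  # step 1: initial insert
    table.delete([kabeer])          # step 2: delete one record
    table.upsert([balaji])          # step 3: upsert a new record
    return table.names()


if __name__ == "__main__":
    print("partitioned by team:", run_scenario("team"))
    print("partitioned by id:  ", run_scenario("id"))
```

   In this model both runs return vinoth and balaji. The report is that the 
real table matches this only when partitioned by id; when partitioned by team, 
vinoth is missing.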
   
   More information:
   The upsert routine that I use after the insert and after the delete is the 
same. I can confirm that upserts after an insert give the expected behaviour, 
which indicates that I am not using savemode.overwrite. It is only after a 
delete that there is an issue. 
   
   Hopefully, once you extend your setup with partition information for the 
table, we can confirm that the issue exists. I am very grateful to you for all 
your support. Eagerly looking forward to seeing your further findings. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
