alexeykudinkin commented on pull request #4716:
URL: https://github.com/apache/hudi/pull/4716#issuecomment-1026246434


   >Here is my understanding of the fix.
   We fix the rollback plan generated in ListingBasedStrategy to include just 
the log files with full sizes.
   For the marker-based strategy, we set file sizes to -1, but now include the 
right set of log files in the rollback plan.
   
   > So, we still have an issue w/ how we reconcile or merge multiple metadata 
records.
   For example:
   Rec1: file1 delta size 100 (commit1)
   Rec2: file1 delta size 200 (commit2)
   Rec3: file1 full size 350 (rollback)
   
   > When we merge all 3 of these records from the metadata table, what does 
the final resolved record look like?
   
   When we do a rollback, the value in the plan (the mapping of path to size) 
is only used to determine whether we should _append a Rollback Block_ or 
_delete files_. After we have actually appended the Rollback Block, we now 
modify only the record for the file we appended the block to (previously we 
would also update all the log files from the `writtenLogFileSizesMap`).
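
To make the change in update scope concrete, here is a minimal Python sketch. It is not Hudi code: the function names, file names, and sizes are hypothetical and only model the behavior described above, namely which metadata records get touched after the Rollback Block is appended.

```python
from typing import Dict


def metadata_updates_before_fix(
    appended_file: str,
    appended_size: int,
    written_log_file_sizes: Dict[str, int],
) -> Dict[str, int]:
    """Hypothetical model of the previous behavior: the record for the file
    that received the Rollback Block is updated, and so is every log file
    listed in the written-log-file-sizes map."""
    updates = dict(written_log_file_sizes)
    updates[appended_file] = appended_size
    return updates


def metadata_updates_after_fix(
    appended_file: str,
    appended_size: int,
) -> Dict[str, int]:
    """Hypothetical model of the behavior after the fix: only the record for
    the file the Rollback Block was appended to is updated."""
    return {appended_file: appended_size}


# Made-up inputs; -1 stands in for the placeholder size mentioned for the
# marker-based strategy.
written = {"file1.log.1": -1, "file1.log.2": -1}

before = metadata_updates_before_fix("file1.log.3", 350, written)
after = metadata_updates_after_fix("file1.log.3", 350)

print(sorted(before))  # all three log files are touched
print(sorted(after))   # only the appended-to file is touched
```

In this model the sizes carried in the plan never reach the metadata records of the other log files once the fix is in place.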
   
   > If you don't mind, can we add tests for the fix in this patch? The tests 
should fail without the fix and pass w/ the fix.
   
   There are already tests covering this, which were failing in #4556 (which is 
how I came to fix this issue). 
   The reason these were not failing is simply b/c we're not using the metadata 
table (MT) when we're reading the data back using Hive's InputFormats. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

