alexeykudinkin commented on pull request #4716: URL: https://github.com/apache/hudi/pull/4716#issuecomment-1026246434
> Here is my understanding of the fix. We fix the rollback plan generated in ListingBasedStrategy to just include the log files with full sizes. For the marker-based strategy, we set file sizes to -1, but fixed the right set of log files to be included in the rollback plan.
>
> So, we still have an issue w/ how we reconcile or merge multiple metadata records. For eg:
> Rec1: file1 delta size 100 (commit1)
> Rec2: file1 delta size 200 (commit2)
> Rec3: file1 full size 350 (rollback)
>
> When we merge all these 3 records from the metadata table, what does the final resolved record look like?

When we do a rollback, the value in the plan (carrying the mapping of path to size) is only used to determine whether we should _append a Rollback Block_ or _delete files_. After we have actually appended the Rollback Block, we now only modify the record for the file we appended the block to (previously we would also update all the log files from the `writtenLogFileSizesMap`).

> If you don't mind, can we add tests for the fix in this patch. The tests should fail without the fix and should pass w/ the fix.

There are already tests covering this, which were failing in #4556 (which is how I came to fix this issue). The reason why these were not failing is simply b/c we're not using the Metadata Table (MT) when we're reading the data back using Hive's InputFormats.
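To make the before/after behavior concrete, here is a minimal Python sketch. It is not Hudi code: the function names, file names, and sizes are hypothetical, and `written_log_file_sizes` only mirrors the `writtenLogFileSizesMap` mentioned above. It shows which files get a metadata update once the Rollback Block has been appended, assuming the update is a simple path-to-size mapping.

```python
from typing import Dict


def updates_before_fix(
    appended_file: str,
    appended_size: int,
    written_log_file_sizes: Dict[str, int],
) -> Dict[str, int]:
    """Hypothetical old behavior: update the appended file AND every
    log file listed in the written-log-file-sizes map."""
    updates = dict(written_log_file_sizes)  # copy, do not mutate the input
    updates[appended_file] = appended_size
    return updates


def updates_after_fix(appended_file: str, appended_size: int) -> Dict[str, int]:
    """Hypothetical new behavior: update only the file the Rollback Block
    was appended to."""
    return {appended_file: appended_size}


if __name__ == "__main__":
    # Illustrative values only; they are not taken from a real table.
    written = {"file1.log.1": 100, "file1.log.2": 200}
    print(sorted(updates_before_fix("file1.log.3", 350, written)))
    print(sorted(updates_after_fix("file1.log.3", 350)))
```

With the narrower update, earlier records for the other log files are left untouched by the rollback, so there is nothing extra for them to be merged with.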
