yihua commented on PR #7517: URL: https://github.com/apache/hudi/pull/7517#issuecomment-1366923827
> Let's ignore MDT for now. I have some basic doubts about the inner workings of MOR tables.
>
> So, how are extraneous log files ignored while reading committed data from DT? I.e., let's say we make commit3, which had some Spark retries. So, instead of logFile1, we now have logFile1 and logFile2, where only logFile2 is valid. Our marker-based reconciliation is not going to delete logFile1 or add a rollback block for it. So, when someone does a snapshot read, where exactly do we skip logFile1? It should be part of AbstractLogRecordReader, right? I could not locate it.

Based on my understanding, the extraneous logFile1, or a particular extraneous log block from a successful commit, is still read for a snapshot query as long as the file or block is not corrupted (e.g., partially written). Correctness-wise, this should be okay as long as the update logic generates the same merged payload after applying the same change log twice, i.e., as long as the merge is idempotent.
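To make the idempotency condition concrete, here is a minimal Python sketch. It is not Hudi code: `snapshot_read`, `overwrite_with_latest`, `add_delta`, and the record layout (`ts`, `value`) are hypothetical names made up for illustration, and it assumes the extraneous logFile1 holds exactly the same changes as the valid logFile2. It only shows why replaying a duplicate change log is harmless for a "latest version wins" merge and harmful for an additive one.

```python
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, int]
LogBlock = List[Tuple[str, Record]]  # (record key, update) pairs
MergeFn = Callable[[Optional[Record], Record], Record]


def overwrite_with_latest(current: Optional[Record], update: Record) -> Record:
    # Hypothetical "latest wins" merge: keep the record with the higher
    # (or equal) ordering value "ts". Re-applying the same update is a no-op.
    if current is None or update["ts"] >= current["ts"]:
        return dict(update)
    return current


def add_delta(current: Optional[Record], update: Record) -> Record:
    # Hypothetical additive merge: each update is a delta on "value".
    # Re-applying the same update changes the result, so it is NOT idempotent.
    base = current["value"] if current is not None else 0
    return {"ts": update["ts"], "value": base + update["value"]}


def snapshot_read(base: Dict[str, Record],
                  log_blocks: List[LogBlock],
                  merge: MergeFn) -> Dict[str, Record]:
    # Toy snapshot read: start from the base file and apply every
    # readable log block in order, without skipping duplicates.
    merged = {key: dict(rec) for key, rec in base.items()}
    for block in log_blocks:
        for key, update in block:
            merged[key] = merge(merged.get(key), update)
    return merged


base_file = {"k1": {"ts": 1, "value": 10}}
commit3_changes: LogBlock = [("k1", {"ts": 3, "value": 5})]
log_file_1 = list(commit3_changes)  # assumed leftover from a retried attempt
log_file_2 = list(commit3_changes)  # the valid log file

clean = snapshot_read(base_file, [log_file_2], overwrite_with_latest)
with_extra = snapshot_read(base_file, [log_file_1, log_file_2], overwrite_with_latest)
print(clean == with_extra)

delta_clean = snapshot_read(base_file, [log_file_2], add_delta)
delta_extra = snapshot_read(base_file, [log_file_1, log_file_2], add_delta)
print(delta_clean == delta_extra)
```

With the "latest wins" merge, both reads return the same record for `k1`. With the additive merge they diverge, which is the case where reading the extraneous log file would affect correctness.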
