yihua commented on PR #7517:
URL: https://github.com/apache/hudi/pull/7517#issuecomment-1366923827

   > Let's ignore MDT for now. I have a basic doubt about the inner workings of
   > MOR tables.
   >
   > How are extraneous log files ignored when reading committed data from the
   > DT? Let's say we make commit3, which had some Spark retries. So instead of
   > just logFile1, we now have logFile1 and logFile2, where only logFile2 is
   > valid. Our marker-based reconciliation will neither delete logFile1 nor add
   > a rollback block for it. So when someone does a snapshot read, where
   > exactly do we skip logFile1? It should be part of AbstractLogRecordReader,
   > right? I could not locate it.
   
   Based on my understanding, the extraneous logFile1, or a particular
extraneous log block from a successful commit, is still read by a snapshot
query unless the file or the block is corrupted (e.g., partially written).
Correctness-wise, it should be okay as long as the update logic is idempotent,
i.e., it generates the same merged payload after applying the same change log
twice.
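
To make the idempotency condition concrete, here is a minimal sketch in plain Python. It is not Hudi code: `merge_latest`, `merge_increment`, and the record layout are hypothetical, and it assumes the extraneous logFile1 carries the same records as logFile2. A "keep the latest by ordering value" merge gives the same result whether the change log is applied once or twice, while an increment-style merge does not.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (record_key, ordering_value, payload).
Record = Tuple[str, int, str]


def merge_latest(
    base: Dict[str, Tuple[int, str]],
    log_blocks: Iterable[List[Record]],
) -> Dict[str, Tuple[int, str]]:
    """Keep, per key, the payload with the highest ordering value."""
    merged = dict(base)
    for block in log_blocks:
        for key, ordering, value in block:
            current = merged.get(key)
            if current is None or ordering >= current[0]:
                merged[key] = (ordering, value)
    return merged


def merge_increment(
    base: Dict[str, int],
    log_blocks: Iterable[List[Tuple[str, int, int]]],
) -> Dict[str, int]:
    """Add each delta to the stored value (not idempotent)."""
    merged = dict(base)
    for block in log_blocks:
        for key, _ordering, delta in block:
            merged[key] = merged.get(key, 0) + delta
    return merged


base = {"k1": (1, "a"), "k2": (1, "b")}
log_file_2 = [("k1", 3, "a3"), ("k3", 3, "c3")]
# Assumption: the retried task left an extraneous copy of the same changes.
log_file_1 = list(log_file_2)

once = merge_latest(base, [log_file_2])
twice = merge_latest(base, [log_file_1, log_file_2])

counts = {"k1": 10}
deltas = [("k1", 3, 5)]
inc_once = merge_increment(counts, [deltas])
inc_twice = merge_increment(counts, [deltas, deltas])

print("latest-wins idempotent:", once == twice)
print("increment idempotent:", inc_once == inc_twice)
```

With the latest-wins merge, reading logFile1 in addition to logFile2 changes nothing. The increment-style merge double-counts, which is the kind of update logic for which reading the extraneous file would not be safe.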


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
