ssdong commented on issue #2818:
URL: https://github.com/apache/hudi/issues/2818#issuecomment-822660600


   Hey @garyli1019, thank you for the meticulous explanation. Yep, I was trying 
to confirm the “expected” behavior of the incremental query. It makes sense to 
pull from the _existing_ active timeline, given that a bulky active timeline 
would introduce a file listing issue, which is why we archive. Controlling the 
number of instants on the active timeline through `keep.max` is definitely one 
way to go. Adding an extra configuration (defaulting to `false`) that lets the 
query pull from the archived timeline would give users extra freedom to control 
the behavior and doesn’t sound bad to me. However, kindly allow me to confirm 
one thing. 
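
   For concreteness, a rough sketch of the archival tuning mentioned above. 
I am assuming `keep.max` refers to `hoodie.keep.max.commits` and that it is 
set together with `hoodie.keep.min.commits` and 
`hoodie.cleaner.commits.retained`. The values are purely illustrative, not 
recommendations, and the ordering constraint in the comment is my 
understanding rather than something verified against this version:

```python
# Illustrative Hudi write options that keep more instants on the active
# timeline before archiving kicks in. All values are made-up examples.
archival_options = {
    # Assumed ordering: cleaner retained < keep.min < keep.max.
    "hoodie.cleaner.commits.retained": "40",
    "hoodie.keep.min.commits": "50",
    "hoodie.keep.max.commits": "60",
}

for key, value in archival_options.items():
    print(f"{key}={value}")
```

   A dict like this would be passed as options on the write path; a larger 
`keep.max` widens the window an incremental query can reach, at the cost of 
a longer active timeline.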
   
   After we determine which data files to open and look for the records that 
fall within the `beginInstantTime` and a potential `endInstantTime`, aren’t we 
comparing the `_hoodie_commit_time`, which is stored together with the record, 
with the user-supplied begin and end timestamps? In that case, we should 
automatically be comparing the records against a timestamp that may already 
have been archived. How come the query doesn’t return the record if it was 
modified _after_ the given timestamp (see my testing experiment)? I understand 
I may have made this sound more confusing than it needs to be, so let me know 
if there is anything I am missing. I just wish I could truly understand it 
better. Respect. 😅 
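
   To make the question concrete, here is a minimal sketch of the record-level 
comparison I have in mind. It is plain Python, not Hudi code: `Record` and 
`filter_incremental` are hypothetical names, and I am assuming the range is 
exclusive of the begin instant and inclusive of the end instant. It only 
models the comparison against `_hoodie_commit_time`, not how the data files 
are chosen from the timeline in the first place:

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Record:
    key: str
    # Stands in for the _hoodie_commit_time meta column stored with each record.
    hoodie_commit_time: str


def filter_incremental(
    records: Iterable[Record],
    begin_instant_time: str,
    end_instant_time: Optional[str] = None,
) -> List[Record]:
    """Keep records whose commit time is in (begin, end] (assumed semantics)."""
    selected = []
    for record in records:
        # Instant times are fixed-width digit strings, so string comparison
        # orders them chronologically.
        after_begin = record.hoodie_commit_time > begin_instant_time
        before_end = (
            end_instant_time is None
            or record.hoodie_commit_time <= end_instant_time
        )
        if after_begin and before_end:
            selected.append(record)
    return selected


records = [
    Record("a", "20210418100000"),
    Record("b", "20210419100000"),
    Record("c", "20210420100000"),
]

bounded = filter_incremental(records, "20210418100000", "20210419100000")
unbounded = filter_incremental(records, "20210418100000")
print([r.key for r in bounded])
print([r.key for r in unbounded])
```

   Under this model the filter never consults the timeline, so whether the 
begin instant has been archived should not matter once the files are open. 
That is the part I would like to reconcile with what I observed.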


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

