ssdong commented on issue #2818: URL: https://github.com/apache/hudi/issues/2818#issuecomment-822660600
Hey @garyli1019, thank you for the meticulous explanation. Yep, I was trying to confirm the "expected" behavior of the incremental query. It makes sense to pull from the _existing_ active timeline, since a bulky active timeline would introduce a file listing issue, which is why we archive. Controlling the number of instants on the active timeline through `keep.max` is definitely one way to go. Adding an extra configuration (defaulting to `false`) so that we could tune it to pull from the archived timeline as well doesn't sound bad to me either: it gives users extra freedom to control the behavior.

However, kindly allow me to confirm one thing. After we determine which data files to open and look for the records that fall between the `beginInstantTime` and a potential `endInstantTime`, aren't we comparing each record's `_hoodie_commit_time`, which is stored together with the record, against the begin and end timestamps the user passed in? In that case, we would automatically be comparing the records against a timestamp that may already have been archived. So how come the query doesn't return the record if it was modified _after_ the given timestamp (see my testing experiment)?

I understand I may have made this sound far more confusing than it needs to be, so let me know if there is anything I am missing. I just wish to truly understand it. Respect. 😅
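A toy model may help pin down the question. The sketch below contrasts two readings of an incremental query: filtering every record purely on `_hoodie_commit_time`, versus first selecting instants from the active timeline, opening only the files those instants wrote, and only then filtering on commit time. All names here (`Record`, `DataFile`, `naive_incremental`, `timeline_incremental`) are hypothetical and are not Hudi's API; the model also assumes each file holds only records from the commit that wrote it, which is a simplification.

```python
from dataclasses import dataclass


# Hypothetical, simplified model; these are NOT Hudi classes.
@dataclass(frozen=True)
class Record:
    key: str
    commit_time: str  # stands in for _hoodie_commit_time


@dataclass(frozen=True)
class DataFile:
    path: str
    written_by: str  # instant time of the commit that wrote this file
    records: tuple


def naive_incremental(files, begin, end=None):
    """Filter every record on its commit time alone (the reading in the question)."""
    return [r for f in files for r in f.records
            if r.commit_time > begin and (end is None or r.commit_time <= end)]


def timeline_incremental(active_timeline, files, begin, end=None):
    """Two-stage model: pick instants on the active timeline, then filter records."""
    instants = {t for t in active_timeline
                if t > begin and (end is None or t <= end)}
    if not instants:
        return []
    # Only files written by instants still on the active timeline are opened.
    candidate_files = [f for f in files if f.written_by in instants]
    lo, hi = min(instants), max(instants)
    return [r for f in candidate_files for r in f.records
            if lo <= r.commit_time <= hi]


files = [
    DataFile("f1.parquet", "20210401000000", (Record("a", "20210401000000"),)),
    DataFile("f2.parquet", "20210410000000", (Record("b", "20210410000000"),)),
]
# Assume 20210401000000 has been archived, so only the later instant is active.
active = ["20210410000000"]
begin = "20210301000000"

print([r.key for r in naive_incremental(files, begin)])             # ['a', 'b']
print([r.key for r in timeline_incremental(active, files, begin)])  # ['b']
```

Under this model, record `a` was committed after `begin`, yet the second function skips it because the instant that wrote its file is no longer on the active timeline. Whether the real query behaves this way, and whether that explains the testing experiment, is an assumption to be confirmed rather than something established in this thread.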
