ssdong commented on issue #2818:
URL: https://github.com/apache/hudi/issues/2818#issuecomment-821810190


   @garyli1019 Thank you for getting back to me. I've created a 
[JIRA](https://issues.apache.org/jira/browse/HUDI-1807) for the 
`NoSuchElementException` issue and will work on it. 
   As for the incremental pulling concern, as the document says:
   ```
   Property: hoodie.datasource.read.begin.instanttime, [Required in incremental 
mode] 
   Instant time to start incrementally pulling data from. 
   The instanttime here need not necessarily correspond to an instant on the 
timeline. 
   New data written with an instant_time > BEGIN_INSTANTTIME are fetched out. 
   For e.g: ‘20170901080000’ will get all new data written after Sep 1, 2017 
08:00AM.
   ```
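
For context, a minimal sketch of how such an incremental read is typically configured from PySpark. The two option keys are the documented Hudi read options; the helper name `incremental_read_options`, the `spark` session and `base_path` are hypothetical and only there for illustration.

```python
def incremental_read_options(begin_instant: str) -> dict:
    """Build reader options for a Hudi incremental query (illustrative helper)."""
    return {
        # Switch the reader from the default snapshot query to incremental mode.
        "hoodie.datasource.query.type": "incremental",
        # Only data written with instant_time > begin_instant should be returned.
        "hoodie.datasource.read.begin.instanttime": begin_instant,
    }


opts = incremental_read_options("20170901080000")

# Assuming an existing SparkSession `spark` and a table at `base_path`
# (both hypothetical here), the read itself would look like:
# df = spark.read.format("hudi").options(**opts).load(base_path)
```

Per the documentation quoted above, the begin instant passed here does not have to match a commit on the timeline.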
   
   I believe it clearly states `The instanttime here need not necessarily 
correspond to an instant on the timeline`. That contradicts the behaviour I 
observed in my experiment: I fed it an instant time in the past, but it 
fetched only some of the updates; the updates to `199` were _missing_.
   
   Something wasn't _correct_: either there is a discrepancy between the actual 
implementation and what the document describes, or our existing archiving and 
timeline mechanism affects the integrity of incremental queries to some 
extent. I need a thorough understanding before making a judgment call on 
whether we should fetch updates that fall within the archived timeline, since 
doing so clearly introduces overhead and may affect other things. 
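
To make the two possibilities above concrete, here is a toy model of the suspected behaviour. It is not Hudi's implementation: the function `incremental_pull`, the instant times and the record keys other than `199` are all made up for illustration, and it simply assumes that commits moved to the archived timeline are skipped unless explicitly included.

```python
def incremental_pull(commits, begin_instant, archived=frozenset(), include_archived=False):
    """Return record keys written after begin_instant (toy model, not Hudi code).

    commits: list of (instant_time, [record_keys]) tuples.
    archived: instant times assumed to have been moved to the archived timeline.
    """
    keys = set()
    for instant, record_keys in commits:
        # Assumption under test: archived commits are invisible to the query.
        if instant in archived and not include_archived:
            continue
        # Fixed-width yyyyMMddHHmmss strings compare correctly as plain strings.
        if instant > begin_instant:
            keys.update(record_keys)
    return sorted(keys)


# Hypothetical timeline: the commit that updated `199` has been archived.
commits = [
    ("20210401090000", ["199"]),
    ("20210402090000", ["200"]),
    ("20210403090000", ["201"]),
]
archived = {"20210401090000"}
begin = "20210101000000"  # in the past, not an instant on the timeline

documented = incremental_pull(commits, begin, archived, include_archived=True)
observed = incremental_pull(commits, begin, archived)
```

In this model `documented` contains `199` while `observed` does not, which mirrors the gap between what the document promises and what I saw. Reading archived commits closes the gap, at the cost of scanning the archived timeline.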


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

