n3nash commented on issue #2951:
URL: https://github.com/apache/hudi/issues/2951#issuecomment-851720834


   @ChandraNarreddy Sorry for the delayed response. Hudi only keeps track of
which records have changed. You can tell this by comparing the
_hoodie_commit_time values of the records. It is not possible to identify
whether a change was an insert, update, or delete just by looking at this
timestamp.
   
   Hudi can only keep older versions of the data if the cleaner config is set
to retain that many versions. Suppose a record changes 10 times, so 10 versions
of that record are created. Users can look at the historical values of the
data only if the cleaner config ensures that all 10 versions are kept. Hudi's
incremental pull feature is not designed to provide a change log of all 10
values; it is meant to provide the latest state of each record since the last
time you pulled incrementally.
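
   To illustrate the difference between a change log and the latest state, here
is a minimal sketch in plain Python. It is a toy model, not Hudi code: the
`Commit` type and the `incremental_pull` function are hypothetical names made
up for this example, and the commit times are simplified to short strings.

```python
from typing import Dict, List, Tuple

# Toy stand-in for a timeline: (commit_time, {record_key: value}).
# This is an illustrative model only, not how Hudi stores data.
Commit = Tuple[str, Dict[str, str]]


def incremental_pull(commits: List[Commit], begin_time: str) -> Dict[str, Tuple[str, str]]:
    """Return the latest (commit_time, value) per key among commits after begin_time."""
    latest: Dict[str, Tuple[str, str]] = {}
    for commit_time, records in sorted(commits, key=lambda c: c[0]):
        if commit_time > begin_time:
            for key, value in records.items():
                # A later commit overwrites the earlier state for the same key.
                latest[key] = (commit_time, value)
    return latest


# One record changed 10 times, in commits "001" through "010".
commits = [(f"{i:03d}", {"key-1": f"v{i}"}) for i in range(1, 11)]

result = incremental_pull(commits, begin_time="000")
print(result)  # {'key-1': ('010', 'v10')}
```

The pull returns a single row for the record, carrying only the most recent
commit time and value. The nine intermediate values do not appear, and nothing
in the result says whether the last change was an insert or an update.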
   
   Yes, hoodie.cleaner.fileversions.retained and
hoodie.cleaner.commits.retained are independent of each other and belong to
different cleaning policies. This blog from @pratyakshsharma is very helpful
for understanding this: https://github.com/apache/hudi/pull/2967
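
   As a rough sketch of how the two settings pair with their policies, the
snippet below builds the write options for each one. The `cleaner_options`
helper is a hypothetical name for this example. The `hoodie.cleaner.policy` key
and the policy names `KEEP_LATEST_COMMITS` and `KEEP_LATEST_FILE_VERSIONS` are
not mentioned above, so please check them against the configuration reference
for your Hudi version.

```python
from typing import Dict

# Which "retained" setting goes with which cleaning policy (assumed pairing,
# verify against the Hudi configuration docs for your version).
RETAINED_KEY_BY_POLICY = {
    "KEEP_LATEST_COMMITS": "hoodie.cleaner.commits.retained",
    "KEEP_LATEST_FILE_VERSIONS": "hoodie.cleaner.fileversions.retained",
}


def cleaner_options(policy: str, retained: int) -> Dict[str, str]:
    """Build cleaner write options for one policy (hypothetical helper)."""
    if policy not in RETAINED_KEY_BY_POLICY:
        raise ValueError(f"unknown cleaning policy: {policy}")
    if retained < 1:
        raise ValueError("retained must be at least 1")
    return {
        "hoodie.cleaner.policy": policy,
        RETAINED_KEY_BY_POLICY[policy]: str(retained),
    }


by_commits = cleaner_options("KEEP_LATEST_COMMITS", 10)
by_versions = cleaner_options("KEEP_LATEST_FILE_VERSIONS", 10)

# These dicts would typically be merged into the options of a Hudi Spark write,
# e.g. df.write.format("hudi").options(**by_commits)... (not executed here).
print(by_commits)
print(by_versions)
```

Each options dict sets only the retention key that matches its policy, which
mirrors the point that the two settings do not interact.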
   
   Please feel free to re-open if you have any other questions. 



