n3nash commented on issue #2951: URL: https://github.com/apache/hudi/issues/2951#issuecomment-851720834
@ChandraNarreddy Sorry for the delayed response.

Hudi only keeps track of which records have changed. It knows this from the difference in the _hoodie_commit_time value between 2 records. It is not possible to tell whether the changes to the records were inserts, updates, or deletes just by looking at this timestamp.

Hudi keeps older versions of the data only if the cleaner config is set to retain that many versions. Suppose a record changes 10 times; this means 10 versions of that record were created. Only if the cleaner config ensures that all 10 versions are kept can users look at the historical values of the data. Hudi's incremental pull feature is not designed to provide a change log of all 10 values; it is meant to provide the latest state of each record since the last time you incrementally pulled.

Yes, hoodie.cleaner.fileversions.retained and hoodie.cleaner.commits.retained are independent of each other and correspond to different cleaning policies. This blog from @pratyakshsharma is very helpful for understanding this: https://github.com/apache/hudi/pull/2967

Please feel free to re-open if you have any other questions.
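
To make the difference between retained versions and incremental pull concrete, here is a minimal Python sketch. It is a toy in-memory model and not Hudi's API: the ToyTable class, its commit, history, and incremental_pull methods, and the versions_retained parameter are hypothetical names invented for illustration. Real Hudi cleaning works on file versions and commits rather than on individual records, so treat this only as a rough picture of the semantics described above.

```python
from collections import defaultdict


class ToyTable:
    """Toy model of a versioned table. Hypothetical; NOT Hudi's real API."""

    def __init__(self, versions_retained):
        if versions_retained < 1:
            raise ValueError("versions_retained must be at least 1")
        # Hypothetical stand-in for a "keep the latest N versions" cleaner setting.
        self.versions_retained = versions_retained
        # record key -> list of (commit_time, value), oldest first
        self.versions = defaultdict(list)

    def commit(self, commit_time, key, value):
        """Write a new version of a record, then clean up old versions."""
        history = self.versions[key]
        history.append((commit_time, value))
        # Cleaner step: drop everything except the newest N versions.
        del history[:-self.versions_retained]

    def history(self, key):
        """Return the versions of a record that survived cleaning."""
        return list(self.versions[key])

    def incremental_pull(self, since_commit_time):
        """Return only the latest state of each record changed after the given time."""
        changed = {}
        for key, history in self.versions.items():
            commit_time, value = history[-1]
            if commit_time > since_commit_time:
                changed[key] = (commit_time, value)
        return changed


table = ToyTable(versions_retained=3)
table.commit(1, "b", "v1")
for t in range(2, 12):  # record "a" changes 10 times, at commit times 2..11
    table.commit(t, "a", f"v{t}")

print(table.history("a"))         # only the 3 newest versions of "a" remain
print(table.incremental_pull(5))  # latest state of "a" only, not its change log
```

In this sketch the pull result carries a commit time and a value but nothing that says whether the change was an insert or an update, which mirrors the limitation described above. With versions_retained=10, history("a") would return all 10 versions, while incremental_pull would still return a single latest value per record.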