hudi-bot opened a new issue, #14770: URL: https://github.com/apache/hudi/issues/14770
For example: keep only records that were updated in the last month.

GH: https://github.com/apache/hudi/issues/2743

## JIRA info

- Link: https://issues.apache.org/jira/browse/HUDI-1741
- Type: New Feature

---

## Comments

31/Mar/21 00:43;vbalaji;[~shivnarayan] : FYI;;;

---

03/Apr/21 16:28;pratyakshsharma;I guess the same can be handled with this Jira: https://issues.apache.org/jira/browse/HUDI-349? [~vbalaji] [~shivnarayan];;;

---

05/Apr/21 15:25;aditiwari;[~pratyakshsharma] I guess that with a time-based cleaning policy we might need some modifications in the compactor as well. Even a recently updated base file may contain some records that are older. A time-based cleaner, combined with filtering out records with an older commit time while compacting (in MOR) or rewriting (in COW) the base file, should solve the issue.;;;

---

28/Oct/22 03:09;nicholasjiang;[~shivnarayan], IMO, each Hudi record carries its commit time. The solution is to first apply the TTL at read time, so that expired data is not returned when querying (or is even filtered by pushing the predicate down to the data source directly), and then to delete it during operations such as clustering that need to rewrite the data. WDYT? cc [~xleesf] ;;;

---

28/Oct/22 03:29;xleesf;[~nicholasjiang] agree with the solution;;;

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
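
The approach suggested in the comments (hide expired records at read time, then physically drop them whenever a rewrite such as compaction or clustering happens) can be illustrated with a minimal sketch. This is not Hudi code: records are modeled as plain dictionaries, the function names `is_expired`, `read_with_ttl`, and `rewrite_with_ttl` are hypothetical, and the only Hudi-specific assumption is that each record exposes its commit time in the `_hoodie_commit_time` metadata field as a string starting with `yyyyMMddHHmmss`.

```python
from datetime import datetime, timedelta

# Hudi metadata column holding the commit instant of the last write to a record.
COMMIT_TIME_FIELD = "_hoodie_commit_time"


def parse_commit_time(instant: str) -> datetime:
    """Parse the leading yyyyMMddHHmmss part of an instant string.

    Assumption: any trailing digits (e.g. milliseconds) are ignored.
    """
    return datetime.strptime(instant[:14], "%Y%m%d%H%M%S")


def is_expired(record: dict, now: datetime, ttl: timedelta) -> bool:
    """A record is expired when its commit time is older than now - ttl."""
    return now - parse_commit_time(record[COMMIT_TIME_FIELD]) > ttl


def read_with_ttl(records: list, now: datetime, ttl: timedelta) -> list:
    """Read path (hypothetical): expired records are hidden, not deleted."""
    return [r for r in records if not is_expired(r, now, ttl)]


def rewrite_with_ttl(records: list, now: datetime, ttl: timedelta) -> tuple:
    """Rewrite path (hypothetical): expired records are physically dropped.

    Returns the records to write to the new file and the number dropped.
    """
    kept = read_with_ttl(records, now, ttl)
    return kept, len(records) - len(kept)


# Illustrative data only: one base file mixing an old and a recent record.
base_file = [
    {"id": 1, COMMIT_TIME_FIELD: "20210301000000"},
    {"id": 2, COMMIT_TIME_FIELD: "20210330120000"},
]
now = datetime(2021, 4, 5)
ttl = timedelta(days=30)

visible = read_with_ttl(base_file, now, ttl)
rewritten, dropped = rewrite_with_ttl(base_file, now, ttl)
```

Filtering per record rather than per file reflects the point raised in the thread: a recently rewritten base file can still contain records whose commit time is past the TTL, so file-level cleaning alone would not remove them.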
