Repository: falcon
Updated Branches:
  refs/heads/master 2d51db7a0 -> fc34d42cb
FALCON-1767 Improve Falcon retention policy documentation

Author: Sowmya Ramesh <[email protected]>

Reviewers: "Balu Vellanki <[email protected]>"

Closes #121 from sowmyaramesh/FALCON-1767


Project: http://git-wip-us.apache.org/repos/asf/falcon/repo
Commit: http://git-wip-us.apache.org/repos/asf/falcon/commit/fc34d42c
Tree: http://git-wip-us.apache.org/repos/asf/falcon/tree/fc34d42c
Diff: http://git-wip-us.apache.org/repos/asf/falcon/diff/fc34d42c

Branch: refs/heads/master
Commit: fc34d42cbe1a325d686d65fdf7d863d254d7e4d1
Parents: 2d51db7
Author: Sowmya Ramesh <[email protected]>
Authored: Tue May 3 14:21:31 2016 -0700
Committer: Sowmya Ramesh <[email protected]>
Committed: Tue May 3 14:21:31 2016 -0700

----------------------------------------------------------------------
 docs/src/site/twiki/FalconDocumentation.twiki | 6 ++++++
 1 file changed, 6 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/falcon/blob/fc34d42c/docs/src/site/twiki/FalconDocumentation.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/FalconDocumentation.twiki b/docs/src/site/twiki/FalconDocumentation.twiki
index 122435a..2d67070 100644
--- a/docs/src/site/twiki/FalconDocumentation.twiki
+++ b/docs/src/site/twiki/FalconDocumentation.twiki
@@ -266,6 +266,12 @@ to false in runtime.properties.

 With the integration of Hive, Falcon also provides retention for tables in Hive catalog.

+When a feed is scheduled Falcon kicks off the retention policy immediately. When job runs, it deletes everything that's eligible for eviction - eligibility criteria is the date pattern on the partition and NOT creation date.
+For e.g. if the retention limit is 90 days then retention job consistently deletes files older than 90 days.
+
+For retention, Falcon expects data to be in dated partitions. When the retention job is kicked off, it discovers data that needs to be evicted based on retention policy. It gets the location from the feed and uses pattern matching
+to find the pattern to get the list of data for the feed, then gets the date from the data path. If the data path date is beyond the retention limit it's deleted. As this uses pattern matching it is not time consuming and hence doesn't introduce performance overhead.
+
 ---+++ Example:
 If retention period is 10 hours, and the policy kicks in at time 't', the data retained by system is essentially the one after or equal to t-10h . Any data before t-10h is removed from the system.
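
The eviction rule described in the added text (take the date from the partition path, not the creation time, and remove anything before t minus the retention limit) can be illustrated with a small sketch. This is not Falcon's implementation: the function names, the sample paths, and the restriction to the `${YEAR}`, `${MONTH}`, `${DAY}` and `${HOUR}` template variables are assumptions made purely for illustration.

```python
import re
from datetime import datetime, timedelta

# Hypothetical mapping from location-template variables to regex groups.
_FIELDS = {
    "${YEAR}": r"(?P<year>\d{4})",
    "${MONTH}": r"(?P<month>\d{2})",
    "${DAY}": r"(?P<day>\d{2})",
    "${HOUR}": r"(?P<hour>\d{2})",
}


def compile_pattern(template):
    """Turn a dated location template into an anchored regular expression."""
    parts = re.split(r"(\$\{[A-Z]+\})", template)
    # Known variables become capture groups; everything else is literal text.
    regex = "".join(_FIELDS.get(p, re.escape(p)) for p in parts)
    return re.compile("^" + regex + "$")


def path_date(pattern, path):
    """Return the date encoded in the path, or None if the path does not match.

    Assumes the template contains at least ${YEAR}.
    """
    m = pattern.match(path)
    if m is None:
        return None
    g = m.groupdict()
    try:
        return datetime(int(g["year"]), int(g.get("month", 1)),
                        int(g.get("day", 1)), int(g.get("hour", 0)))
    except ValueError:
        # Digits matched but do not form a valid date (e.g. month 13).
        return None


def eligible_for_eviction(paths, template, limit, now):
    """List the paths whose encoded date is strictly before now - limit."""
    pattern = compile_pattern(template)
    cutoff = now - limit
    evict = []
    for p in paths:
        d = path_date(pattern, p)
        if d is not None and d < cutoff:
            evict.append(p)
    return evict


if __name__ == "__main__":
    template = "/data/clicks/${YEAR}-${MONTH}-${DAY}-${HOUR}"
    paths = [
        "/data/clicks/2016-05-03-03",
        "/data/clicks/2016-05-03-04",
        "/data/clicks/2016-05-03-13",
        "/data/clicks/_tmp",
    ]
    now = datetime(2016, 5, 3, 14)
    print(eligible_for_eviction(paths, template, timedelta(hours=10), now))
```

With a 10 hour limit and t = 14:00, the cutoff is 04:00, so only the 03:00 partition is listed; the 04:00 partition is kept because data at or after t-10h is retained, and the undated `_tmp` path is ignored because it does not match the pattern.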
