Repository: falcon
Updated Branches:
  refs/heads/master 2d51db7a0 -> fc34d42cb


FALCON-1767 Improve Falcon retention policy documentation

Author: Sowmya Ramesh <[email protected]>

Reviewers: "Balu Vellanki <[email protected]>"

Closes #121 from sowmyaramesh/FALCON-1767


Project: http://git-wip-us.apache.org/repos/asf/falcon/repo
Commit: http://git-wip-us.apache.org/repos/asf/falcon/commit/fc34d42c
Tree: http://git-wip-us.apache.org/repos/asf/falcon/tree/fc34d42c
Diff: http://git-wip-us.apache.org/repos/asf/falcon/diff/fc34d42c

Branch: refs/heads/master
Commit: fc34d42cbe1a325d686d65fdf7d863d254d7e4d1
Parents: 2d51db7
Author: Sowmya Ramesh <[email protected]>
Authored: Tue May 3 14:21:31 2016 -0700
Committer: Sowmya Ramesh <[email protected]>
Committed: Tue May 3 14:21:31 2016 -0700

----------------------------------------------------------------------
 docs/src/site/twiki/FalconDocumentation.twiki | 6 ++++++
 1 file changed, 6 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/falcon/blob/fc34d42c/docs/src/site/twiki/FalconDocumentation.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/FalconDocumentation.twiki b/docs/src/site/twiki/FalconDocumentation.twiki
index 122435a..2d67070 100644
--- a/docs/src/site/twiki/FalconDocumentation.twiki
+++ b/docs/src/site/twiki/FalconDocumentation.twiki
@@ -266,6 +266,12 @@ to false in runtime.properties.
 
 With the integration of Hive, Falcon also provides retention for tables in Hive catalog.
 
+When a feed is scheduled, Falcon kicks off the retention policy immediately. When the retention job runs, it deletes everything that is eligible for eviction. Eligibility is determined by the date pattern on the partition and NOT by the creation date.
+For example, if the retention limit is 90 days, the retention job consistently deletes files older than 90 days.
+
+For retention, Falcon expects data to be in dated partitions. When the retention job is kicked off, it discovers the data that needs to be evicted based on the retention policy. It gets the location from the feed and uses pattern matching
+to list the data for the feed, then extracts the date from each data path. If the data path date is beyond the retention limit, the data is deleted. Because this relies on pattern matching, it is not time consuming and hence doesn't introduce performance overhead.
+
 ---+++ Example:
 If retention period is 10 hours, and the policy kicks in at time 't', the data retained by system is essentially the
 one after or equal to t-10h . Any data before t-10h is removed from the system.
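
The eligibility rule described in this change can be illustrated with a small sketch. It is written in Python for brevity and is not Falcon's actual implementation: the feed path template, the sample paths, and the helper names (template_to_regex, path_date, eligible_for_eviction) are all hypothetical. It only shows the idea that the date comes from the partition path, not from the file creation time, and that data at exactly t minus the retention period is kept.

```python
import re
from datetime import datetime, timedelta

# Hypothetical feed location template with Falcon-style date variables.
FEED_PATH_TEMPLATE = "/data/clicks/${YEAR}-${MONTH}-${DAY}-${HOUR}"

# Each date variable maps to a named regex group (an assumption of this sketch).
_VAR_PATTERNS = {
    "${YEAR}": r"(?P<year>\d{4})",
    "${MONTH}": r"(?P<month>\d{2})",
    "${DAY}": r"(?P<day>\d{2})",
    "${HOUR}": r"(?P<hour>\d{2})",
}


def template_to_regex(template):
    """Turn a path template into a regex with one named group per date variable."""
    pattern = re.escape(template)
    for var, group in _VAR_PATTERNS.items():
        pattern = pattern.replace(re.escape(var), group)
    return re.compile("^" + pattern + "$")


def path_date(path, regex):
    """Extract the partition date from a data path, or None if it does not match."""
    match = regex.match(path)
    if match is None:
        return None
    parts = {key: int(value) for key, value in match.groupdict().items()}
    return datetime(
        parts["year"], parts.get("month", 1), parts.get("day", 1), parts.get("hour", 0)
    )


def eligible_for_eviction(paths, template, now, retention):
    """Return the paths whose partition date is strictly before now - retention."""
    regex = template_to_regex(template)
    cutoff = now - retention
    evicted = []
    for path in paths:
        date = path_date(path, regex)
        # Paths that do not match the dated pattern are left alone.
        if date is not None and date < cutoff:
            evicted.append(path)
    return evicted


if __name__ == "__main__":
    now = datetime(2016, 5, 3, 14)  # the time 't' at which the policy kicks in
    paths = [
        "/data/clicks/2016-05-03-03",  # before t-10h
        "/data/clicks/2016-05-03-04",  # exactly t-10h
        "/data/clicks/2016-05-03-12",  # after t-10h
        "/data/clicks/_tmp",           # not a dated partition
    ]
    print(eligible_for_eviction(paths, FEED_PATH_TEMPLATE, now, timedelta(hours=10)))
```

With a retention period of 10 hours and t = 2016-05-03 14:00, only the 03:00 partition falls before t-10h; the 04:00 partition sits exactly on the boundary and is retained, matching the example above.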
