[ 
https://issues.apache.org/jira/browse/FALCON-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15045992#comment-15045992
 ] 

Balu Vellanki commented on FALCON-1644:
---------------------------------------

[~venkatnrangan] has another solution: add a new runtime property,
"falcon.retention.keep.instances.beyond.validity" (I welcome suggestions for a
better name). By default this value is false, and Falcon will set the end time
of the retention coordinator to "feed cluster validity end time + retention
limit".
If users do not want the most recent feed instances to be deleted, they should
set this value to true. In that case Falcon will set the end time of the
coordinator to "feed cluster validity end time". Is this approach acceptable to
all?
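
The proposed end-time rule can be sketched as a small Python function. This is an illustrative sketch, not Falcon code: the function name `retention_coord_end` is hypothetical, and its `keep_beyond_validity` flag simply stands in for the proposed `falcon.retention.keep.instances.beyond.validity` property.

```python
from datetime import datetime, timedelta


def retention_coord_end(validity_end, retention_limit, keep_beyond_validity=False):
    """Hypothetical helper: end time of the retention coordinator under the proposal.

    keep_beyond_validity stands in for the proposed runtime property
    falcon.retention.keep.instances.beyond.validity (default False).
    """
    if keep_beyond_validity:
        # Property set to true: stop at the validity end, so the most
        # recent instances are never evicted.
        return validity_end
    # Default: run for one extra retention window so every instance
    # eventually ages past the limit and is deleted.
    return validity_end + retention_limit


end = datetime(2015, 10, 30, 10, 0)
limit = timedelta(hours=10)
print(retention_coord_end(end, limit))        # 2015-10-30 20:00:00
print(retention_coord_end(end, limit, True))  # 2015-10-30 10:00:00
```

For the sample feed quoted below (validity end 2015-10-30T10:00Z, retention limit hours(10)), the default would keep the retention coordinator running until 2015-10-30T20:00Z.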

> Retention : Some feed instances are never deleted by retention jobs.
> --------------------------------------------------------------------
>
>                 Key: FALCON-1644
>                 URL: https://issues.apache.org/jira/browse/FALCON-1644
>             Project: Falcon
>          Issue Type: Bug
>          Components: retention
>    Affects Versions: 0.8
>            Reporter: Balu Vellanki
>            Assignee: Balu Vellanki
>             Fix For: 0.9
>
>         Attachments: FALCON-1644.patch
>
>
> Here is a sample feed XML.
> {code}
> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
> <feed name="rawEmailFeed" description="Raw customer email feed" xmlns="uri:falcon:feed:0.1">
>     <tags>externalSystem=USWestEmailServers</tags>
>     <groups>churnAnalysisDataPipeline</groups>
>     <frequency>hours(1)</frequency>
>     <timezone>UTC</timezone>
>     <late-arrival cut-off="hours(1)"/>
>     <clusters>
>         <cluster name="primaryCluster" type="source">
>             <validity start="2015-10-30T01:00Z" end="2015-10-30T10:00Z"/>
>             <retention limit="hours(10)" action="delete"/>
>         </cluster>
>     </clusters>
>     <locations>
>         <location type="data" path="/user/ambari-qa/falcon/demo/primary/input/enron/${YEAR}-${MONTH}-${DAY}-${HOUR}"/>
>         <location type="stats" path="/"/>
>         <location type="meta" path="/"/>
>     </locations>
>     <ACL owner="ambari-qa" group="users" permission="0x755"/>
>     <schema location="/none" provider="/none"/>
> </feed>
> {code}
> In the above example, the validity time is "the time interval when the feed
> is valid on this cluster". After the validity end time, Falcon is not
> expected to perform any operations on the feed. The retention job for this
> feed runs from the validity start time up to the validity end time and
> deletes any feed instances older than 10 hours. As a result, some feed
> instances are never deleted: any instance newer than "validity end time -
> retention limit" is still within the limit when the last retention run
> happens. In the above example, feed instances between 2015-10-30T00:00Z and
> 2015-10-30T10:00Z will never be deleted.
> Ideally, the retention coordinator job should run from "validity start time"
> up to "validity end time + retention limit" to ensure all instances are
> handled.
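
To make the gap concrete, here is a small Python simulation of the sample feed. It is a sketch under stated assumptions, not Falcon's retention code: one feed instance per hour inside the validity window, one retention run per hour, the coordinator end treated as exclusive, and an instance deleted once it is at least the retention limit old. Falcon's actual retention frequency and cutoff comparison may differ, and the helper names `hourly` and `surviving_instances` are hypothetical.

```python
from datetime import datetime, timedelta

HOUR = timedelta(hours=1)


def hourly(start, end):
    """Yield start, start + 1h, ... up to but excluding end."""
    t = start
    while t < end:
        yield t
        t += HOUR


def surviving_instances(validity_start, validity_end, limit, coord_end):
    # Assumption: one feed instance per hour inside the validity window.
    instances = set(hourly(validity_start, validity_end))
    # Assumption: the retention job runs hourly from the validity start until
    # coord_end (exclusive) and deletes instances at least `limit` old.
    for run in hourly(validity_start, coord_end):
        cutoff = run - limit
        instances = {i for i in instances if i > cutoff}
    return sorted(instances)


start = datetime(2015, 10, 30, 1, 0)
end = datetime(2015, 10, 30, 10, 0)
limit = timedelta(hours=10)

# Current behaviour: the coordinator stops at the validity end.
print(len(surviving_instances(start, end, limit, end)))          # 9
# Proposed default: run until validity end + retention limit.
print(len(surviving_instances(start, end, limit, end + limit)))  # 0
```

Under these assumptions, stopping at the validity end leaves all nine instances (01:00Z through 09:00Z) in place, because the validity window is shorter than the retention limit, while extending the coordinator by the retention limit removes every one of them.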



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
