[jira] [Commented] (FALCON-1644) Retention : Some feed instances are never deleted by retention jobs.

Balu Vellanki (JIRA) Fri, 04 Dec 2015 10:00:17 -0800

    [ https://issues.apache.org/jira/browse/FALCON-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041866#comment-15041866 ]


Balu Vellanki commented on FALCON-1644:
---------------------------------------

To further explain the scenarios: users define a retention policy to control
how long feed data is kept. Users might want to retain instances based on:

# Age limit: Any instance younger than N time units should be retained. This
is supported by Falcon today. For example, suppose a feed is valid on a cluster
from 2015-01-01T00:00Z to 2015-01-10T00:00Z, the retention age limit is 1 day,
and the feed frequency is hourly:
#* The instance 2015-01-01T01:00Z will be retained up to 2015-01-02T01:00Z and
will be deleted by the first retention job that runs after 2015-01-02T01:00Z.
#* The instance 2015-01-09T23:00Z should be retained up to 2015-01-10T23:00Z,
so the user expects it to be deleted by a retention job running after
2015-01-10T23:00Z. Today, however, Falcon does not run any retention jobs beyond
2015-01-10T00:00Z, so the instance 2015-01-09T23:00Z will stay forever. This
behavior is wrong and should be fixed.
# Instance count limit: Always retain the last N instances. This is not
supported by Falcon today. If a user wants to retain the last N instances of a
feed regardless of their age, Falcon should add this feature to the retention
lifecycle.
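The age-limit scenario above can be checked with a small simulation. This is a minimal Python sketch, not Falcon code: it assumes (hypothetically) that a retention run at time r deletes every instance whose time plus the age limit is earlier than r, and that the last run happens at `last_run`. The helper names `instances`, `never_deleted` and `keep_last_n` are illustrative only; `keep_last_n` sketches the proposed instance count limit, which Falcon does not support.

```python
from datetime import datetime, timedelta

HOUR = timedelta(hours=1)


def instances(start, end, freq):
    """Feed instance times in [start, end), one every `freq`."""
    out, t = [], start
    while t < end:
        out.append(t)
        t += freq
    return out


def never_deleted(start, end, freq, limit, last_run):
    """Instances that no retention run up to `last_run` would delete.

    Assumption: a run at time r deletes an instance when instance + limit < r,
    so only the latest run matters.
    """
    return [t for t in instances(start, end, freq) if t + limit >= last_run]


def keep_last_n(instance_times, n):
    """Hypothetical count-based policy: instances to delete so the newest n remain."""
    ordered = sorted(instance_times)
    return ordered[:-n] if n > 0 else ordered


start = datetime(2015, 1, 1)
end = datetime(2015, 1, 10)
limit = timedelta(days=1)

# Current behavior: retention runs stop at the validity end.
stuck = never_deleted(start, end, HOUR, limit, last_run=end)
print(len(stuck), stuck[0], stuck[-1])
# 24 2015-01-09 00:00:00 2015-01-09 23:00:00

# Proposed behavior: retention runs continue until validity end + age limit.
print(len(never_deleted(start, end, HOUR, limit, last_run=end + limit)))  # 0

# Count limit: keep the newest 5 of the 216 hourly instances.
print(len(keep_last_n(instances(start, end, HOUR), 5)))  # 211
```

Under these assumptions the whole last day of instances, not only 2015-01-09T23:00Z, outlives the retention jobs, and extending the last run by one age limit lets every instance expire.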

Hope this clears up the confusion.

> Retention : Some feed instances are never deleted by retention jobs.
> --------------------------------------------------------------------
>
>                 Key: FALCON-1644
>                 URL: https://issues.apache.org/jira/browse/FALCON-1644
>             Project: Falcon
>          Issue Type: Bug
>          Components: retention
>    Affects Versions: 0.8
>            Reporter: Balu Vellanki
>            Assignee: Balu Vellanki
>             Fix For: 0.9
>
>         Attachments: FALCON-1644.patch
>
>
> Here is a sample feed XML.
> {code}
> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
> <feed name="rawEmailFeed" description="Raw customer email feed" 
> xmlns="uri:falcon:feed:0.1">
>     <tags>externalSystem=USWestEmailServers</tags>
>     <groups>churnAnalysisDataPipeline</groups>
>     <frequency>hours(1)</frequency>
>     <timezone>UTC</timezone>
>     <late-arrival cut-off="hours(1)"/>
>     <clusters>
>         <cluster name="primaryCluster" type="source">
>             <validity start="2015-10-30T01:00Z" end="2015-10-30T10:00Z"/>
>             <retention limit="hours(10)" action="delete"/>
>         </cluster>
>     </clusters>
>     <locations>
>         <location type="data" 
> path="/user/ambari-qa/falcon/demo/primary/input/enron/${YEAR}-${MONTH}-${DAY}-${HOUR}"/>
>         <location type="stats" path="/"/>
>         <location type="meta" path="/"/>
>     </locations>
>     <ACL owner="ambari-qa" group="users" permission="0x755"/>
>     <schema location="/none" provider="/none"/>
> </feed>
> {code}
> In the above example, the validity time is "the time interval when the feed
> is valid on this cluster". After the validity time ends, Falcon is not
> expected to perform any operations on the feed. The retention job for this
> feed runs from the validity start time up to the validity end time, and
> deletes any feed instances older than 10 hours. As a result, some instances
> of the feed are never deleted. In the above example, the feed instances
> between 2015-10-30T01:00Z and 2015-10-30T10:00Z (every instance of this feed)
> will never be deleted, because none of them is older than 10 hours before the
> retention job stops at 2015-10-30T10:00Z.
> Ideally, the retention coordinator job should run from "validity start time"
> up to "validity end time + retention age limit" to ensure all instances are
> handled.
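The proposed coordinator window can be derived from the feed definition itself. Below is a minimal Python sketch using the standard library's `xml.etree.ElementTree`, applied to a trimmed copy of the sample feed. The helpers `parse_duration`, `read_cluster` and `stuck_instances` are hypothetical, not Falcon APIs; `parse_duration` only understands `minutes(n)`, `hours(n)` and `days(n)`, and the deletion rule (a run at time r deletes an instance when instance + limit < r, with the last run at the validity end) is an assumption about the current behavior.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta

NS = {"f": "uri:falcon:feed:0.1"}

# Trimmed copy of the sample feed (only the elements used here).
SAMPLE = """<feed name="rawEmailFeed" xmlns="uri:falcon:feed:0.1">
  <frequency>hours(1)</frequency>
  <clusters>
    <cluster name="primaryCluster" type="source">
      <validity start="2015-10-30T01:00Z" end="2015-10-30T10:00Z"/>
      <retention limit="hours(10)" action="delete"/>
    </cluster>
  </clusters>
</feed>"""


def parse_duration(expr):
    """Hypothetical helper: 'hours(10)' -> timedelta(hours=10). No months(n)."""
    m = re.fullmatch(r"(minutes|hours|days)\((\d+)\)", expr.strip())
    if m is None:
        raise ValueError(f"unsupported expression: {expr!r}")
    return timedelta(**{m.group(1): int(m.group(2))})


def parse_time(value):
    """Parse a validity time such as 2015-10-30T01:00Z (UTC)."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%MZ")


def read_cluster(feed_xml, cluster_name):
    """Return (frequency, validity start, validity end, retention limit)."""
    root = ET.fromstring(feed_xml)
    freq = parse_duration(root.find("f:frequency", NS).text)
    for cluster in root.iterfind("f:clusters/f:cluster", NS):
        if cluster.get("name") == cluster_name:
            validity = cluster.find("f:validity", NS)
            limit = parse_duration(cluster.find("f:retention", NS).get("limit"))
            start = parse_time(validity.get("start"))
            end = parse_time(validity.get("end"))
            return freq, start, end, limit
    raise KeyError(cluster_name)


def stuck_instances(freq, start, end, limit, last_run):
    """Instances in [start, end) that no retention run up to last_run deletes."""
    out, t = [], start
    while t < end:
        if t + limit >= last_run:
            out.append(t)
        t += freq
    return out


freq, start, end, limit = read_cluster(SAMPLE, "primaryCluster")
print(end + limit)  # proposed retention end: 2015-10-30 20:00:00
print(len(stuck_instances(freq, start, end, limit, last_run=end)))  # 9
print(len(stuck_instances(freq, start, end, limit, last_run=end + limit)))  # 0
```

Under these assumptions all nine hourly instances (01:00Z through 09:00Z) survive the current schedule, while running retention until validity end + 10 hours lets every one of them expire.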



