[ 
https://issues.apache.org/jira/browse/FALCON-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044439#comment-15044439
 ] 

Peeyush Bishnoi edited comment on FALCON-1644 at 12/7/15 6:26 AM:
------------------------------------------------------------------

[~bvellanki] As [~sandeep.samudrala] mentioned, there may be users who rely on 
the current retention behavior, so this patch will affect data they have 
already retained. I can think of two ways to solve this:

1. Provide an option in the current retention/lifecycle settings to choose 
whether the data of the last N duration of instances is deleted. If the user 
sets this option, set the end time of the retention coordinator to "feed 
cluster validity end time + retention time limit"; otherwise, keep the current 
end time. With this approach, it is up to the user whether to delete all of 
the last N duration of data.

2. If you think users need an instance retention count limit to retain the last 
N instances of data (instead of option 1), then the instance retention count 
limit should go in first, followed by the fix for this issue. That way, users 
who want to retain the last N duration of data (after validity expires) will 
have no grounds to complain, and we can suggest that they migrate to the 
instance retention count limit.
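A minimal Python sketch of the two options, assuming a boolean opt-in flag for option 1 and a keep-last-N count for option 2. The names `retention_coord_end`, `purge_after_validity`, `instances_to_evict` and `keep_last_n` are hypothetical illustrations, not existing Falcon APIs or feed XML attributes.

```python
from datetime import datetime, timedelta


def retention_coord_end(validity_end, retention_limit, purge_after_validity):
    """Option 1 (hypothetical opt-in flag): pick the retention coordinator end time.

    If the user opts in, retention keeps running for one more retention period
    after validity ends so the last instances can age out; otherwise it stops
    at the validity end, as the current implementation does.
    """
    if purge_after_validity:
        return validity_end + retention_limit
    return validity_end


def instances_to_evict(instance_times, keep_last_n):
    """Option 2 (hypothetical count limit): evict all but the newest N instances."""
    ordered = sorted(instance_times)
    if keep_last_n <= 0:
        # Nothing is protected by the count limit.
        return ordered
    # Slicing past the start is safe: fewer than N instances yields [].
    return ordered[:-keep_last_n]


end = datetime(2015, 10, 30, 10)   # validity end from the sample feed
limit = timedelta(hours=10)        # retention limit hours(10)
print(retention_coord_end(end, limit, True))    # 2015-10-30 20:00:00
print(retention_coord_end(end, limit, False))   # 2015-10-30 10:00:00

hourly = [datetime(2015, 10, 30, h) for h in range(1, 10)]
print(len(instances_to_evict(hourly, 3)))       # 6
```

Option 1 changes only when retention stops, so existing feeds keep today's behavior unless they opt in; option 2 decouples how much data is kept from the validity window entirely.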

Thoughts please.




was (Author: peeyushb):
[~bvellanki] As [~sandeep.samudrala] mentioned, there may be users who rely on 
the current retention behavior, so this patch will affect data they have 
already retained. I can think of two ways to solve this:
1. Provide an option in the current retention/lifecycle settings to choose 
whether the data of the last N duration of instances is deleted. If the user 
sets this option, set the end time of the retention coordinator to "feed 
cluster validity end time + retention time limit"; otherwise, keep the current 
end time. With this approach, it is up to the user whether to delete all of 
the last N duration of data.
2. If you think users need an instance retention count limit to retain the last 
N instances of data (instead of option 1), then the instance retention count 
limit should go in first, followed by the fix for this issue. That way, users 
who want to retain data (after validity expires) will have no grounds to 
complain, and we can suggest that they migrate to the instance retention count 
limit.

Thoughts please.



> Retention : Some feed instances are never deleted by retention jobs.
> --------------------------------------------------------------------
>
>                 Key: FALCON-1644
>                 URL: https://issues.apache.org/jira/browse/FALCON-1644
>             Project: Falcon
>          Issue Type: Bug
>          Components: retention
>    Affects Versions: 0.8
>            Reporter: Balu Vellanki
>            Assignee: Balu Vellanki
>             Fix For: 0.9
>
>         Attachments: FALCON-1644.patch
>
>
> Here is a sample feed XML.
> {code}
> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
> <feed name="rawEmailFeed" description="Raw customer email feed" xmlns="uri:falcon:feed:0.1">
>     <tags>externalSystem=USWestEmailServers</tags>
>     <groups>churnAnalysisDataPipeline</groups>
>     <frequency>hours(1)</frequency>
>     <timezone>UTC</timezone>
>     <late-arrival cut-off="hours(1)"/>
>     <clusters>
>         <cluster name="primaryCluster" type="source">
>             <validity start="2015-10-30T01:00Z" end="2015-10-30T10:00Z"/>
>             <retention limit="hours(10)" action="delete"/>
>         </cluster>
>     </clusters>
>     <locations>
>         <location type="data" path="/user/ambari-qa/falcon/demo/primary/input/enron/${YEAR}-${MONTH}-${DAY}-${HOUR}"/>
>         <location type="stats" path="/"/>
>         <location type="meta" path="/"/>
>     </locations>
>     <ACL owner="ambari-qa" group="users" permission="0x755"/>
>     <schema location="/none" provider="/none"/>
> </feed>
> {code}
> In the above example, the validity time is "the time interval when the feed 
> is valid on this cluster". After the validity time ends, Falcon is not 
> expected to perform any operations on the feed. The retention job for this 
> feed runs from the validity start time up to the validity end time and 
> deletes any feed instances older than 10 hours. As a result, some instances 
> of the feed are never deleted: in the above example, the feed instances 
> between 2015-10-30T00:00Z and 2015-10-30T10:00Z are never deleted, because 
> the last retention run happens before any of them is 10 hours old.
> Ideally, the retention coordinator job should run from "validity start time" 
> up to "validity end time + retention age limit" to ensure all instances are 
> handled. 
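The effect described in the issue can be reproduced with a small simulation. This Python sketch is a simplified model, not Falcon code: it assumes hourly feed instances from the validity start up to (but excluding) the validity end, a retention job that also runs hourly, and an eviction rule that removes every instance whose nominal time is at or before the run time minus the retention limit. Falcon's actual retention frequency and boundary comparison may differ; `ticks` and `surviving_instances` are hypothetical names.

```python
from datetime import datetime, timedelta

# Hypothetical model of the sample feed on primaryCluster (not Falcon APIs).
FREQ = timedelta(hours=1)            # feed frequency: hours(1)
LIMIT = timedelta(hours=10)          # retention limit: hours(10)
START = datetime(2015, 10, 30, 1)    # validity start: 2015-10-30T01:00Z
END = datetime(2015, 10, 30, 10)     # validity end:   2015-10-30T10:00Z


def ticks(start, end, step):
    """Yield start, start + step, ... strictly before end (end is exclusive)."""
    t = start
    while t < end:
        yield t
        t += step


def surviving_instances(retention_end):
    """Simulate hourly retention runs from START until retention_end (exclusive).

    Assumption: each run evicts every instance whose nominal time is at or
    before (run time - LIMIT).
    """
    alive = set(ticks(START, END, FREQ))
    for run_time in ticks(START, retention_end, FREQ):
        cutoff = run_time - LIMIT
        alive = {inst for inst in alive if inst > cutoff}
    return sorted(alive)


# Current behavior: the retention coordinator stops at the validity end.
stranded = surviving_instances(END)
# Proposed behavior: keep running until validity end + retention limit.
fixed = surviving_instances(END + LIMIT)

print(len(stranded), stranded[0].isoformat(), stranded[-1].isoformat())
print(len(fixed))
```

With the coordinator ending at the validity end, all nine instances (01:00 through 09:00) survive; extending the end to validity end + retention limit lets the final run at 19:00 evict the last one. Under a strict "older than" comparison, the 09:00 instance would sit exactly at the limit on that final run and survive, so the boundary rule matters for whether the last instance is caught.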



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)