[ 
https://issues.apache.org/jira/browse/OOZIE-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated OOZIE-1118:
---------------------------------

    Attachment: OOZIE-1118.patch

I discussed this offline with Alejandro, and we agreed that:
- We should delete jobs in increments of size {{limit}} so we don't have a 
massive delete
- The previous patch wasn't handling subworkflows that have subworkflows (that 
have subworkflows…)
- The previous patch wasn't handling parent jobs that had more than {{limit}} 
children; the first {{limit}} children would get deleted and then the parent, 
leaving any additional children as orphans that would never be deleted
- The patch needs to make sure that children are deleted before their parent
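
To illustrate the points above, here is a minimal sketch (not the code in the 
patch) of deleting a job tree children-first in batches of at most {{limit}}. 
The class, the in-memory map standing in for the job tables, and all method 
names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from the Oozie code base.
final class PurgeOrderSketch {
    // In-memory stand-in for the job tables: parent id -> child ids.
    private final Map<String, List<String>> children = new LinkedHashMap<>();
    // Records each delete batch in the order it would be issued.
    final List<List<String>> batches = new ArrayList<>();

    void addChild(String parent, String child) {
        children.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
    }

    // Depth-first post-order walk: every descendant is added before its parent,
    // however deeply subworkflows are nested.
    private void collectPostOrder(String job, List<String> out) {
        for (String child : children.getOrDefault(job, Collections.<String>emptyList())) {
            collectPostOrder(child, out);
        }
        out.add(job);
    }

    // Splits the children-first ordering into batches of at most `limit` ids,
    // so no single delete is massive and the parent always goes last.
    void purge(String root, int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        List<String> ordered = new ArrayList<>();
        collectPostOrder(root, ordered);
        for (int i = 0; i < ordered.size(); i += limit) {
            int end = Math.min(i + limit, ordered.size());
            batches.add(new ArrayList<>(ordered.subList(i, end)));
        }
    }

    public static void main(String[] args) {
        PurgeOrderSketch sketch = new PurgeOrderSketch();
        sketch.addChild("wf", "sub1");
        sketch.addChild("wf", "sub2");
        sketch.addChild("wf", "sub3");
        sketch.addChild("sub1", "subsub");
        sketch.purge("wf", 2);
        System.out.println(sketch.batches);
    }
}
```

With three children and {{limit}} = 2, the parent ends up in the last batch, so 
no child is left behind as an orphan.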

New patch:
- Addresses the issues above
- Changed the queries (used in {{loadState()}}) that get workflows and 
coordinators for purging so that they only return jobs with no parent, instead 
of filtering those out later in the Java code
-- This matters when {{loadState()}} returns only jobs that have parents while 
additional jobs are not returned because of {{limit}}; the Java code would 
filter them all out and we'd never actually delete any jobs.
-- Also, as is, if there are more than {{limit}} parent jobs that are eligible 
to be deleted, they won't all get deleted in one run; this shouldn't be a 
problem because the rest will eventually get deleted when the {{PurgeService}} 
is run again later (if users are creating jobs faster than the 
{{PurgeService}} can delete them, then they should increase {{limit}} or how 
frequently {{PurgeService}} is run).
- New/Modified tests for all the changes
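
As a rough illustration of why the parent filter belongs in the query, the 
sketch below contrasts applying {{limit}} before and after the filter. It uses 
an in-memory list rather than the real queries; the {{Job}} class and both 
method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch; the real patch changes the queries, not Java streams.
final class PurgeQuerySketch {
    // Stand-in for a job row: an id plus a parent id (null for a top-level job).
    static final class Job {
        final String id;
        final String parentId;

        Job(String id, String parentId) {
            this.id = id;
            this.parentId = parentId;
        }
    }

    // Old behaviour: the query applies the limit, then Java drops jobs with a parent.
    static List<String> limitThenFilter(List<Job> jobs, int limit) {
        return jobs.stream()
                .limit(limit)
                .filter(j -> j.parentId == null)
                .map(j -> j.id)
                .collect(Collectors.toList());
    }

    // New behaviour: only parentless jobs are selected, then the limit applies.
    static List<String> filterThenLimit(List<Job> jobs, int limit) {
        return jobs.stream()
                .filter(j -> j.parentId == null)
                .limit(limit)
                .map(j -> j.id)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Job> jobs = new ArrayList<>();
        jobs.add(new Job("child1", "coord"));
        jobs.add(new Job("child2", "coord"));
        jobs.add(new Job("top1", null));
        System.out.println(limitThenFilter(jobs, 2)); // no purgeable job is found
        System.out.println(filterThenLimit(jobs, 2)); // the top-level job is found
    }
}
```

When the first {{limit}} rows all have parents, the first variant returns 
nothing on every run, while the second still finds the top-level job.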

I'll update ReviewBoard shortly

There should be 5 lines longer than 132 characters, but each of them is a query.
                
> improve logic of purge service
> ------------------------------
>
>                 Key: OOZIE-1118
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1118
>             Project: Oozie
>          Issue Type: Improvement
>          Components: bundle, coordinator, workflow
>    Affects Versions: 3.3.0
>            Reporter: Alejandro Abdelnur
>            Assignee: Robert Kanter
>             Fix For: trunk
>
>         Attachments: OOZIE-1118.patch, OOZIE-1118.patch, OOZIE-1118.patch, 
> OOZIE-1118.patch
>
>
> The current logic of the purge service is flat. I.e., WF purging only takes 
> into account WF end time; it does not take into account that the WF was 
> started by a COORD job. This means that completed WFs of a running COORD job 
> could be purged if the COORD job runs for longer than the purge age.
> One way of addressing this would be:
> * WF purging only purges WF jobs started directly by a client call.
> * COORD purging purges COORD jobs started directly by a client call. It also 
> purges the WF jobs created by the COORD jobs being purged.
> * BUNDLE purging purges BUNDLE jobs, and the corresponding COORD jobs and WF 
> jobs.
> This could be handled by a new property in the job beans, 'job-owner'. Set to 
> 'self', it would mean the job can be purged by the purger for its own job 
> type. If set to another value, a higher-level purger is responsible for 
> purging it.
> This means that for a WF job started by a COORD job started by a BUNDLE job, 
> the WF job and the COORD job would have the BUNDLE job as owner, while the 
> BUNDLE would have 'self' as owner.
> This ownership propagation would also have
> A caveat here would be how to handle sub-workflows. 
> I guess we should check whether the WF was created from a COORD and, if so, 
> let the COORD purge take care of it, meaning the WF purge does not purge WFs 
> started by COORDs.
> Similarly, the same should also apply for sub-WFs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
