[ https://issues.apache.org/jira/browse/OOZIE-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Kanter updated OOZIE-1118:
---------------------------------

    Attachment: OOZIE-1118.patch

I discussed the following with Alejandro offline:
- We should delete jobs in increments of size {{limit}} so we don't have one massive delete.
- The previous patch wasn't handling subworkflows that have subworkflows (that have subworkflows…).
- The previous patch wasn't handling parent jobs that had more than {{limit}} children: the first {{limit}} children would get deleted and then the parent, leaving any additional children as orphans that would never get deleted.
- It needs to make sure that children are deleted before their parent.

New patch:
- Addresses the issues above
- Changed the queries (used in {{loadState()}}) that get the workflows and coordinators to purge so that they only return ones that have no parent, instead of filtering them out later in the Java code
-- This is important when {{loadState()}} returns only jobs that have parents while other jobs are left out because of {{limit}}. In that case the Java code would filter them all out and we'd never actually delete any jobs.
-- Also, as is, if more than {{limit}} parent jobs are eligible to be deleted, they won't all get deleted in one run. This shouldn't be a problem because they'll eventually get deleted when the {{PurgeService}} runs again later. (If users are creating jobs faster than the {{PurgeService}} can delete them, they should increase {{limit}} or how frequently the {{PurgeService}} runs.)
- New/modified tests for all the changes

I'll update ReviewBoard shortly.

There should be 5 lines longer than 132 characters, but each of these is a query.

> improve logic of purge service
> ------------------------------
>
>                 Key: OOZIE-1118
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1118
>             Project: Oozie
>          Issue Type: Improvement
>          Components: bundle, coordinator, workflow
>    Affects Versions: 3.3.0
>            Reporter: Alejandro Abdelnur
>            Assignee: Robert Kanter
>             Fix For: trunk
>
>         Attachments: OOZIE-1118.patch, OOZIE-1118.patch, OOZIE-1118.patch,
> OOZIE-1118.patch
>
>
> The current logic of the purge service is flat, i.e., WF purging only takes
> into account WF end time; it does not take into account that the WF was
> started by a COORD job. This means that completed WFs of a running COORD job
> could be purged if the COORD job runs for longer than the purge age.
> One way of addressing this would be:
> * WF purging only purges WF jobs started directly by a client call.
> * COORD purging purges COORD jobs started directly by a client call. It also
> purges the WF jobs created by the COORD jobs being purged.
> * BUNDLE purging purges BUNDLE jobs, and the corresponding COORD jobs and WF
> jobs.
> This could be handled by a new property in the job beans, 'job-owner'. Set to
> 'self', it would mean the job can be purged by the purger for its own job
> type. If set to any other value, then a higher-level purger is the one
> responsible for purging it.
> This means that for a WF job started by a COORD job started by a BUNDLE job,
> the WF job and the COORD job would have the BUNDLE job as owner, while the
> BUNDLE would have 'self' as owner.
> This ownership propagation would also have
> A caveat here would be how to handle sub-workflows.
> I guess we should check if the wf was created from a coord, and if so, let
> the coord purge take care of that, meaning wf purge does not purge wfs started
> by coords.
> Similarly, the same should also apply for sub-WFs.
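The deletion order described in the patch comment can be sketched in Java. The rules it follows are: only root jobs are loaded, descendants are removed deepest first, every delete touches at most {{limit}} jobs, and a parent's children are re-fetched until none remain so none are orphaned. This is a minimal, hypothetical sketch, not the patch's code. {{InMemoryJobStore}}, {{findRoots}}, {{findChildren}}, {{PurgeSketch}} and the id-to-parent map are assumptions standing in for Oozie's job tables and queries, and the purge-age and job-status checks are left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the job tables (not Oozie's real storage).
class InMemoryJobStore {
    // job id -> parent id (null for jobs submitted directly by a client)
    final Map<String, String> parentOf = new LinkedHashMap<>();
    // every delete call is recorded so batch sizes and ordering can be checked
    final List<List<String>> deleteCalls = new ArrayList<>();

    void add(String id, String parentId) { parentOf.put(id, parentId); }

    // Like the changed purge queries: only jobs with no parent, at most `limit` of them.
    List<String> findRoots(int limit) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : parentOf.entrySet()) {
            if (out.size() == limit) break;
            if (e.getValue() == null) out.add(e.getKey());
        }
        return out;
    }

    // At most `limit` direct children of parentId.
    List<String> findChildren(String parentId, int limit) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : parentOf.entrySet()) {
            if (out.size() == limit) break;
            if (parentId.equals(e.getValue())) out.add(e.getKey());
        }
        return out;
    }

    void delete(List<String> ids) {
        deleteCalls.add(new ArrayList<>(ids));
        for (String id : ids) parentOf.remove(id);
    }
}

class PurgeSketch {
    // One purge pass: at most `limit` root jobs; any others wait for the next run.
    static void purge(InMemoryJobStore store, int limit) {
        List<String> roots = store.findRoots(limit);
        for (String root : roots) purgeDescendants(store, root, limit);
        if (!roots.isEmpty()) store.delete(roots); // parents go only after all their children
    }

    // Deletes all descendants of parentId, deepest first, in batches of at most `limit`.
    static void purgeDescendants(InMemoryJobStore store, String parentId, int limit) {
        List<String> batch;
        // Keep fetching until no children remain, so a parent with more than
        // `limit` children does not leave orphans behind.
        while (!(batch = store.findChildren(parentId, limit)).isEmpty()) {
            for (String child : batch) purgeDescendants(store, child, limit); // nested subworkflows
            store.delete(batch);
        }
    }

    public static void main(String[] args) {
        InMemoryJobStore store = new InMemoryJobStore();
        store.add("coord-1", null);
        store.add("wf-1", "coord-1");
        store.add("wf-2", "coord-1");
        store.add("wf-3", "coord-1");
        store.add("sub-1", "wf-1");   // subworkflow of wf-1
        store.add("sub-1a", "sub-1"); // subworkflow of the subworkflow
        store.add("wf-solo", null);
        store.add("coord-2", null);

        purge(store, 2);
        System.out.println(store.deleteCalls);      // [[sub-1a], [sub-1], [wf-1, wf-2], [wf-3], [coord-1, wf-solo]]
        System.out.println(store.parentOf.keySet()); // [coord-2]
    }
}
```

With {{limit}} = 2, the deepest subworkflow is deleted first and the coordinator's three workflows go out in two batches. The roots are deleted last. {{coord-2}} survives the first pass because only {{limit}} roots are loaded per run and is removed on the next one, which matches the behavior described in the comment.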
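The 'job-owner' idea from the issue description can also be sketched in Java. This is a hypothetical illustration of the proposal only. It is not code from Oozie or from the attached patch, which instead changes the purge queries to select only jobs without a parent. {{OwnedJob}}, its {{owner}} field, {{submittedByClient}}, {{createdBy}}, {{directlyPurgeable}} and {{ownedBy}} are invented names, and the purge-age and status checks are omitted. A client-submitted job gets owner 'self'. A job created by another job inherits the id of the top-level job, so a sub-workflow of a workflow of a coordinator of a bundle ends up owned by the bundle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical job bean carrying the proposed 'job-owner' property.
class OwnedJob {
    static final String SELF = "self";

    final String id;
    final String type;  // "WF", "COORD" or "BUNDLE"
    final String owner; // "self", or the id of the top-level job responsible for purging this one

    private OwnedJob(String id, String type, String owner) {
        this.id = id;
        this.type = type;
        this.owner = owner;
    }

    // Started directly by a client call: purged by the purger for its own type.
    static OwnedJob submittedByClient(String id, String type) {
        return new OwnedJob(id, type, SELF);
    }

    // Created by another job (coord action, bundle coord, sub-workflow):
    // ownership propagates up to the top-level job.
    static OwnedJob createdBy(OwnedJob parent, String id, String type) {
        String owner = SELF.equals(parent.owner) ? parent.id : parent.owner;
        return new OwnedJob(id, type, owner);
    }
}

class OwnershipSketch {
    // Jobs the purger for `type` may select on its own: that type, owned by "self".
    static List<String> directlyPurgeable(List<OwnedJob> jobs, String type) {
        List<String> ids = new ArrayList<>();
        for (OwnedJob j : jobs) {
            if (j.type.equals(type) && OwnedJob.SELF.equals(j.owner)) ids.add(j.id);
        }
        return ids;
    }

    // Jobs that are purged together with the given top-level job.
    static List<String> ownedBy(List<OwnedJob> jobs, String topLevelId) {
        List<String> ids = new ArrayList<>();
        for (OwnedJob j : jobs) {
            if (topLevelId.equals(j.owner)) ids.add(j.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        OwnedJob bundle = OwnedJob.submittedByClient("bundle-1", "BUNDLE");
        OwnedJob coord = OwnedJob.createdBy(bundle, "coord-1", "COORD");
        OwnedJob wf = OwnedJob.createdBy(coord, "wf-1", "WF");
        OwnedJob subWf = OwnedJob.createdBy(wf, "sub-1", "WF");
        OwnedJob standalone = OwnedJob.submittedByClient("wf-2", "WF");
        List<OwnedJob> jobs = Arrays.asList(bundle, coord, wf, subWf, standalone);

        System.out.println(directlyPurgeable(jobs, "WF"));    // [wf-2]
        System.out.println(directlyPurgeable(jobs, "COORD")); // []
        System.out.println(ownedBy(jobs, "bundle-1"));        // [coord-1, wf-1, sub-1]
    }
}
```

Under this scheme the WF purger never selects {{wf-1}} or {{sub-1}}. They are removed only when the BUNDLE purger removes {{bundle-1}}, which addresses the issue's concern that workflows of a still-running coordinator could be purged early.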