Janos Makai created OOZIE-3669:
----------------------------------

             Summary: Fix purge process for bundles to prevent orphan 
coordinators
                 Key: OOZIE-3669
                 URL: https://issues.apache.org/jira/browse/OOZIE-3669
             Project: Oozie
          Issue Type: Bug
          Components: core
    Affects Versions: 5.2.1
            Reporter: Janos Makai
            Assignee: Janos Makai


The Oozie purge process for bundles can create orphan coordinators. When it 
purges bundle jobs and bundle actions, it does not always purge the child 
coordinator jobs and their children. These coordinators are left orphaned: 
because of how the purge logic selects jobs, neither they nor their children 
will ever be purged.

 
----
 

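When purging a bundle, the purge command first counts the child coordinators 
that are not ready to be purged [0]. The query checks each child coordinator's 
status and compares its last modification time against coordOlderThan. If no 
child coordinator is flagged as not ready, the bundle is selected for purging 
and its coordinators are handed on as candidates for the coordsToPurge list.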

However, this does not guarantee that the coordinator will actually be purged. 
The processCoordinators method also checks whether the child workflows are 
older than wfOlderThan [1], and adds the coordinator to coordsToPurge only if 
all of them are. If a purge command is started where wfOlderThan is much higher 
than coordOlderThan (for example, the default values are 30 days for workflows 
and 7 days for coordinators), then the bundle will be purged, but the 
coordinator will not.
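
The mismatch between the two thresholds can be illustrated with a small 
standalone sketch. This is not Oozie code: the class and method names below are 
hypothetical, all jobs are assumed to be in a terminal state already, and only 
the two age comparisons described above are modelled.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

// Hypothetical model of the two age checks; not taken from PurgeXCommand.
class PurgeThresholdSketch {

    // Bundle-level check: no child coordinator may be newer than the
    // coordOlderThan cutoff (statuses are assumed terminal here).
    static boolean bundlePurgeable(List<Instant> coordLastModified,
                                   Instant now, Duration coordOlderThan) {
        Instant cutoff = now.minus(coordOlderThan);
        return coordLastModified.stream().allMatch(t -> t.isBefore(cutoff));
    }

    // Coordinator-level check: every child workflow must be older than the
    // wfOlderThan cutoff, otherwise the coordinator is skipped.
    static boolean coordPurgeable(List<Instant> wfEndTimes,
                                  Instant now, Duration wfOlderThan) {
        Instant cutoff = now.minus(wfOlderThan);
        return wfEndTimes.stream().allMatch(t -> t.isBefore(cutoff));
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2023-01-31T00:00:00Z");
        Duration coordOlderThan = Duration.ofDays(7);   // default from the report
        Duration wfOlderThan = Duration.ofDays(30);     // default from the report

        // Coordinator and its only workflow both finished 10 days ago.
        Instant tenDaysAgo = now.minus(Duration.ofDays(10));

        boolean bundle = bundlePurgeable(List.of(tenDaysAgo), now, coordOlderThan);
        boolean coord = coordPurgeable(List.of(tenDaysAgo), now, wfOlderThan);

        System.out.println("bundle purged: " + bundle);      // true
        System.out.println("coordinator purged: " + coord);  // false
    }
}
```

With a job that finished 10 days ago, the 7-day check passes while the 30-day 
check fails, which is the window in which the bundle is removed and the 
coordinator is left behind.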

Once the bundle has been purged, the child coordinator will never be purged: 
the purge only checks parentless coordinators on their own, and coordinators 
with a parent are only purged together with their bundle.
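
Why the leftover coordinator is never picked up again can be sketched as 
follows. This is a simplified model, not the Oozie implementation: the Coord 
class and both selection methods are hypothetical, and it is assumed that the 
coordinator row keeps its bundle id after the bundle itself has been deleted.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical model of how coordinators are selected for purging.
class OrphanCoordinatorSketch {

    // A coordinator and the id of its parent bundle (null if standalone).
    static final class Coord {
        final String id;
        final String bundleId;

        Coord(String id, String bundleId) {
            this.id = id;
            this.bundleId = bundleId;
        }
    }

    // Standalone pass: only coordinators without a parent are considered.
    static List<String> standaloneCandidates(List<Coord> coords) {
        return coords.stream()
                .filter(c -> c.bundleId == null)
                .map(c -> c.id)
                .collect(Collectors.toList());
    }

    // Bundle pass: children are only reached through a bundle that still exists.
    static List<String> bundleCandidates(List<Coord> coords, List<String> existingBundles) {
        return coords.stream()
                .filter(c -> c.bundleId != null && existingBundles.contains(c.bundleId))
                .map(c -> c.id)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Coord> coords = List.of(
                new Coord("coord-standalone", null),
                new Coord("coord-child", "bundle-1"));

        // bundle-1 was already purged, so no bundles remain.
        List<String> existingBundles = List.of();

        System.out.println(standaloneCandidates(coords));               // [coord-standalone]
        System.out.println(bundleCandidates(coords, existingBundles));  // []
    }
}
```

In this model "coord-child" appears in neither candidate list, so no later 
purge run can reach it.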

[0]
{code:java}
// PurgeXCommand, lines 380-381
long numChildrenNotReady = jpaService.execute(
        new CoordJobsCountNotForPurgeFromParentIdJPAExecutor(coordOlderThan, bundleId));

// CoordinatorJobBean, lines 192-194
@NamedQuery(name = "GET_COORD_COUNT_WITH_PARENT_ID_NOT_READY_FOR_PURGE",
        query = "select count(w) from CoordinatorJobBean"
        + " w where w.bundleId = :parentId and (w.statusStr NOT IN ('SUCCEEDED', 'FAILED', 'KILLED', 'DONEWITHERROR') "
        + "OR w.lastModifiedTimestamp >= :lastModTime)"),
{code}
 

[1]
{code:java}
// PurgeXCommand, isWorkflowPurgeable (body at lines 308-319)
private boolean isWorkflowPurgeable(WorkflowJobBean wfjBean, long wfOlderThanMS) {
    final Date wfEndTime = wfjBean.getEndTime();
    final boolean isFinished = wfjBean.inTerminalState();
    if (isFinished && wfEndTime != null && wfEndTime.getTime() < wfOlderThanMS) {
        return true;
    }
    else {
        final Date lastModificationTime = wfjBean.getLastModifiedTime();
        if (isFinished && lastModificationTime != null && lastModificationTime.getTime() < wfOlderThanMS) {
            return true;
        }
    }
    return false;
}

// PurgeXCommand, lines 343-349
List<String> workflowChildren = fetchTerminatedWorkflow(wfjBeanList);

// if all workflow are ready to purge add them and add the coordinator and their actions
if(workflowChildren.size() == wfjBeanList.size()) {
    LOG.debug("Purging coordinator " + coordId);
    wfsToPurge.addAll(workflowChildren);
    coordsToPurge.add(coordId);
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
