[ 
https://issues.apache.org/jira/browse/OOZIE-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328511#comment-16328511
 ] 

Attila Sasvari commented on OOZIE-1401:
---------------------------------------

committed to master

> PurgeCommand should purge the workflow jobs w/o end_time
> --------------------------------------------------------
>
>                 Key: OOZIE-1401
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1401
>             Project: Oozie
>          Issue Type: Sub-task
>          Components: bundle, coordinator, workflow
>    Affects Versions: trunk
>            Reporter: Mona Chitnis
>            Assignee: Attila Sasvari
>            Priority: Major
>             Fix For: 5.0.0b1
>
>         Attachments: OOZIE-1401-001.patch, OOZIE-1401.amend.003.patch, 
> amend-OOZIE-1401-001.patch, amend-OOZIE-1401-002.patch
>
>
> Currently, {{PurgeXCommand}} does not handle workflow jobs with 
> {{end_time=null}}. The command needs to take care of those jobs as well. 
> They typically appear when long-running jobs get stuck after Hadoop restarts 
> or DB failures. This could be done by checking {{last_modified_time}} 
> instead when {{end_time}} is not available.
> The current query:
> {code:sql}
> select w from WorkflowJobBean w where w.endTimestamp < :endTime
> {code}
> There is also an issue when all of the following hold:
> * there is a parent workflow that has its {{end_time}} set
> * the parent workflow is otherwise eligible for {{PurgeXCommand}}: its 
> {{end_time}} is older than the configured number of days, and its {{status}} 
> is {{KILLED}}, {{FAILED}}, or {{SUCCEEDED}}
> * the parent has a child workflow whose {{parent_id}} is set to the {{id}} 
> of the parent workflow
> * the child workflow has {{end_time = NULL}}
> In this case, 
> [*{{PurgeXCommand#fetchTerminatedWorkflow()}}*|https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/command/PurgeXCommand.java#L249]
>  throws a {{NullPointerException}} like this:
> {noformat}
> 2017-09-29 07:59:46,365 DEBUG org.apache.oozie.command.PurgeXCommand: 
> SERVER[host-10-17-101-90.coe.cloudera.com] USER[-] GROUP[-] TOKEN[-] APP[-] 
> JOB[-] ACTION[-] Purging workflows of long running coordinators is turned on
> 2017-09-29 07:59:46,371 DEBUG org.apache.oozie.command.PurgeXCommand: 
> SERVER[host-10-17-101-90.coe.cloudera.com] USER[-] GROUP[-] TOKEN[-] APP[-] 
> JOB[-] ACTION[-] Execute command [purge] key [null]
> 2017-09-29 07:59:46,371 INFO org.apache.oozie.command.PurgeXCommand: 
> SERVER[host-10-17-101-90.coe.cloudera.com] USER[-] GROUP[-] TOKEN[-] APP[-] 
> JOB[-] ACTION[-] STARTED Purge to purge Workflow Jobs older than [1] days, 
> Coordinator Jobs older than [1] days, and Bundlejobs older than [1] days.
> 2017-09-29 07:59:46,375 ERROR org.apache.oozie.command.PurgeXCommand: 
> SERVER[host-10-17-101-90.coe.cloudera.com] USER[-] GROUP[-] TOKEN[-] APP[-] 
> JOB[-] ACTION[-] Exception, 
> java.lang.NullPointerException
>       at 
> org.apache.oozie.command.PurgeXCommand.fetchTerminatedWorkflow(PurgeXCommand.java:249)
>       at 
> org.apache.oozie.command.PurgeXCommand.processWorkflowsHelper(PurgeXCommand.java:227)
>       at 
> org.apache.oozie.command.PurgeXCommand.processWorkflows(PurgeXCommand.java:199)
>       at 
> org.apache.oozie.command.PurgeXCommand.execute(PurgeXCommand.java:150)
>       at org.apache.oozie.command.PurgeXCommand.execute(PurgeXCommand.java:53)
>       at org.apache.oozie.command.XCommand.call(XCommand.java:286)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>       at 
> org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:178)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>       at java.lang.Thread.run(Thread.java:745)
> {noformat}
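
The fallback proposed in the description (use {{last_modified_time}} when {{end_time}} is not available) could look roughly like the sketch below. It is only an illustration under stated assumptions: the Job class, effectiveEndTime() and purgeable() are hypothetical names, not Oozie's WorkflowJobBean or the code that was committed, and the status checks are left out.

```java
import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PurgeEligibilitySketch {

    // Hypothetical stand-in for a workflow job row; NOT Oozie's WorkflowJobBean.
    static final class Job {
        final String id;
        final Timestamp endTime;          // may be null for stuck jobs
        final Timestamp lastModifiedTime;

        Job(String id, Timestamp endTime, Timestamp lastModifiedTime) {
            this.id = id;
            this.endTime = endTime;
            this.lastModifiedTime = lastModifiedTime;
        }
    }

    // Use end_time when it is set, otherwise fall back to last_modified_time.
    static Timestamp effectiveEndTime(Job job) {
        return job.endTime != null ? job.endTime : job.lastModifiedTime;
    }

    // Collect ids of jobs whose effective end time is older than the cutoff.
    // A job with neither timestamp is skipped instead of triggering an NPE.
    static List<String> purgeable(List<Job> jobs, Timestamp cutoff) {
        List<String> ids = new ArrayList<>();
        for (Job job : jobs) {
            Timestamp t = effectiveEndTime(job);
            if (t != null && t.before(cutoff)) {
                ids.add(job.id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        Timestamp cutoff = Timestamp.valueOf("2017-09-28 00:00:00");
        List<Job> jobs = Arrays.asList(
            new Job("wf-ended", Timestamp.valueOf("2017-09-01 00:00:00"),
                    Timestamp.valueOf("2017-09-01 00:00:00")),
            new Job("wf-stuck", null, Timestamp.valueOf("2017-09-02 00:00:00")));
        System.out.println(purgeable(jobs, cutoff));
    }
}
```

The null check on the effective timestamp is the relevant part for the parent/child case in the description: a child workflow with {{end_time = NULL}} is compared via its {{last_modified_time}} rather than dereferenced directly.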



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
