[ https://issues.apache.org/jira/browse/OOZIE-3260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andras Piros updated OOZIE-3260:
--------------------------------
    Summary: Remove item after first unsuccessful attempt from in-memory SLA map  (was: Remove item after first try from in-memory SLA map)

> Remove item after first unsuccessful attempt from in-memory SLA map
> -------------------------------------------------------------------
>
>                 Key: OOZIE-3260
>                 URL: https://issues.apache.org/jira/browse/OOZIE-3260
>             Project: Oozie
>          Issue Type: Bug
>          Components: coordinator, core, workflow
>    Affects Versions: 5.0.0
>            Reporter: Andras Piros
>            Assignee: Andras Piros
>            Priority: Major
>
> Despite having implemented OOZIE-3134, there are still cases where
> {{SLACalculatorMemory#slaMap}} and the database contents get out of sync.
> For example, rows of the {{SLA_SUMMARY}} table have been purged manually
> from the database, or the corresponding {{WF_JOBS}} or {{COORD_JOBS}}
> entries no longer exist in the database.
> In those rare cases, we see {{JPAExecutorException}} instances like:
> {noformat}
> 2017-10-09 17:00:18,185 DEBUG openjpa.jdbc.SQL: SERVER[HOST] <t 1527981517, conn 1584126245> [0 ms] spent
> 2017-10-09 17:00:18,185 ERROR org.apache.oozie.sla.SLACalculatorMemory: SERVER[tplhc01c001.iuser.iroot.adidom.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000438-170916014916144-oozie-oozi-C@556] ACTION[-] Exception in SLA processing for job [0000438-170916014916144-oozie-oozi-C@556]
> org.apache.oozie.executor.jpa.JPAExecutorException: E0604: Job does not exist [select w.eventProcessed from SLASummaryBean w where w.jobId = :id]
>     at org.apache.oozie.executor.jpa.SLASummaryQueryExecutor.getSingleValue(SLASummaryQueryExecutor.java:161)
>     at org.apache.oozie.sla.SLACalculatorMemory.updateJobSla(SLACalculatorMemory.java:480)
>     at org.apache.oozie.sla.SLACalculatorMemory.updateAllSlaStatus(SLACalculatorMemory.java:601)
> {noformat}
> or
> {noformat}
> 2017-10-09 17:00:53,085 WARN org.apache.oozie.service.CallableQueueService$CompositeCallable: SERVER[HOST] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000011-170813033731256-oozie-oozi-W] ACTION[0000011-170813033731256-oozie-oozi-W@sqoop_full_tbl_unload] exception callable [action.check], E0604: Job does not exist [select w.statusStr from WorkflowJobBean w where w.id = :id]
> org.apache.oozie.command.CommandException: E0604: Job does not exist [select w.statusStr from WorkflowJobBean w where w.id = :id]
>     at org.apache.oozie.command.wf.ActionCheckXCommand.eagerLoadState(ActionCheckXCommand.java:97)
>     at org.apache.oozie.command.XCommand.call(XCommand.java:256)
>     at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:332)
>     at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:261)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.oozie.executor.jpa.JPAExecutorException: E0604: Job does not exist [select w.statusStr from WorkflowJobBean w where w.id = :id]
>     at org.apache.oozie.executor.jpa.WorkflowJobQueryExecutor.get(WorkflowJobQueryExecutor.java:345)
>     at org.apache.oozie.executor.jpa.WorkflowJobQueryExecutor.get(WorkflowJobQueryExecutor.java:38)
>     at org.apache.oozie.command.wf.ActionCheckXCommand.eagerLoadState(ActionCheckXCommand.java:90)
> {noformat}
> The solution is to remove any {{SLACalculatorMemory#slaMap}} entry that
> causes such a {{JPAExecutorException}} after the first unsuccessful
> attempt, so that repeated failures do not produce huge log files. The
> jobs those entries refer to no longer exist anyway.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
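
For readers following the issue, the proposed behaviour (drop an in-memory SLA entry the first time its database lookup fails with "Job does not exist") might be sketched roughly as below. This is only an illustration under stated assumptions, not the actual Oozie patch: the class `SlaMapEvictionSketch`, the exception `JobNotFoundException`, the `database` set and the SLA state strings are all hypothetical stand-ins; only the names `slaMap`, `updateJobSla` and `updateAllSlaStatus` echo the stack trace in the issue.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class SlaMapEvictionSketch {

    /** Hypothetical stand-in for a JPAExecutorException carrying E0604. */
    public static class JobNotFoundException extends Exception {
        public JobNotFoundException(String jobId) {
            super("E0604: Job does not exist [" + jobId + "]");
        }
    }

    // Hypothetical stand-in for SLACalculatorMemory#slaMap: job ID -> SLA state.
    private final Map<String, String> slaMap = new ConcurrentHashMap<>();

    // Hypothetical stand-in for the database: IDs of jobs that still exist.
    private final Set<String> database;

    public SlaMapEvictionSketch(Set<String> database) {
        this.database = database;
    }

    public void track(String jobId) {
        slaMap.put(jobId, "NOT_STARTED");
    }

    public boolean isTracked(String jobId) {
        return slaMap.containsKey(jobId);
    }

    // Simulated per-job update: fails when the job is gone from the "database".
    private void updateJobSla(String jobId) throws JobNotFoundException {
        if (!database.contains(jobId)) {
            throw new JobNotFoundException(jobId);
        }
        slaMap.put(jobId, "PROCESSED");
    }

    /**
     * One pass over all tracked jobs. An entry whose lookup fails is removed
     * immediately, so the same error is not logged again on the next pass.
     * Returns the number of entries evicted in this pass.
     */
    public int updateAllSlaStatus() {
        int evicted = 0;
        // ConcurrentHashMap iterators tolerate removal during iteration.
        for (String jobId : slaMap.keySet()) {
            try {
                updateJobSla(jobId);
            } catch (JobNotFoundException e) {
                slaMap.remove(jobId);
                evicted++;
            }
        }
        return evicted;
    }

    public static void main(String[] args) {
        SlaMapEvictionSketch sketch =
                new SlaMapEvictionSketch(Collections.singleton("job-1"));
        sketch.track("job-1"); // still present in the "database"
        sketch.track("job-2"); // already purged from the "database"
        System.out.println("evicted on first pass: " + sketch.updateAllSlaStatus());
        System.out.println("evicted on second pass: " + sketch.updateAllSlaStatus());
    }
}
```

The point of the sketch is that the stale entry fails at most once: the second pass finds nothing left to evict, which is what keeps the log from growing.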