[ https://issues.apache.org/jira/browse/OOZIE-3260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496423#comment-16496423 ]
Hadoop QA commented on OOZIE-3260:
----------------------------------

Testing JIRA OOZIE-3260

Cleaning local git workspace

----------------------------

{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:green}+1 RAW_PATCH_ANALYSIS{color}
.    {color:green}+1{color} the patch does not introduce any @author tags
.    {color:green}+1{color} the patch does not introduce any tabs
.    {color:green}+1{color} the patch does not introduce any trailing spaces
.    {color:green}+1{color} the patch does not introduce any line longer than 132
.    {color:green}+1{color} the patch adds/modifies 2 testcase(s)
{color:green}+1 RAT{color}
.    {color:green}+1{color} the patch does not seem to introduce new RAT warnings
{color:green}+1 JAVADOC{color}
{color:green}+1 JAVADOC{color}
.    {color:green}+1{color} the patch does not seem to introduce new Javadoc warning(s)
.    {color:green}+1{color} the patch does not seem to introduce new Javadoc error(s)
.    {color:red}ERROR{color}: the current HEAD has 2 Javadoc error(s)
{color:green}+1 COMPILE{color}
.    {color:green}+1{color} HEAD compiles
.    {color:green}+1{color} patch compiles
.    {color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:green}+1{color} There are no new bugs found in total.
.    {color:green}+1{color} There are no new bugs found in [examples].
.    {color:green}+1{color} There are no new bugs found in [webapp].
.    {color:green}+1{color} There are no new bugs found in [core].
.    {color:green}+1{color} There are no new bugs found in [tools].
.    {color:green}+1{color} There are no new bugs found in [server].
.    {color:green}+1{color} There are no new bugs found in [docs].
.    {color:green}+1{color} There are no new bugs found in [sharelib/hive2].
.    {color:green}+1{color} There are no new bugs found in [sharelib/pig].
.    {color:green}+1{color} There are no new bugs found in [sharelib/streaming].
.    {color:green}+1{color} There are no new bugs found in [sharelib/hive].
.    {color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
.    {color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
.    {color:green}+1{color} There are no new bugs found in [sharelib/oozie].
.    {color:green}+1{color} There are no new bugs found in [sharelib/distcp].
.    {color:green}+1{color} There are no new bugs found in [sharelib/spark].
.    {color:green}+1{color} There are no new bugs found in [client].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.    {color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
.    {color:green}+1{color} the patch does not modify JPA files
{color:green}+1 TESTS{color}
.    Tests run: 2134
{color:green}+1 DISTRO{color}
.    {color:green}+1{color} distro tarball builds with the patch

----------------------------
{color:green}*+1 Overall result, good!, no -1s*{color}

The full output of the test-patch run is available at

.   https://builds.apache.org/job/PreCommit-OOZIE-Build/594/

> [sla] Remove stale item above max retries on JPA related errors from
> in-memory SLA map
> --------------------------------------------------------------------------------------
>
>                 Key: OOZIE-3260
>                 URL: https://issues.apache.org/jira/browse/OOZIE-3260
>             Project: Oozie
>          Issue Type: Bug
>          Components: coordinator, core, workflow
>    Affects Versions: 5.0.0
>            Reporter: Andras Piros
>            Assignee: Andras Piros
>            Priority: Major
>         Attachments: OOZIE-3260.001.patch
>
>
> Despite having implemented OOZIE-3134, there are still cases where
> {{SLACalculatorMemory#slaMap}} and database contents still get out of sync.
> Some possibilities include, but are not limited to:
> * database contents of the {{SLA_SUMMARY}} table have been purged manually from the DB
> * no corresponding {{WF_JOBS}} or {{COORD_JOBS}} entries exist anymore in the DB
> * the {{WF_JOBS}} or {{COORD_JOBS}} instance that is being tracked by the {{SLACalcStatus}} instances inside {{SLACalculatorMemory#slaMap}} is not yet persisted to the database when the SLA entry is already processed by {{SLACalculatorMemory.HistoryPurgeWorker}}. Depending on e.g. how many coordinator actions are being materialized, it can very well happen that {{SLACalcStatus}} entries inserted into the in-memory map are processed before their corresponding {{CoordActionBean}} entries have been persisted to the database
> In those rare cases, we see {{JPAExecutorException}} instances like:
> {noformat}
> 2017-10-09 17:00:18,185 DEBUG openjpa.jdbc.SQL: SERVER[HOST] <t 1527981517, conn 1584126245> [0 ms] spent
> 2017-10-09 17:00:18,185 ERROR org.apache.oozie.sla.SLACalculatorMemory: SERVER[tplhc01c001.iuser.iroot.adidom.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000438-170916014916144-oozie-oozi-C@556] ACTION[-] Exception in SLA processing for job [0000438-170916014916144-oozie-oozi-C@556]
> org.apache.oozie.executor.jpa.JPAExecutorException: E0604: Job does not exist [select w.eventProcessed from SLASummaryBean w where w.jobId = :id]
>     at org.apache.oozie.executor.jpa.SLASummaryQueryExecutor.getSingleValue(SLASummaryQueryExecutor.java:161)
>     at org.apache.oozie.sla.SLACalculatorMemory.updateJobSla(SLACalculatorMemory.java:480)
>     at org.apache.oozie.sla.SLACalculatorMemory.updateAllSlaStatus(SLACalculatorMemory.java:601)
> {noformat}
> The solution here is to track the number of times each {{SLACalcStatus}} entry has not been processed successfully and, once the preconfigured {{oozie.sla.service.SLAService.maximum.retry.count}} is reached, to remove any {{SLACalculatorMemory#slaMap}} entries that are causing those {{JPAExecutorException}} instances, so that the logfiles do not grow huge. The items being logged don't exist anyway.
> It's still possible that multiple {{CoordActionBean}} instances being inserted won't have {{SLACalcStatus}} entries inside {{SLACalculatorMemory#slaMap}} by the time they are written to the database, and thus no SLA will be tracked for them. In those rare cases, the preconfigured maximum retry count can be increased.
> Note that the current implementation of [*{{SLACalculatorMemory#updateJobSla()}}*|https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/sla/SLACalculatorMemory.java#L238-L242] already removes the stale {{SLACalcStatus}} entry. The new functionality here is to introduce {{SLACalcStatus#retryCount}} and to extend the set of {{JPAExecutorException}} {{ErrorCode}}s of interest.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
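
The retry-count idea from the issue description can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in OOZIE-3260.001.patch. The class SlaRetrySketch, its Status holder, and the existsInDb predicate are hypothetical stand-ins for SLACalculatorMemory, SLACalcStatus, and the JPA lookup that fails with E0604. The sketch assumes that an entry is dropped once its failure count goes above the configured maximum and that the counter is not reset after a successful pass; the actual patch may differ on both points.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

/** Hypothetical stand-in for the in-memory SLA map; not Oozie's actual class. */
class SlaRetrySketch {

    /** Hypothetical analogue of SLACalcStatus, reduced to the retry counter. */
    static final class Status {
        int retryCount;
    }

    private final Map<String, Status> slaMap = new ConcurrentHashMap<>();

    // Plays the role of oozie.sla.service.SLAService.maximum.retry.count.
    private final int maxRetryCount;

    SlaRetrySketch(int maxRetryCount) {
        this.maxRetryCount = maxRetryCount;
    }

    /** Starts tracking a job if it is not tracked already. */
    void track(String jobId) {
        slaMap.putIfAbsent(jobId, new Status());
    }

    boolean isTracked(String jobId) {
        return slaMap.containsKey(jobId);
    }

    /**
     * Runs one update pass over all tracked entries. existsInDb stands in for
     * the database lookup; returning false models a "Job does not exist" error.
     * Entries that have failed more than maxRetryCount times are removed.
     */
    void updateAll(Predicate<String> existsInDb) {
        slaMap.entrySet().removeIf(entry -> {
            if (existsInDb.test(entry.getKey())) {
                return false; // processed fine, keep tracking
            }
            Status status = entry.getValue();
            status.retryCount++;
            // Assumption: evict only once the count is above the maximum.
            return status.retryCount > maxRetryCount;
        });
    }
}
```

With a maximum of 2, an entry whose lookup keeps failing survives two passes and is evicted on the third, while entries that resolve in the database stay in the map. Raising the maximum gives slow-to-persist entries more passes before they are dropped.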