[
https://issues.apache.org/jira/browse/OOZIE-3260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16503170#comment-16503170
]
Hadoop QA commented on OOZIE-3260:
----------------------------------
Testing JIRA OOZIE-3260
Cleaning local git workspace
----------------------------
{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:green}+1 RAW_PATCH_ANALYSIS{color}
. {color:green}+1{color} the patch does not introduce any @author tags
. {color:green}+1{color} the patch does not introduce any tabs
. {color:green}+1{color} the patch does not introduce any trailing spaces
. {color:green}+1{color} the patch does not introduce any line longer than
132
. {color:green}+1{color} the patch adds/modifies 2 testcase(s)
{color:green}+1 RAT{color}
. {color:green}+1{color} the patch does not seem to introduce new RAT
warnings
{color:green}+1 JAVADOC{color}
. {color:green}+1{color} the patch does not seem to introduce new Javadoc
warning(s)
. {color:green}+1{color} the patch does not seem to introduce new Javadoc
error(s)
. {color:red}ERROR{color}: the current HEAD has 2 Javadoc error(s)
{color:green}+1 COMPILE{color}
. {color:green}+1{color} HEAD compiles
. {color:green}+1{color} patch compiles
. {color:green}+1{color} the patch does not seem to introduce new javac
warnings
{color:green}+1{color} There are no new bugs found in total.
. {color:green}+1{color} There are no new bugs found in [webapp].
. {color:green}+1{color} There are no new bugs found in [core].
. {color:green}+1{color} There are no new bugs found in [tools].
. {color:green}+1{color} There are no new bugs found in [sharelib/hive2].
. {color:green}+1{color} There are no new bugs found in [sharelib/distcp].
. {color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
. {color:green}+1{color} There are no new bugs found in [sharelib/streaming].
. {color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
. {color:green}+1{color} There are no new bugs found in [sharelib/oozie].
. {color:green}+1{color} There are no new bugs found in [sharelib/pig].
. {color:green}+1{color} There are no new bugs found in [sharelib/hive].
. {color:green}+1{color} There are no new bugs found in [sharelib/spark].
. {color:green}+1{color} There are no new bugs found in [client].
. {color:green}+1{color} There are no new bugs found in [examples].
. {color:green}+1{color} There are no new bugs found in [docs].
. {color:green}+1{color} There are no new bugs found in [server].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
. {color:green}+1{color} the patch does not change any JPA
Entity/Column/Basic/Lob/Transient annotations
. {color:green}+1{color} the patch does not modify JPA files
{color:green}+1 TESTS{color}
. Tests run: 2148
. {color:orange}Tests failed at first run:{color}
TestCoordActionsKillXCommand#testActionKillCommandDate
. For the complete list of flaky tests, see TEST-SUMMARY-FULL files.
{color:green}+1 DISTRO{color}
. {color:green}+1{color} distro tarball builds with the patch
----------------------------
{color:green}*+1 Overall result, good!, no -1s*{color}
The full output of the test-patch run is available at
. https://builds.apache.org/job/PreCommit-OOZIE-Build/610/
> [sla] Remove stale item above max retries on JPA related errors from
> in-memory SLA map
> --------------------------------------------------------------------------------------
>
> Key: OOZIE-3260
> URL: https://issues.apache.org/jira/browse/OOZIE-3260
> Project: Oozie
> Issue Type: Bug
> Components: coordinator, core, workflow
> Affects Versions: 5.0.0
> Reporter: Andras Piros
> Assignee: Andras Piros
> Priority: Major
> Attachments: OOZIE-3260.001.patch, OOZIE-3260.002.patch
>
>
> Despite OOZIE-3134 having been implemented, there are still cases where
> {{SLACalculatorMemory#slaMap}} and the database contents get out of sync.
> Possible causes include, but are not limited to:
> * database contents of {{SLA_SUMMARY}} table have been purged manually from
> DB
> * no corresponding {{WF_JOBS}} or {{COORD_JOBS}} entries exist anymore in DB
> * the {{WF_JOBS}} or {{COORD_JOBS}} instance that is being tracked by the
> {{SLACalcStatus}} instances inside {{SLACalculatorMemory#slaMap}} is not yet
> persisted to database when the SLA entry is already processed by
> {{SLACalculatorMemory.HistoryPurgeWorker}}. Depending on, e.g., how many
> coordinator actions are being materialized, it can very well happen that
> {{SLACalcStatus}} entries inserted into the in-memory map are processed
> before their corresponding {{CoordActionBean}} entries have been
> persisted to database
> In those rare cases, we see {{JPAExecutorException}} instances like:
> {noformat}
> 2017-10-09 17:00:18,185 DEBUG openjpa.jdbc.SQL: SERVER[HOST] <t 1527981517, conn 1584126245> [0 ms] spent
> 2017-10-09 17:00:18,185 ERROR org.apache.oozie.sla.SLACalculatorMemory: SERVER[tplhc01c001.iuser.iroot.adidom.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000438-170916014916144-oozie-oozi-C@556] ACTION[-] Exception in SLA processing for job [0000438-170916014916144-oozie-oozi-C@556]
> org.apache.oozie.executor.jpa.JPAExecutorException: E0604: Job does not exist [select w.eventProcessed from SLASummaryBean w where w.jobId = :id]
> at org.apache.oozie.executor.jpa.SLASummaryQueryExecutor.getSingleValue(SLASummaryQueryExecutor.java:161)
> at org.apache.oozie.sla.SLACalculatorMemory.updateJobSla(SLACalculatorMemory.java:480)
> at org.apache.oozie.sla.SLACalculatorMemory.updateAllSlaStatus(SLACalculatorMemory.java:601)
> {noformat}
> The solution is to track the number of times each {{SLACalcStatus}} entry has
> not been processed successfully and, once the preconfigured
> {{oozie.sla.service.SLAService.maximum.retry.count}} is reached, to remove the
> {{SLACalculatorMemory#slaMap}} entries that are causing those
> {{JPAExecutorException}} instances, so that they do not produce huge logfiles.
> The items to be logged don't exist anyway.
> It's still possible that some {{CoordActionBean}} instances being inserted
> no longer have {{SLACalcStatus}} entries inside
> {{SLACalculatorMemory#slaMap}} by the time they are written to database, and
> thus no SLA will be tracked for them. In those rare cases, the preconfigured
> maximum retry count can be increased.
> Note that the current implementation of
> [*{{SLACalculatorMemory#updateJobSla()}}*|https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/sla/SLACalculatorMemory.java#L238-L242]
> already removes the stale {{SLACalcStatus}} entry. The new functionality
> here is to introduce {{SLACalcStatus#retryCount}} and to extend the set of
> {{JPAExecutorException}} {{ErrorCode}}s of interest.
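The retry-count eviction described in the issue can be illustrated with a minimal, self-contained sketch. This is not the code from the attached patches: the class SlaRetrySketch, its nested Status type, and the existsInDb predicate (standing in for the JPA lookup that fails with E0604) are hypothetical names chosen for illustration. Only the idea of a per-entry retry counter compared against a configured maximum comes from the description.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Hypothetical sketch of retry-count based eviction; not the actual Oozie code.
class SlaRetrySketch {
    // Stand-in for SLACalcStatus: only the retry counter is modelled.
    static final class Status {
        int retryCount;
    }

    private final Map<String, Status> slaMap = new ConcurrentHashMap<>();
    private final int maxRetryCount;
    // Assumption: returns false where the real lookup would throw
    // a JPAExecutorException such as "E0604: Job does not exist".
    private final Predicate<String> existsInDb;

    SlaRetrySketch(int maxRetryCount, Predicate<String> existsInDb) {
        this.maxRetryCount = maxRetryCount;
        this.existsInDb = existsInDb;
    }

    void track(String jobId) {
        slaMap.put(jobId, new Status());
    }

    boolean isTracked(String jobId) {
        return slaMap.containsKey(jobId);
    }

    // One pass over all tracked entries, analogous to a periodic SLA update.
    void updateAll() {
        for (Map.Entry<String, Status> e : slaMap.entrySet()) {
            String jobId = e.getKey();
            Status status = e.getValue();
            if (existsInDb.test(jobId)) {
                continue; // normal SLA processing would happen here
            }
            status.retryCount++;
            if (status.retryCount >= maxRetryCount) {
                // Stale entry: stop retrying (and logging) for this job.
                // Removal during iteration is safe on ConcurrentHashMap.
                slaMap.remove(jobId);
            }
        }
    }
}
```

With a maximum of 3, an entry whose job never appears in the database survives two passes and is dropped on the third, while entries that resolve normally stay in the map. A larger maximum gives slowly persisted jobs more time before their entry is evicted.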
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)