[ https://issues.apache.org/jira/browse/ASTERIXDB-3467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17871679#comment-17871679 ]
ASF subversion and git services commented on ASTERIXDB-3467:
------------------------------------------------------------
Commit eaebacfe7e42f0b82b0f1501a84dead9fd626ccf in asterixdb's branch
refs/heads/master from Ali Alsuliman
[ https://gitbox.apache.org/repos/asf?p=asterixdb.git;h=eaebacfe7e ]
[ASTERIXDB-3467][HYR] ConcurrentModificationException when picking new jobs to run
- user model changes: no
- storage format changes: no
- interface changes: no
Details:
When picking new jobs from the job queue, jobs that cannot be
picked (e.g. due to cluster state) are now collected first instead
of being failed and passed to jobManager.prepareComplete() one by
one. Completing them one by one during iteration could lead to one
job calling pickJobsToRun() again and concurrently modifying
the job queue map.
Ext-ref: MB-62857
Change-Id: I6ec0c6625d9d84cd0797964781256e93f5346a91
Reviewed-on: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18512
Integration-Tests: Jenkins <[email protected]>
Reviewed-by: Murtadha Hubail <[email protected]>
Tested-by: Ali Alsuliman <[email protected]>
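
The following is a minimal sketch of the pattern described under Details,
not the actual Hyracks code. The class JobQueueSketch and its methods
pullInline() and pullDeferred() are hypothetical names, and the "complete
a job" step is reduced to a re-entrant call into the same pull method. It
shows why failing a job while the LinkedHashMap iterator is still open can
throw, and how collecting the unpickable jobs first avoids that.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a FIFO job queue; NOT the Hyracks FIFOJobQueue.
class JobQueueSketch {
    // Insertion-ordered map: job id -> whether the job can run right now.
    private final Map<String, Boolean> queue = new LinkedHashMap<>();
    final List<String> started = new ArrayList<>();
    final List<String> failed = new ArrayList<>();

    void add(String jobId, boolean canRun) {
        queue.put(jobId, canRun);
    }

    // Problematic shape: an unpickable job is completed while the iterator
    // is still open, and completion re-enters the pull on the same map.
    void pullInline() {
        Iterator<Map.Entry<String, Boolean>> it = queue.entrySet().iterator();
        while (it.hasNext()) {
            // Throws if a nested call changed the map since the last step.
            Map.Entry<String, Boolean> e = it.next();
            String id = e.getKey();
            boolean canRun = e.getValue();
            it.remove();
            if (canRun) {
                started.add(id);
            } else {
                failed.add(id);
                pullInline(); // assumed: completing a job picks new jobs again
            }
        }
    }

    // Deferred shape: collect unpickable jobs, finish iterating, then
    // complete them. Re-entrant calls no longer overlap an open iterator.
    void pullDeferred() {
        List<String> toFail = new ArrayList<>();
        Iterator<Map.Entry<String, Boolean>> it = queue.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Boolean> e = it.next();
            String id = e.getKey();
            boolean canRun = e.getValue();
            it.remove();
            if (canRun) {
                started.add(id);
            } else {
                toFail.add(id);
            }
        }
        for (String id : toFail) {
            failed.add(id);
            pullDeferred(); // safe: the iterator above is already exhausted
        }
    }

    public static void main(String[] args) {
        JobQueueSketch buggy = new JobQueueSketch();
        buggy.add("a", false);
        buggy.add("b", true);
        try {
            buggy.pullInline();
        } catch (ConcurrentModificationException ex) {
            System.out.println("inline completion threw: " + ex);
        }

        JobQueueSketch fixed = new JobQueueSketch();
        fixed.add("a", false);
        fixed.add("b", true);
        fixed.add("c", false);
        fixed.pullDeferred();
        System.out.println("started=" + fixed.started + " failed=" + fixed.failed);
    }
}
```

The sketch relies only on the documented fail-fast behavior of
LinkedHashMap iterators; how the real JobManager and FIFOJobQueue divide
this work is not shown here.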
> ConcurrentModificationException when picking new jobs to run
> ------------------------------------------------------------
>
> Key: ASTERIXDB-3467
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-3467
> Project: Apache AsterixDB
> Issue Type: Bug
> Components: HYR - Hyracks
> Reporter: Ali Alsuliman
> Assignee: Ali Alsuliman
> Priority: Major
> Labels: triaged
>
> {noformat}
> 2024-07-23T05:48:20.111+00:00 WARN CBAS.work.WorkQueue [Worker:ClusterController] Exception while executing JobCleanup: JobId@JID:0.26 Status@FAILURE Exceptions@[HYR0010: Node 594730c817222e1318288c19ca889b8c does not exist]
> java.util.ConcurrentModificationException: null
>     at java.base/java.util.LinkedHashMap$LinkedHashIterator.nextNode(LinkedHashMap.java:756) ~[?:?]
>     at java.base/java.util.LinkedHashMap$LinkedValueIterator.next(LinkedHashMap.java:783) ~[?:?]
>     at org.apache.hyracks.control.cc.scheduler.FIFOJobQueue.pull(FIFOJobQueue.java:88) ~[hyracks-control-cc.jar:1.0.0-2230]
>     at org.apache.hyracks.control.cc.job.JobManager.pickJobsToRun(JobManager.java:385) ~[hyracks-control-cc.jar:1.0.0-2230]
>     at org.apache.hyracks.control.cc.job.JobManager.finalComplete(JobManager.java:278) ~[hyracks-control-cc.jar:1.0.0-2230]
>     at org.apache.hyracks.control.cc.job.JobManager.prepareComplete(JobManager.java:186) ~[hyracks-control-cc.jar:1.0.0-2230]
>     at org.apache.hyracks.control.cc.scheduler.FIFOJobQueue.pull(FIFOJobQueue.java:107) ~[hyracks-control-cc.jar:1.0.0-2230]
>     at org.apache.hyracks.control.cc.job.JobManager.pickJobsToRun(JobManager.java:385) ~[hyracks-control-cc.jar:1.0.0-2230]
>     at org.apache.hyracks.control.cc.job.JobManager.finalComplete(JobManager.java:278) ~[hyracks-control-cc.jar:1.0.0-2230]
>     at org.apache.hyracks.control.cc.job.JobManager.prepareComplete(JobManager.java:186) ~[hyracks-control-cc.jar:1.0.0-2230]
>     at org.apache.hyracks.control.cc.scheduler.FIFOJobQueue.pull(FIFOJobQueue.java:107) ~[hyracks-control-cc.jar:1.0.0-2230]
>     at org.apache.hyracks.control.cc.job.JobManager.pickJobsToRun(JobManager.java:385) ~[hyracks-control-cc.jar:1.0.0-2230]
>     at org.apache.hyracks.control.cc.job.JobManager.finalComplete(JobManager.java:278) ~[hyracks-control-cc.jar:1.0.0-2230]
>     at org.apache.hyracks.control.cc.job.JobManager.prepareComplete(JobManager.java:186) ~[hyracks-control-cc.jar:1.0.0-2230]
>     at org.apache.hyracks.control.cc.scheduler.FIFOJobQueue.pull(FIFOJobQueue.java:107) ~[hyracks-control-cc.jar:1.0.0-2230]
>     at org.apache.hyracks.control.cc.job.JobManager.pickJobsToRun(JobManager.java:385) ~[hyracks-control-cc.jar:1.0.0-2230]
>     at org.apache.hyracks.control.cc.job.JobManager.finalComplete(JobManager.java:278) ~[hyracks-control-cc.jar:1.0.0-2230]
>     at org.apache.hyracks.control.cc.job.JobManager.prepareComplete(JobManager.java:186) ~[hyracks-control-cc.jar:1.0.0-2230]
>     at org.apache.hyracks.control.cc.scheduler.FIFOJobQueue.pull(FIFOJobQueue.java:107) ~[hyracks-control-cc.jar:1.0.0-2230]
>     at org.apache.hyracks.control.cc.job.JobManager.pickJobsToRun(JobManager.java:385) ~[hyracks-control-cc.jar:1.0.0-2230]
>     at org.apache.hyracks.control.cc.job.JobManager.finalComplete(JobManager.java:278) ~[hyracks-control-cc.jar:1.0.0-2230]
>     at org.apache.hyracks.control.cc.job.JobManager.prepareComplete(JobManager.java:186) ~[hyracks-control-cc.jar:1.0.0-2230]
>     at org.apache.hyracks.control.cc.scheduler.FIFOJobQueue.pull(FIFOJobQueue.java:107) ~[hyracks-control-cc.jar:1.0.0-2230]
>     at org.apache.hyracks.control.cc.job.JobManager.pickJobsToRun(JobManager.java:385) ~[hyracks-control-cc.jar:1.0.0-2230]
>     at org.apache.hyracks.control.cc.job.JobManager.finalComplete(JobManager.java:278) ~[hyracks-control-cc.jar:1.0.0-2230]
>     at org.apache.hyracks.control.cc.job.JobManager.prepareComplete(JobManager.java:186) ~[hyracks-control-cc.jar:1.0.0-2230]
>     at org.apache.hyracks.control.cc.scheduler.FIFOJobQueue.pull(FIFOJobQueue.java:107) ~[hyracks-control-cc.jar:1.0.0-2230]
>     at org.apache.hyracks.control.cc.job.JobManager.pickJobsToRun(JobManager.java:385) ~[hyracks-control-cc.jar:1.0.0-2230]
>     at org.apache.hyracks.control.cc.job.JobManager.finalComplete(JobManager.java:278) ~[hyracks-control-cc.jar:1.0.0-2230]
>     at org.apache.hyracks.control.cc.job.JobManager.prepareComplete(JobManager.java:205) ~[hyracks-control-cc.jar:1.0.0-2230]
>     at org.apache.hyracks.control.cc.work.JobCleanupWork.run(JobCleanupWork.java:63) ~[hyracks-control-cc.jar:1.0.0-2230]
>     at org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127) [hyracks-control-common.jar:1.0.0-2230]
> {noformat}