[ https://issues.apache.org/jira/browse/HIVE-15168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15687491#comment-15687491 ]
Barna Zsombor Klara commented on HIVE-15168:
--------------------------------------------

I had another look at this and you are correct, the listeners are being called even if the task has completed. Unfortunately that isn't enough for us, because in JobHandleImpl the state change will not reset the state of a job to queued if it is already cancelled/succeeded/failed. And if we never set the state to queued, then the onJobQueued method will never be called on the JobHandleListener.

{code}
  /**
   * Changes the state of this job handle, making sure that illegal state transitions are ignored.
   * Fires events appropriately.
   *
   * As a rule, state transitions can only occur if the current state is "higher" than the current
   * state (i.e., has a higher ordinal number) and is not a "final" state. "Final" states are
   * CANCELLED, FAILED and SUCCEEDED, defined here in the code as having an ordinal number higher
   * than the CANCELLED enum constant.
   */
  boolean changeState(State newState) {
    synchronized (listeners) {
      if (newState.ordinal() > state.ordinal() && state.ordinal() < State.CANCELLED.ordinal()) {
        state = newState;
        for (Listener<T> listener : listeners) {
          fireStateChange(newState, listener);
        }
        return true;
      }
      return false;
    }
  }
{code}

I think that this code is correct and should not be changed. Once a job has transitioned to a terminal state we should not revert it to queued. But it also means that we must ensure that the state changes happen sequentially.

> Flaky test: TestSparkClient.testJobSubmission (still flaky)
> -----------------------------------------------------------
>
>                 Key: HIVE-15168
>                 URL: https://issues.apache.org/jira/browse/HIVE-15168
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Barna Zsombor Klara
>            Assignee: Barna Zsombor Klara
>             Fix For: 2.2.0
>
>         Attachments: HIVE-15168.patch
>
>
> [HIVE-14910|https://issues.apache.org/jira/browse/HIVE-14910] already addressed one source of flakiness, but sadly not all of them, it seems.
> In JobHandleImpl the listeners are registered after the job has been submitted.
> This may end up in a race condition.
> {code}
>       // Link the RPC and the promise so that events from one are propagated to the other as
>       // needed.
>       rpc.addListener(new GenericFutureListener<io.netty.util.concurrent.Future<Void>>() {
>         @Override
>         public void operationComplete(io.netty.util.concurrent.Future<Void> f) {
>           if (f.isSuccess()) {
>             handle.changeState(JobHandle.State.QUEUED);
>           } else if (!promise.isDone()) {
>             promise.setFailure(f.cause());
>           }
>         }
>       });
>       promise.addListener(new GenericFutureListener<Promise<T>>() {
>         @Override
>         public void operationComplete(Promise<T> p) {
>           if (jobId != null) {
>             jobs.remove(jobId);
>           }
>           if (p.isCancelled() && !rpc.isDone()) {
>             rpc.cancel(true);
>           }
>         }
>       });
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
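
To make the ordering constraint in changeState concrete, here is a minimal, self-contained sketch of the same guard. It is not the Hive code: the class StateOrderSketch and the firedEvents helper are hypothetical names used only for illustration, and the order of the enum constants is an assumption based on the Javadoc quoted in the comment (CANCELLED, FAILED and SUCCEEDED being the final states).

```java
import java.util.ArrayList;
import java.util.List;

public class StateOrderSketch {
    // Assumed ordering: non-final states first, then the final ones.
    enum State { SENT, QUEUED, STARTED, CANCELLED, FAILED, SUCCEEDED }

    // Records every transition that was accepted, standing in for listener callbacks.
    private final List<State> fired = new ArrayList<>();
    private State state = State.SENT;

    // Same guard as the quoted changeState: only move to a higher ordinal,
    // and never leave a final state.
    synchronized boolean changeState(State newState) {
        if (newState.ordinal() > state.ordinal()
                && state.ordinal() < State.CANCELLED.ordinal()) {
            state = newState;
            fired.add(newState);
            return true;
        }
        return false;
    }

    List<State> firedEvents() {
        return fired;
    }

    public static void main(String[] args) {
        // Expected order: the RPC listener fires before the job finishes.
        StateOrderSketch inOrder = new StateOrderSketch();
        inOrder.changeState(State.QUEUED);
        inOrder.changeState(State.SUCCEEDED);
        System.out.println("in order:     " + inOrder.firedEvents());

        // Raced order: the job finishes before the RPC listener runs.
        StateOrderSketch raced = new StateOrderSketch();
        raced.changeState(State.SUCCEEDED);
        boolean queuedAccepted = raced.changeState(State.QUEUED);
        System.out.println("out of order: " + raced.firedEvents()
                + ", QUEUED accepted: " + queuedAccepted);
    }
}
```

In the raced case the QUEUED transition is rejected, so nothing equivalent to onJobQueued would ever fire. This is why the guard can stay as it is, while the caller has to make sure the transitions are requested in order.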