[jira] [Commented] (HIVE-15168) Flaky test: TestSparkClient.testJobSubmission (still flaky)

Barna Zsombor Klara (JIRA) Tue, 22 Nov 2016 10:28:18 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-15168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15687491#comment-15687491
 ]


Barna Zsombor Klara commented on HIVE-15168:
--------------------------------------------

I had another look at this and you are correct: the listeners are called 
even if the task has already completed. Unfortunately that isn't enough for us, 
because the state change in JobHandleImpl will not reset a job to queued once 
it is already cancelled/succeeded/failed. And if the state is never set to 
queued, the onJobQueued method is never called on the JobHandleListener.
{code}
  /**
   * Changes the state of this job handle, making sure that illegal state
   * transitions are ignored. Fires events appropriately.
   *
   * As a rule, state transitions can only occur if the new state is "higher"
   * than the current state (i.e., has a higher ordinal number) and the current
   * state is not a "final" state. "Final" states are CANCELLED, FAILED and
   * SUCCEEDED, defined here in the code as having an ordinal number at or
   * above that of the CANCELLED enum constant.
   */
  boolean changeState(State newState) {
    synchronized (listeners) {
      if (newState.ordinal() > state.ordinal()
          && state.ordinal() < State.CANCELLED.ordinal()) {
        state = newState;
        for (Listener<T> listener : listeners) {
          fireStateChange(newState, listener);
        }
        return true;
      }
      return false;
    }
  }
{code}

I think that this code is correct and should not be changed. Once a job has 
transitioned to a terminal state we should not revert it to queued. But it also 
means that we must ensure that the state changes happen sequentially.
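
To make the ordering problem concrete, here is a minimal sketch of the guard 
quoted above. JobStateSketch is a hypothetical stand-in, not the real 
JobHandleImpl; the order of the enum constants is an assumption, and a plain 
list of fired states stands in for the listener callbacks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for JobHandleImpl; not the Hive class itself.
class JobStateSketch {
  // Assumed ordering: non-final states first, final states from CANCELLED onwards.
  enum State { SENT, QUEUED, STARTED, CANCELLED, FAILED, SUCCEEDED }

  private State state = State.SENT;
  // Records every accepted transition; stands in for notifying the listeners.
  private final List<State> fired = new ArrayList<>();

  // Same guard as the changeState method quoted above.
  synchronized boolean changeState(State newState) {
    if (newState.ordinal() > state.ordinal()
        && state.ordinal() < State.CANCELLED.ordinal()) {
      state = newState;
      fired.add(newState);
      return true;
    }
    return false;
  }

  synchronized List<State> firedEvents() {
    return new ArrayList<>(fired);
  }

  public static void main(String[] args) {
    // Expected order: QUEUED is applied before the job reaches a final state.
    JobStateSketch inOrder = new JobStateSketch();
    inOrder.changeState(State.QUEUED);
    inOrder.changeState(State.SUCCEEDED);
    System.out.println("in order: " + inOrder.firedEvents());

    // Racy order: the job finishes first, so the late QUEUED change is ignored.
    JobStateSketch raced = new JobStateSketch();
    raced.changeState(State.SUCCEEDED);
    boolean accepted = raced.changeState(State.QUEUED);
    System.out.println("raced: " + raced.firedEvents()
        + ", QUEUED accepted: " + accepted);
  }
}
```

In the in-order run both QUEUED and SUCCEEDED are recorded. In the raced run 
only SUCCEEDED is recorded and the QUEUED change returns false, which is the 
case where a test waiting for onJobQueued would never see it.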

> Flaky test: TestSparkClient.testJobSubmission (still flaky)
> -----------------------------------------------------------
>
>                 Key: HIVE-15168
>                 URL: https://issues.apache.org/jira/browse/HIVE-15168
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Barna Zsombor Klara
>            Assignee: Barna Zsombor Klara
>             Fix For: 2.2.0
>
>         Attachments: HIVE-15168.patch
>
>
> [HIVE-14910|https://issues.apache.org/jira/browse/HIVE-14910] already 
> addressed one source of flakiness, but sadly not all of them, it seems.
> In JobHandleImpl the listeners are registered after the job has been 
> submitted.
> This may result in a race condition.
> {code}
>       // Link the RPC and the promise so that events from one are propagated to
>       // the other as needed.
>       rpc.addListener(
>           new GenericFutureListener<io.netty.util.concurrent.Future<Void>>() {
>         @Override
>         public void operationComplete(io.netty.util.concurrent.Future<Void> f) {
>           if (f.isSuccess()) {
>             handle.changeState(JobHandle.State.QUEUED);
>           } else if (!promise.isDone()) {
>             promise.setFailure(f.cause());
>           }
>         }
>       });
>       promise.addListener(new GenericFutureListener<Promise<T>>() {
>         @Override
>         public void operationComplete(Promise<T> p) {
>           if (jobId != null) {
>             jobs.remove(jobId);
>           }
>           if (p.isCancelled() && !rpc.isDone()) {
>             rpc.cancel(true);
>           }
>         }
>       });
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
