[ https://issues.apache.org/jira/browse/HIVE-15168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15688740#comment-15688740 ]

Rui Li commented on HIVE-15168:
-------------------------------

[~zsombor.klara], thanks for the investigation. I also tried adding a sleep in the listener before it changes the state to QUEUED, and with that the test fails consistently.
Based on that, I think we have two choices. One is to remove {{verify(listener).onJobQueued(handle)}} from the test, because it is not guaranteed to be called. It seems we can keep {{verify(listener).onJobStarted(handle)}}: at least on the RemoteDriver side, JobStarted and JobResult are sent sequentially.
The other is to try to detect missed state changes. E.g. if the current state is SENT and we're told to change to SUCCEEDED, then we must have missed QUEUED and STARTED, so we can notify the listeners of those missed state changes before we change to SUCCEEDED.
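The second option could look roughly like the following minimal Java sketch. It is a hypothetical, simplified model rather than Hive's JobHandleImpl: it assumes the success path is the linear sequence SENT, QUEUED, STARTED, SUCCEEDED, uses a single illustrative onStateChange callback instead of per-state listener methods, and ignores failure and cancellation. When a transition skips states, listeners are notified of each skipped state in order, and a late, out-of-order transition (such as QUEUED arriving after SUCCEEDED) is ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Hive's JobHandleImpl: only the linear success path is modeled.
class StateGapSketch {

    // Assumed forward ordering of the states mentioned in the comment.
    enum State { SENT, QUEUED, STARTED, SUCCEEDED }

    // Illustrative single-callback listener (the real listener has one method per state).
    interface Listener {
        void onStateChange(State newState);
    }

    static final class Handle {
        private State state = State.SENT;
        private final List<Listener> listeners = new ArrayList<>();

        synchronized void addListener(Listener listener) {
            listeners.add(listener);
        }

        synchronized State state() {
            return state;
        }

        // Moves forward to newState, first notifying listeners of every intermediate
        // state that was skipped. Backward or repeated transitions are ignored (returns false).
        synchronized boolean changeState(State newState) {
            if (newState.ordinal() <= state.ordinal()) {
                return false;
            }
            State[] all = State.values();
            for (int i = state.ordinal() + 1; i <= newState.ordinal(); i++) {
                state = all[i];
                for (Listener listener : listeners) {
                    listener.onStateChange(all[i]);
                }
            }
            return true;
        }
    }

    // Simulates the job result arriving before the RPC future's QUEUED callback runs.
    static List<State> demo() {
        Handle handle = new Handle();
        List<State> seen = new ArrayList<>();
        handle.addListener(seen::add);
        handle.changeState(State.SUCCEEDED); // result arrives first: QUEUED and STARTED are filled in
        handle.changeState(State.QUEUED);    // late RPC callback: ignored, already past QUEUED
        return seen;
    }
}
```

In demo(), the listener sees QUEUED, STARTED and SUCCEEDED in that order even though the handle was only told about SUCCEEDED, so a test verifying onJobQueued would still pass.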
[~xuefuz], what's your opinion on this?

> Flaky test: TestSparkClient.testJobSubmission (still flaky)
> -----------------------------------------------------------
>
>                 Key: HIVE-15168
>                 URL: https://issues.apache.org/jira/browse/HIVE-15168
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Barna Zsombor Klara
>            Assignee: Barna Zsombor Klara
>             Fix For: 2.2.0
>
>         Attachments: HIVE-15168.patch
>
>
> [HIVE-14910|https://issues.apache.org/jira/browse/HIVE-14910] already addressed one source of flakiness, but sadly not all of it, it seems.
> In JobHandleImpl the listeners are registered after the job has been submitted.
> This may result in a race condition.
> {code}
> // Link the RPC and the promise so that events from one are propagated to the
> // other as needed.
> rpc.addListener(new GenericFutureListener<io.netty.util.concurrent.Future<Void>>() {
>   @Override
>   public void operationComplete(io.netty.util.concurrent.Future<Void> f) {
>     if (f.isSuccess()) {
>       // The job request was sent: mark the handle as QUEUED.
>       handle.changeState(JobHandle.State.QUEUED);
>     } else if (!promise.isDone()) {
>       // Sending failed: propagate the cause to the job's promise.
>       promise.setFailure(f.cause());
>     }
>   }
> });
> promise.addListener(new GenericFutureListener<Promise<T>>() {
>   @Override
>   public void operationComplete(Promise<T> p) {
>     // The job's promise completed: stop tracking the job.
>     if (jobId != null) {
>       jobs.remove(jobId);
>     }
>     // If the job was cancelled while the RPC is still pending, cancel the RPC too.
>     if (p.isCancelled() && !rpc.isDone()) {
>       rpc.cancel(true);
>     }
>   }
> });
> {code}
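To make the race described in the issue concrete, here is a minimal, hypothetical Java model (LateListenerSketch, submit, registerAfterSubmit and registerBeforeSubmit are illustrative names, not Hive's API). It assumes a state change is delivered only to listeners registered at that moment and is never replayed. If the RPC future completes before the caller attaches its listener, the QUEUED transition reaches nobody; handing the listener to submit() so it is registered before the job is sent closes that window. This is one possible approach, not necessarily what the attached patch does, and it does not address QUEUED itself being applied after later states, which is the case the comment reproduces with a sleep.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical model of the race, not Hive's SparkClientImpl/JobHandleImpl.
class LateListenerSketch {

    enum State { SENT, QUEUED }

    static final class Handle {
        private State state = State.SENT;
        private final List<Consumer<State>> listeners = new ArrayList<>();

        synchronized void addListener(Consumer<State> listener) {
            listeners.add(listener);
        }

        synchronized State state() {
            return state;
        }

        // Notifies only the listeners present right now; nothing is replayed later.
        synchronized void changeState(State newState) {
            state = newState;
            for (Consumer<State> listener : listeners) {
                listener.accept(newState);
            }
        }
    }

    // Stand-in for submit(): the "RPC" completes synchronously, the worst-case
    // interleaving. Listeners passed in are registered before the job is sent.
    static Handle submit(List<Consumer<State>> earlyListeners) {
        Handle handle = new Handle();
        earlyListeners.forEach(handle::addListener);
        handle.changeState(State.QUEUED); // the RPC future has already completed
        return handle;
    }

    // Models the reported behavior: the listener is registered after submit() returns.
    static List<State> registerAfterSubmit() {
        List<State> seen = new ArrayList<>();
        Handle handle = submit(new ArrayList<>());
        handle.addListener(seen::add); // too late: QUEUED was delivered to nobody
        return seen;
    }

    // Alternative: the listener is in place before the job is sent.
    static List<State> registerBeforeSubmit() {
        List<State> seen = new ArrayList<>();
        List<Consumer<State>> early = new ArrayList<>();
        early.add(seen::add);
        submit(early);
        return seen;
    }
}
```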



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
