[jira] [Commented] (HIVE-15168) Flaky test: TestSparkClient.testJobSubmission (still flaky)

Barna Zsombor Klara (JIRA) Tue, 22 Nov 2016 02:46:25 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-15168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15686364#comment-15686364
 ]


Barna Zsombor Klara commented on HIVE-15168:
--------------------------------------------

I'm not sure these failures are related.
These are known flaky tests:
explainanalyze_2 - https://issues.apache.org/jira/browse/HIVE-15084
transform_ppr2 - https://issues.apache.org/jira/browse/HIVE-15201
orc_ppd_schema_evol_3a - https://issues.apache.org/jira/browse/HIVE-14936
These are failing rather consistently:
union_fast_stats - https://issues.apache.org/jira/browse/HIVE-15115
join_acid_non_acid - https://issues.apache.org/jira/browse/HIVE-15116

auto_sortmerge_join_2 is not identified as flaky, but the failure is in MR, so I 
don't think my changes to the SparkClient could have caused it. It also did not 
fail on the first run.
{code}
< FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
< ATTEMPT: Execute BackupTask: org.apache.hadoop.hive.ql.exec.mr.MapRedTask
{code}

This leaves TestSparkCliDriver, which I cannot repro locally.
From the test report it seems to me that the error is happening during the 
test setup:
{code}
java.lang.AssertionError: Failed during createSources processLine with code=3
{code}
But the hive log has a different failure:
{code}
2016-11-21T08:36:57,751  INFO [stderr-redir-1] client.SparkClientImpl: 16/11/21 08:36:57 WARN TaskSetManager: Lost task 0.0 in stage 46.0 (TID 53, 10.234.144.78): java.io.IOException: Failed to create local dir in /tmp/spark-8a7bd913-fca5-4990-ad09-c9eff4dacae0/executor-e85f9833-b0ab-47ed-bb95-56dc9ef64177/blockmgr-044ca916-76f1-402f-80ed-c4ec5fd4d544/2b.
{code}
I'm not sure whether either can be related to the refactoring in SparkClientImpl.

> Flaky test: TestSparkClient.testJobSubmission (still flaky)
> -----------------------------------------------------------
>
>                 Key: HIVE-15168
>                 URL: https://issues.apache.org/jira/browse/HIVE-15168
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Barna Zsombor Klara
>            Assignee: Barna Zsombor Klara
>             Fix For: 2.2.0
>
>         Attachments: HIVE-15168.patch
>
>
> [HIVE-14910|https://issues.apache.org/jira/browse/HIVE-14910] already 
> addressed one source of flakiness, but sadly not all of them, it seems.
> In JobHandleImpl the listeners are registered after the job has been 
> submitted.
> This may result in a race condition.
> {code}
>       // Link the RPC and the promise so that events from one are propagated to
>       // the other as needed.
>       rpc.addListener(new GenericFutureListener<io.netty.util.concurrent.Future<Void>>() {
>         @Override
>         public void operationComplete(io.netty.util.concurrent.Future<Void> f) {
>           if (f.isSuccess()) {
>             handle.changeState(JobHandle.State.QUEUED);
>           } else if (!promise.isDone()) {
>             promise.setFailure(f.cause());
>           }
>         }
>       });
>       promise.addListener(new GenericFutureListener<Promise<T>>() {
>         @Override
>         public void operationComplete(Promise<T> p) {
>           if (jobId != null) {
>             jobs.remove(jobId);
>           }
>           if (p.isCancelled() && !rpc.isDone()) {
>             rpc.cancel(true);
>           }
>         }
>       });
> {code}
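
The ordering problem described above can be sketched in isolation. This is only a minimal illustration, not the code from the attached patch: ListenerOrderSketch, SketchHandle and run are hypothetical names, and the sketch assumes a handle that notifies only the listeners registered at the moment of the state change, without replaying earlier events. The state change is triggered inline to stand in for the worst-case interleaving, where the job progresses before the listener is attached.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical sketch; none of these types are Hive's or Netty's.
class ListenerOrderSketch {

    // Stand-in for a job handle whose listeners receive state changes.
    static class SketchHandle {
        private final List<Consumer<String>> listeners = new CopyOnWriteArrayList<>();

        void addListener(Consumer<String> listener) {
            listeners.add(listener);
        }

        // Notifies only the listeners registered so far (assumption: no replay).
        void changeState(String state) {
            for (Consumer<String> listener : listeners) {
                listener.accept(state);
            }
        }
    }

    // Returns the states a listener observed, depending on registration order.
    static List<String> run(boolean registerBeforeSubmit) {
        SketchHandle handle = new SketchHandle();
        List<String> seen = new ArrayList<>();
        if (registerBeforeSubmit) {
            handle.addListener(seen::add);
        }
        // Stands in for the submitted job reaching QUEUED right away.
        handle.changeState("QUEUED");
        if (!registerBeforeSubmit) {
            handle.addListener(seen::add); // too late, the event is already gone
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("registered before submit: " + run(true));
        System.out.println("registered after submit:  " + run(false));
    }
}
```

Under these assumptions, a listener attached before submission observes the QUEUED transition, while one attached afterwards observes nothing. In a real run the outcome depends on thread timing, which is what would make a test fail only intermittently.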



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
