[ https://issues.apache.org/jira/browse/HIVE-15168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15686364#comment-15686364 ]
Barna Zsombor Klara commented on HIVE-15168:
--------------------------------------------

I'm not sure these failures are related.

These are known flaky tests:
explainanalyze_2 - https://issues.apache.org/jira/browse/HIVE-15084
transform_ppr2 - https://issues.apache.org/jira/browse/HIVE-15201
orc_ppd_schema_evol_3a - https://issues.apache.org/jira/browse/HIVE-14936

These are failing rather consistently:
union_fast_stats - https://issues.apache.org/jira/browse/HIVE-15115
join_acid_non_acid - https://issues.apache.org/jira/browse/HIVE-15116

auto_sortmerge_join_2 is not identified as flaky, but the failure is in MR, so I don't think my changes to the SparkClient could have caused it. It also did not fail on the first run.
{code}
< FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
< ATTEMPT: Execute BackupTask: org.apache.hadoop.hive.ql.exec.mr.MapRedTask
{code}

This leaves TestSparkCliDriver, which I cannot repro locally. From the test report it seems to me that the error is happening during the test setup:
{code}
java.lang.AssertionError: Failed during createSources processLine with code=3
{code}
But the hive log has a different failure:
{code}
2016-11-21T08:36:57,751 INFO [stderr-redir-1] client.SparkClientImpl: 16/11/21 08:36:57 WARN TaskSetManager: Lost task 0.0 in stage 46.0 (TID 53, 10.234.144.78): java.io.IOException: Failed to create local dir in /tmp/spark-8a7bd913-fca5-4990-ad09-c9eff4dacae0/executor-e85f9833-b0ab-47ed-bb95-56dc9ef64177/blockmgr-044ca916-76f1-402f-80ed-c4ec5fd4d544/2b.
{code}
I'm not sure if either can be related to the refactoring in the SparkClientImpl.

> Flaky test: TestSparkClient.testJobSubmission (still flaky)
> -----------------------------------------------------------
>
>                 Key: HIVE-15168
>                 URL: https://issues.apache.org/jira/browse/HIVE-15168
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Barna Zsombor Klara
>            Assignee: Barna Zsombor Klara
>             Fix For: 2.2.0
>
>         Attachments: HIVE-15168.patch
>
>
> [HIVE-14910|https://issues.apache.org/jira/browse/HIVE-14910] already addressed one source of flakiness, but sadly not all of them, it seems.
> In JobHandleImpl the listeners are registered after the job has been submitted.
> This may result in a race condition.
> {code}
>     // Link the RPC and the promise so that events from one are propagated to the other as
>     // needed.
>     rpc.addListener(new GenericFutureListener<io.netty.util.concurrent.Future<Void>>() {
>       @Override
>       public void operationComplete(io.netty.util.concurrent.Future<Void> f) {
>         if (f.isSuccess()) {
>           handle.changeState(JobHandle.State.QUEUED);
>         } else if (!promise.isDone()) {
>           promise.setFailure(f.cause());
>         }
>       }
>     });
>     promise.addListener(new GenericFutureListener<Promise<T>>() {
>       @Override
>       public void operationComplete(Promise<T> p) {
>         if (jobId != null) {
>           jobs.remove(jobId);
>         }
>         if (p.isCancelled() && !rpc.isDone()) {
>           rpc.cancel(true);
>         }
>       }
>     });
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
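
For illustration only, here is a minimal sketch of the kind of race the issue description refers to: a listener that is attached after the state has already changed never sees the earlier transition. The class and method names below (ListenerRaceSketch, Handle, registerLate, registerEarly) are hypothetical stand-ins and are not the actual JobHandleImpl or SparkClientImpl code. The sketch is single-threaded on purpose so that the ordering is deterministic; in the real client the early transition would be triggered by another thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a job handle; NOT the actual Hive JobHandleImpl.
public class ListenerRaceSketch {
    enum State { SENT, QUEUED, SUCCEEDED }

    interface Listener { void onStateChange(State s); }

    static class Handle {
        private final List<Listener> listeners = new ArrayList<>();

        // Listeners passed here are in place before any state change can happen.
        Handle(List<Listener> initial) { listeners.addAll(initial); }

        synchronized void addListener(Listener l) { listeners.add(l); }

        // Notifies only the listeners registered at the time of the change.
        synchronized void changeState(State s) {
            for (Listener l : listeners) {
                l.onStateChange(s);
            }
        }
    }

    // Listener attached after "submission": the QUEUED transition is missed.
    static List<State> registerLate() {
        List<State> seen = new ArrayList<>();
        Handle h = new Handle(new ArrayList<>());
        h.changeState(State.QUEUED);   // simulates the remote side reacting first
        h.addListener(seen::add);      // too late for QUEUED
        h.changeState(State.SUCCEEDED);
        return seen;
    }

    // Listener supplied up front: every transition is observed.
    static List<State> registerEarly() {
        List<State> seen = new ArrayList<>();
        List<Listener> initial = new ArrayList<>();
        initial.add(seen::add);
        Handle h = new Handle(initial);
        h.changeState(State.QUEUED);
        h.changeState(State.SUCCEEDED);
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("late:  " + registerLate());   // [SUCCEEDED]
        System.out.println("early: " + registerEarly());  // [QUEUED, SUCCEEDED]
    }
}
```

A test that waits for a listener to report QUEUED would hang or fail in the "late" case, which is one plausible way such a race could show up as flakiness. Whether this matches the exact failure seen in TestSparkClient.testJobSubmission is an assumption.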