[ https://issues.apache.org/jira/browse/HIVE-17908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16237383#comment-16237383 ]
Hive QA commented on HIVE-17908:
--------------------------------
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12895453/HIVE-17908.5.patch
{color:red}ERROR:{color} -1 due to no test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 18 failed/errored test(s), 11353 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=62)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=156)
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[ct_noperm_loc] (batchId=94)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_multi] (batchId=111)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=206)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testAmPoolInteractions (batchId=281)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testApplyPlanQpChanges (batchId=281)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testApplyPlanUserMapping (batchId=281)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testAsyncSessionInitFailures (batchId=281)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testClusterFractions (batchId=281)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testDestroyAndReturn (batchId=281)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testQueueing (batchId=281)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testReopen (batchId=281)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testReuse (batchId=281)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testReuseWithDifferentPool (batchId=281)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testReuseWithQueueing (batchId=281)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints (batchId=223)
{noformat}
Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7605/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7605/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7605/
Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 18 tests failed
{noformat}
This message is automatically generated.
ATTACHMENT ID: 12895453 - PreCommit-HIVE-Build
> LLAP External client not correctly handling killTask for pending requests
> -------------------------------------------------------------------------
>
> Key: HIVE-17908
> URL: https://issues.apache.org/jira/browse/HIVE-17908
> Project: Hive
> Issue Type: Bug
> Components: llap
> Reporter: Jason Dere
> Assignee: Jason Dere
> Priority: Major
> Attachments: HIVE-17908.1.patch, HIVE-17908.2.patch, HIVE-17908.3.patch, HIVE-17908.4.patch, HIVE-17908.5.patch
>
>
> Hitting "Timed out waiting for heartbeat for task ID" errors with the LLAP
> external client.
> HIVE-17393 fixed some of these errors, however it is also occurring because
> the client is not correctly handling the killTask notification when the
> request is accepted but still waiting for the first task heartbeat. In this
> situation the client should retry the request, similar to what the LLAP AM
> does. Current logic is ignoring the killTask in this situation, which results
> in a heartbeat timeout - no heartbeats are sent by LLAP because of the
> killTask notification.
> {noformat}
> 17/08/09 05:36:02 WARN TaskSetManager: Lost task 10.0 in stage 4.0 (TID 14, cn114-10.l42scl.hortonworks.com, executor 5): java.io.IOException: Received reader event error: Timed out waiting for heartbeat for task ID attempt_7739111832518812959_0005_0_00_000010_0
> at org.apache.hadoop.hive.llap.LlapBaseRecordReader.next(LlapBaseRecordReader.java:178)
> at org.apache.hadoop.hive.llap.LlapBaseRecordReader.next(LlapBaseRecordReader.java:50)
> at org.apache.hadoop.hive.llap.LlapRowRecordReader.next(LlapRowRecordReader.java:121)
> at org.apache.hadoop.hive.llap.LlapRowRecordReader.next(LlapRowRecordReader.java:68)
> at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:266)
> at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:211)
> at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
> at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithKeys$(Unknown Source)
> at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
> at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
> at org.apache.spark.scheduler.Task.run(Task.scala:99)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: LlapTaskUmbilicalExternalClient(attempt_7739111832518812959_0005_0_00_000010_0): Error while attempting to read chunk length
> at org.apache.hadoop.hive.llap.io.ChunkedInputStream.read(ChunkedInputStream.java:82)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
> at java.io.FilterInputStream.read(FilterInputStream.java:83)
> at org.apache.hadoop.hive.llap.LlapBaseRecordReader.hasInput(LlapBaseRecordReader.java:267)
> at org.apache.hadoop.hive.llap.LlapBaseRecordReader.next(LlapBaseRecordReader.java:142)
> ... 22 more
> Caused by: java.net.SocketException: Socket closed
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> {noformat}
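Below is a minimal, illustrative sketch of the retry-on-killTask handling the description calls for: if killTask arrives while a request is still waiting for its first heartbeat, resubmit instead of ignoring it. All names here (KillTaskRetrySketch, PendingRequest, submitWork, onTaskHeartbeat, onKillTask) are hypothetical stand-ins, not the actual LlapTaskUmbilicalExternalClient API.
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class KillTaskRetrySketch {

  enum State { PENDING_HEARTBEAT, RUNNING }

  static final class PendingRequest {
    final String taskAttemptId;
    volatile State state = State.PENDING_HEARTBEAT;
    int submitCount;

    PendingRequest(String taskAttemptId) {
      this.taskAttemptId = taskAttemptId;
    }
  }

  private final Map<String, PendingRequest> requests = new ConcurrentHashMap<>();

  // Submit (or resubmit) a fragment request; it stays PENDING_HEARTBEAT until
  // the first task heartbeat arrives.
  void submitWork(String taskAttemptId) {
    PendingRequest req = requests.computeIfAbsent(taskAttemptId, PendingRequest::new);
    req.state = State.PENDING_HEARTBEAT;
    req.submitCount++;
    System.out.println("submit #" + req.submitCount + " of " + taskAttemptId);
  }

  // First heartbeat: the fragment is now actually running on the daemon.
  void onTaskHeartbeat(String taskAttemptId) {
    PendingRequest req = requests.get(taskAttemptId);
    if (req != null) {
      req.state = State.RUNNING;
    }
  }

  // killTask notification. If no heartbeat was ever received, the daemon
  // dropped the pending request, so retry it (as the LLAP AM does) instead of
  // ignoring the kill and then timing out waiting for a heartbeat that will
  // never come. A kill after the first heartbeat is a real task failure.
  void onKillTask(String taskAttemptId) {
    PendingRequest req = requests.get(taskAttemptId);
    if (req == null) {
      return; // unknown or already completed attempt; nothing to do
    }
    if (req.state == State.PENDING_HEARTBEAT) {
      submitWork(taskAttemptId); // retry rather than ignore
    } else {
      requests.remove(taskAttemptId);
      System.out.println(taskAttemptId + " killed while running");
    }
  }

  public static void main(String[] args) {
    KillTaskRetrySketch client = new KillTaskRetrySketch();
    client.submitWork("attempt_0005_0_00_000010_0");
    client.onKillTask("attempt_0005_0_00_000010_0");      // pending -> resubmitted
    client.onTaskHeartbeat("attempt_0005_0_00_000010_0"); // now running
    client.onKillTask("attempt_0005_0_00_000010_0");      // running -> failure
  }
}
{code}
The main() run exercises both paths: a kill before the first heartbeat triggers a resubmit, while a kill after a heartbeat is surfaced as a task failure rather than retried.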
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)