[ 
https://issues.apache.org/jira/browse/HIVE-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183172#comment-14183172
 ] 

Hive QA commented on HIVE-8457:
-------------------------------



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12676940/HIVE-8457.2-spark.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6809 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_tez_smb_1
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/260/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/260/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-260/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12676940 - PreCommit-HIVE-SPARK-Build

> MapOperator initialization fails when multiple Spark threads are enabled 
> [Spark Branch]
> --------------------------------------------------------------------------------------
>
>                 Key: HIVE-8457
>                 URL: https://issues.apache.org/jira/browse/HIVE-8457
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Chao
>            Assignee: Chao
>         Attachments: HIVE-8457.1-spark.patch, HIVE-8457.2-spark.patch
>
>
> Currently, on the Spark branch, each thread is bound to a thread-local 
> {{IOContext}}, which gets initialized when we generate an input {{HadoopRDD}} 
> and is later used in {{MapOperator}}, {{FilterOperator}}, etc.
> Given the introduction of HIVE-8118, we may have multiple downstream 
> RDDs that share the same input {{HadoopRDD}}, and we would like the 
> {{HadoopRDD}} to be cached to avoid scanning the same table multiple times. 
> A typical case looks like the following:
> {noformat}
>      inputRDD     inputRDD
>         |            |
>        MT_11        MT_12
>         |            |
>        RT_1         RT_2
> {noformat}
> Here, {{MT_11}} and {{MT_12}} are {{MapTran}} instances from a split {{MapWork}},
> and {{RT_1}} and {{RT_2}} are two {{ReduceTran}} instances. Note that this example is 
> simplified, as we may also have a {{ShuffleTran}} between {{MapTran}} and 
> {{ReduceTran}}.
> When multiple Spark threads are running, {{MT_11}} may be executed first. 
> It asks for an iterator from the {{HadoopRDD}}, which triggers the creation 
> of the iterator, which in turn triggers the initialization of the 
> {{IOContext}} associated with that particular thread.
> *Now, the problem is*: before {{MT_12}} starts executing, it will also ask 
> for an iterator from the {{HadoopRDD}}, and since the RDD is already cached, 
> instead of creating a new iterator, it will just fetch it from the cached 
> result. However, *this will skip the initialization of the IOContext 
> associated with this particular thread*. When {{MT_12}} then starts 
> executing, it will try to initialize the {{MapOperator}}, but since the 
> {{IOContext}} is not initialized, the {{MapOperator}} initialization fails. 
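
The failure mode described above can be reproduced in isolation with plain Java. The following is a minimal sketch, not Hive or Spark code: {{Context}}, {{CachedInput}}, and {{contextInitializedAfterRead}} are hypothetical names standing in for the thread-local {{IOContext}}, the cached {{HadoopRDD}}, and a {{MapTran}} running on its own thread. The only assumption it encodes is the one stated in the description: the context is initialized as a side effect of computing the input, so a read served from the cache skips it.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class IOContextCacheDemo {
    // Hypothetical stand-in for the per-thread IOContext.
    static final class Context {
        boolean initialized;
    }

    // Every thread lazily gets its own, initially uninitialized, Context.
    static final ThreadLocal<Context> CONTEXT = ThreadLocal.withInitial(Context::new);

    // Hypothetical stand-in for a cached input RDD.
    static final class CachedInput {
        private List<String> cached;

        synchronized Iterator<String> iterator() {
            if (cached == null) {
                // The context is set up only while the input is first computed,
                // and only for the thread that happens to compute it.
                CONTEXT.get().initialized = true;
                cached = Arrays.asList("row1", "row2");
            }
            // Later callers are served from the cache; their context is untouched.
            return cached.iterator();
        }
    }

    // Reads the input on a fresh thread and reports whether that thread's
    // context was initialized by the read.
    static boolean contextInitializedAfterRead(CachedInput input) throws Exception {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            return worker.submit(() -> {
                input.iterator();
                return CONTEXT.get().initialized;
            }).get();
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        CachedInput input = new CachedInput();
        // First reader computes the input, so its context is initialized.
        System.out.println("MT_11 context initialized: " + contextInitializedAfterRead(input));
        // Second reader hits the cache on another thread; its context is not.
        System.out.println("MT_12 context initialized: " + contextInitializedAfterRead(input));
    }
}
```

In this sketch the first reader reports {{true}} and the second reports {{false}}, which mirrors the state {{MT_12}} is in when it tries to initialize its {{MapOperator}}.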



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
