[ 
https://issues.apache.org/jira/browse/HIVE-15054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15621481#comment-15621481
 ] 

Rui Li commented on HIVE-15054:
-------------------------------

Thanks [~aihuaxu] for the investigation and update!
The patch looks good, but I find the comments a little confusing. How about 
something like this:
{code}
// Hive requires this TaskAttemptId to be unique. MR's TaskAttemptId is
// composed of "attempt_timestamp_jobNum_m/r_taskNum_attemptNum". The counterpart
// for Spark should be "attempt_timestamp_stageNum_m/r_partitionId_attemptNum".
// When there are multiple attempts for a task, Hive will rely on the partitionId
// to figure out whether the data are duplicates (see
// org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFiles) when
// collecting the final outputs.
{code}
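
To make the proposed format concrete, here is a minimal sketch of how such an ID could be built and how the partitionId can be read back out of it. The class and method names are hypothetical and not taken from the patch, and the zero-padding widths are an assumption that simply mirrors MR's layout.

```java
import java.util.Locale;

/** Hypothetical helper for illustration only; not part of the patch. */
final class SparkAttemptIdSketch {
    private SparkAttemptIdSketch() {}

    /**
     * Builds "attempt_timestamp_stageNum_m/r_partitionId_attemptNum".
     * The 4-digit and 6-digit padding is an assumption copied from MR's style.
     */
    static String build(long timestamp, int stageId, boolean isMap,
                        int partitionId, int attemptNum) {
        return String.format(Locale.ROOT, "attempt_%d_%04d_%s_%06d_%d",
                timestamp, stageId, isMap ? "m" : "r", partitionId, attemptNum);
    }

    /** Extracts the partitionId, the fifth of six underscore-separated fields. */
    static int partitionIdOf(String attemptId) {
        String[] parts = attemptId.split("_");
        if (parts.length != 6 || !"attempt".equals(parts[0])) {
            throw new IllegalArgumentException("Unexpected attempt id: " + attemptId);
        }
        return Integer.parseInt(parts[4]);
    }
}
```

With this layout, two attempts of the same task differ only in the last field, so the full ID stays unique while both attempts still map to the same partitionId, which is what duplicate detection would compare.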

> Hive insertion query execution fails on Hive on Spark
> -----------------------------------------------------
>
>                 Key: HIVE-15054
>                 URL: https://issues.apache.org/jira/browse/HIVE-15054
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>    Affects Versions: 2.0.0
>            Reporter: Aihua Xu
>            Assignee: Aihua Xu
>         Attachments: HIVE-15054.1.patch, HIVE-15054.2.patch, 
> HIVE-15054.3.patch
>
>
> The query {{insert overwrite table tbl1}} sometimes fails with the 
> following error. It seems we are constructing the taskAttemptId from the 
> partitionId, which is not unique when there are multiple attempts.
> {noformat}
> java.lang.IllegalStateException: Hit error while closing operators - failing 
> tree: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename 
> output from: 
> hdfs://table1/.hive-staging_hive_2016-06-14_01-53-17_386_3231646810118049146-9/_task_tmp.-ext-10002/_tmp.002148_0
>  to: 
> hdfs://table1/.hive-staging_hive_2016-06-14_01-53-17_386_3231646810118049146-9/_tmp.-ext-10002/002148_0
> at 
> org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:202)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:58)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:106)
> at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:120)
> {noformat}



