[ https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14990955#comment-14990955 ]

Rui Li commented on HIVE-12229:
-------------------------------

If spark.files.overwrite is false and the user overwrites an already-added jar
(with different file contents), Spark will throw an exception.
Yes, the user can set this in Hive just like other Spark properties. However, if
the user has already launched an RSC and later changes spark.files.overwrite,
we'll launch another RSC because the SparkConf has changed, and accordingly
we'll launch new executors to which no jar has been added. In that case, I think
it's uncertain which jar will eventually be added/used on the executor.
Therefore it's better if the user sets spark.files.overwrite in advance and
doesn't change it throughout the session.
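
For example, a minimal session sketch of the recommended usage (the jar name
and path are hypothetical; setting spark.* properties via set is the usual
Hive-on-Spark mechanism, as noted above):

{noformat}
-- Set the property before the first query, so the initial RSC and its
-- executors are launched with it and the SparkConf never changes mid-session.
set spark.files.overwrite=true;

-- Re-adding a jar whose contents have changed is then accepted; with
-- spark.files.overwrite=false, Spark would throw an exception instead.
add jar /tmp/my-udfs.jar;
{noformat}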

> Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-12229
>                 URL: https://issues.apache.org/jira/browse/HIVE-12229
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>    Affects Versions: 1.1.0
>            Reporter: Lifeng Wang
>            Assignee: Rui Li
>         Attachments: HIVE-12229.1-spark.patch, HIVE-12229.2-spark.patch, HIVE-12229.3-spark.patch, HIVE-12229.3-spark.patch
>
>
> I added one Python script to the query, and the script cannot be found during
> execution in yarn-cluster mode; a sketch of such a query follows the log below.
> {noformat}
> 15/10/21 21:10:55 INFO exec.ScriptOperator: Executing [/usr/bin/python, q2-sessionize.py, 3600]
> 15/10/21 21:10:55 INFO exec.ScriptOperator: tablename=null
> 15/10/21 21:10:55 INFO exec.ScriptOperator: partname=null
> 15/10/21 21:10:55 INFO exec.ScriptOperator: alias=null
> 15/10/21 21:10:55 INFO spark.SparkRecordHandler: processing 10 rows: used memory = 324896224
> 15/10/21 21:10:55 INFO exec.ScriptOperator: ErrorStreamProcessor calling reporter.progress()
> /usr/bin/python: can't open file 'q2-sessionize.py': [Errno 2] No such file or directory
> 15/10/21 21:10:55 INFO exec.ScriptOperator: StreamThread OutputProcessor done
> 15/10/21 21:10:55 INFO exec.ScriptOperator: StreamThread ErrorProcessor done
> 15/10/21 21:10:55 INFO spark.SparkRecordHandler: processing 100 rows: used memory = 325619920
> 15/10/21 21:10:55 ERROR exec.ScriptOperator: Error in writing to script: Stream closed
> 15/10/21 21:10:55 INFO exec.ScriptOperator: The script did not consume all input data. This is considered as an error.
> 15/10/21 21:10:55 INFO exec.ScriptOperator: set hive.exec.script.allow.partial.consumption=true; to ignore it.
> 15/10/21 21:10:55 ERROR spark.SparkReduceRecordHandler: Fatal error: org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing row (tag=0) {"key":{"reducesinkkey0":2,"reducesinkkey1":3316240655},"value":{"_col0":5529}}
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing row (tag=0) {"key":{"reducesinkkey0":2,"reducesinkkey1":3316240655},"value":{"_col0":5529}}
>         at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:340)
>         at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:289)
>         at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:49)
>         at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28)
>         at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:95)
>         at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:99)
>         at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20001]: An error occurred while reading or writing to your custom script. It may have crashed with an error.
>         at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:453)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
>         at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
>         at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:331)
>         ... 14 more
> {noformat}
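
For reference, a hypothetical reconstruction of the kind of query that hits
this: only the interpreter, the script name, and its 3600 argument are taken
from the log; the table and column names are made up, and the reduce-side
stack trace suggests the transform runs after a shuffle.

{noformat}
add file q2-sessionize.py;
-- Shuffle the input so each user's rows arrive sorted at one reducer,
-- then pipe them through the custom sessionization script.
FROM (
  SELECT user_id, click_time
  FROM clicks
  DISTRIBUTE BY user_id
  SORT BY user_id, click_time
) t
SELECT TRANSFORM (t.user_id, t.click_time)
  USING '/usr/bin/python q2-sessionize.py 3600'
  AS (user_id, session_id);
{noformat}

The relative name 'q2-sessionize.py' only resolves if the added file is
shipped to the task's working directory, which is what fails here in
yarn-cluster mode.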


