Shivaram Venkataraman created SPARK-8311:
--------------------------------------------
Summary: saveAsTextFile with Hadoop1 could lead to errors
Key: SPARK-8311
URL: https://issues.apache.org/jira/browse/SPARK-8311
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.3.1
Reporter: Shivaram Venkataraman
I've run into this bug a couple of times and wanted to document what I have
found so far in a JIRA. From what I can tell, if an application is linked
against Hadoop1 and runs on a Spark 1.3.1 + Hadoop1 cluster, then the
saveAsTextFile call consistently fails with errors of the form
{code}
15/06/11 19:47:10 WARN scheduler.TaskSetManager: Lost task 3.0 in stage 3.0
(TID 13, ip-10-212-141-222.us-west-2.compute.internal):
java.lang.IncompatibleClassChangeError: Found class
org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
at
org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:95)
at
org.apache.spark.SparkHadoopWriter.commit(SparkHadoopWriter.scala:106)
at
org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1082)
at
org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1059)
{code}
This does not happen in 1.2.1.
I think the bug was introduced by the following commit:
https://github.com/apache/spark/commit/fde6945417355ae57500b67d034c9cad4f20d240
where the `commitTask` function assumes that `mrTaskContext` is always a
`mapreduce.TaskAttemptContext`, while in Hadoop1 it is a
`mapred.TaskAttemptContext`. This is just a hypothesis, though, as I haven't
tried reverting the commit to see if the problem goes away.
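To illustrate the failure mode: `org.apache.hadoop.mapreduce.TaskAttemptContext` is a concrete class in Hadoop1 but an interface in Hadoop2, so bytecode compiled against one cannot link against the other, which is exactly what `IncompatibleClassChangeError` reports. One common way around such binary incompatibilities is to avoid a compile-time link entirely and dispatch reflectively. The sketch below is only an illustration of that pattern, not the actual Spark fix; `FakeContext` is a hypothetical stand-in for a Hadoop context type.

```java
import java.lang.reflect.Method;

public class CommitTaskSketch {
    // Hypothetical stand-in for Hadoop's TaskAttemptContext. In Hadoop1
    // the real type is a class; in Hadoop2 it is an interface with the
    // same fully-qualified name, so code hard-linked against one version
    // fails at link time against the other.
    public static class FakeContext {
        public String getTaskAttemptID() { return "attempt_0_m_0"; }
    }

    // Reflective dispatch: look the method up on whatever type is on the
    // classpath at runtime, so the class-vs-interface difference in the
    // declaring type never reaches the JVM's link-time checks.
    static Object taskAttemptId(Object ctx) {
        try {
            Method m = ctx.getClass().getMethod("getTaskAttemptID");
            return m.invoke(ctx);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                "no getTaskAttemptID on " + ctx.getClass(), e);
        }
    }

    public static void main(String[] args) {
        // Works for any object exposing getTaskAttemptID(), regardless of
        // whether its declared type is a class or an interface.
        System.out.println(taskAttemptId(new FakeContext()));
    }
}
```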
cc [~liancheng]
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)