Sven Krasser created SPARK-5209:
-----------------------------------

             Summary: Jobs fail with "unexpected value" exception in certain environments
                 Key: SPARK-5209
                 URL: https://issues.apache.org/jira/browse/SPARK-5209
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 1.2.0
         Environment: Amazon Elastic Map Reduce
            Reporter: Sven Krasser


Jobs fail consistently and reproducibly with exceptions of the following type 
in PySpark using Spark 1.2.0:

{noformat}
2015-01-13 00:14:05,898 ERROR [Executor task launch worker-1] executor.Executor (Logging.scala:logError(96)) - Exception in task 27.0 in stage 0.0 (TID 28)
org.apache.spark.SparkException: PairwiseRDD: unexpected value: List([B@4c09f3e0)
{noformat}
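
For orientation, {{List([B@4c09f3e0)}} is how the JVM prints a list holding a single byte array. The message appears to come from the JVM-side {{PairwiseRDD}}, which (as far as I can tell from the 1.2.0 source, so treat this as an assumption) consumes the byte arrays sent by the Python worker two at a time and raises when a group does not contain exactly two elements. The Python sketch below only mimics that check; {{pair_up}} is a hypothetical name, not a Spark API, and the real code is Scala.

```python
def pair_up(items):
    """Group a flat sequence of byte strings into (key, payload) pairs.

    Illustrative stand-in for the pairing check; not Spark code.
    """
    items = list(items)
    for i in range(0, len(items), 2):
        group = items[i:i + 2]
        if len(group) != 2:
            # A trailing singleton is the situation reported in the log above.
            raise ValueError("PairwiseRDD: unexpected value: %r" % (group,))
        yield group[0], group[1]
```

Under this reading, any stream with an odd number of elements (for example a dropped or misaligned record) would end in a one-element group and trigger the error.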

The issue first appeared in Spark 1.2.0 and is sensitive to the environment 
(configuration, cluster size): some changes to the environment make the error 
disappear.

The following steps yield a reproduction on Amazon Elastic Map Reduce. Launch 
an EMR cluster with the following parameters (this will bootstrap Spark 1.2.0 
onto it):
{code}
aws emr create-cluster --region us-west-1 --no-auto-terminate \
   --ec2-attributes KeyName=your-key-here,SubnetId=your-subnet-here \
   --bootstrap-actions Path=s3://support.elasticmapreduce/spark/install-spark,Args='["-g","-v","1.2.0.a"]' \
   --ami-version 3.3 --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge \
   InstanceGroupType=CORE,InstanceCount=3,InstanceType=r3.xlarge --name "Spark Issue Repro" \
   --visible-to-all-users --applications Name=Ganglia
{code}

Next, copy the attached {{spark-defaults.conf}} to {{~/spark/conf/}}.

Run {{~/spark/bin/spark-submit gen_test_data.py}} to generate a test data set 
on HDFS. Finally, run {{~/spark/bin/spark-submit repro.py}} to reproduce the 
error.

Driver and executor logs are attached. For reference, a spark-user thread on 
the topic is here: 
http://mail-archives.us.apache.org/mod_mbox/spark-user/201501.mbox/%[email protected]%3E
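
When going through the attached executor logs, a small helper along the following lines may make it easier to list the affected tasks. It is a sketch that assumes the log4j line layout shown in the excerpt above, with each log entry on a single line (not wrapped as in this email); {{find_pairwise_failures}} is a hypothetical name.

```python
import re

# Matches the "Exception in task ..." ERROR line from the excerpt above.
TASK_FAILURE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) ERROR "
    r"\[(?P<thread>[^\]]+)\] .*? - Exception in task (?P<task>\d+\.\d+) "
    r"in stage (?P<stage>\d+\.\d+) \(TID (?P<tid>\d+)\)"
)
MARKER = "PairwiseRDD: unexpected value"


def find_pairwise_failures(lines):
    """Return one dict per task whose exception line contains MARKER."""
    failures = []
    pending = None  # task header seen on the previous line, if any
    for line in lines:
        match = TASK_FAILURE.match(line)
        if match:
            pending = match.groupdict()
            continue
        if pending is not None:
            # Only the line directly after the task header is inspected.
            if MARKER in line:
                failures.append(pending)
            pending = None
    return failures
```

It can be fed an open log file directly, for example {{find_pairwise_failures(open(path))}}, where {{path}} points at one of the executor logs.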



