[ https://issues.apache.org/jira/browse/SPARK-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-5209.
------------------------------
    Resolution: Not A Problem

> Jobs fail with "unexpected value" exception in certain environments
> -------------------------------------------------------------------
>
>                 Key: SPARK-5209
>                 URL: https://issues.apache.org/jira/browse/SPARK-5209
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.2.0
>         Environment: Amazon Elastic Map Reduce
>            Reporter: Sven Krasser
>         Attachments: driver_log.txt, exec_log.txt, gen_test_data.py, repro.py, spark-defaults.conf
>
>
> Jobs fail consistently and reproducibly with exceptions of the following type 
> in PySpark using Spark 1.2.0:
> {noformat}
> 2015-01-13 00:14:05,898 ERROR [Executor task launch worker-1] executor.Executor (Logging.scala:logError(96)) - Exception in task 27.0 in stage 0.0 (TID 28)
> org.apache.spark.SparkException: PairwiseRDD: unexpected value: List([B@4c09f3e0)
> {noformat}
> The issue first appeared in Spark 1.2.0 and is sensitive to the environment (configuration, cluster size): some changes to the environment make the error stop occurring.
> The following steps yield a reproduction on Amazon Elastic Map Reduce. Launch 
> an EMR cluster with the following parameters (this will bootstrap Spark 1.2.0 
> onto it):
> {code}
> aws emr create-cluster --region us-west-1 --no-auto-terminate \
>     --ec2-attributes KeyName=your-key-here,SubnetId=your-subnet-here \
>     --bootstrap-actions Path=s3://support.elasticmapreduce/spark/install-spark,Args='["-g","-v","1.2.0.a"]' \
>     --ami-version 3.3 \
>     --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge \
>     InstanceGroupType=CORE,InstanceCount=3,InstanceType=r3.xlarge \
>     --name "Spark Issue Repro" \
>     --visible-to-all-users --applications Name=Ganglia
> {code}
> Next, copy the attached {{spark-defaults.conf}} to {{~/spark/conf/}}.
> Run {{~/spark/bin/spark-submit gen_test_data.py}} to generate a test data set on HDFS. Finally, run {{~/spark/bin/spark-submit repro.py}} to reproduce the error.
> Driver and executor logs are attached. For reference, a spark-user thread on the topic is here:
> http://mail-archives.us.apache.org/mod_mbox/spark-user/201501.mbox/%[email protected]%3E
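For readers without the attached scripts, the shape of the failure can be sketched in plain Python. This is an illustration of the serialization contract, not the attached repro.py; the function name `to_pairwise` is invented for the sketch. PySpark hands each record of a key/value RDD to the JVM as a two-element pair, and the Scala-side PairwiseRDD raises the "unexpected value" error quoted above when a deserialized element is not exactly a pair:

```python
import pickle

# Illustration only (hypothetical helper, not PySpark's actual code):
# a pair RDD record must be a 2-tuple (key, value). Anything else is
# rejected on the JVM side with "PairwiseRDD: unexpected value: ...".
def to_pairwise(record):
    """Serialize a record the way a pair RDD expects: a (key, value) 2-tuple."""
    if not (isinstance(record, tuple) and len(record) == 2):
        raise ValueError("PairwiseRDD: unexpected value: %r" % (record,))
    key, value = record
    return pickle.dumps(key), pickle.dumps(value)

to_pairwise(("word", 1))            # a proper (key, value) pair: accepted
try:
    to_pairwise(["just", "a", "list"])  # not a 2-tuple: rejected
except ValueError as e:
    print(e)
```

In the reported failure the rejected element is a `List([B@...)` (a list holding a raw byte array), i.e. the JVM received something other than the expected key/value pair from the Python worker.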



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
