[
https://issues.apache.org/jira/browse/SPARK-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sean Owen resolved SPARK-5209.
------------------------------
Resolution: Not A Problem
> Jobs fail with "unexpected value" exception in certain environments
> -------------------------------------------------------------------
>
> Key: SPARK-5209
> URL: https://issues.apache.org/jira/browse/SPARK-5209
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 1.2.0
> Environment: Amazon Elastic Map Reduce
> Reporter: Sven Krasser
> Attachments: driver_log.txt, exec_log.txt, gen_test_data.py,
> repro.py, spark-defaults.conf
>
>
> Jobs fail consistently and reproducibly with exceptions of the following type
> in PySpark using Spark 1.2.0:
> {noformat}
> 2015-01-13 00:14:05,898 ERROR [Executor task launch worker-1]
> executor.Executor (Logging.scala:logError(96)) - Exception in task 27.0 in
> stage 0.0 (TID 28)
> org.apache.spark.SparkException: PairwiseRDD: unexpected value:
> List([B@4c09f3e0)
> {noformat}
> The issue first appeared in Spark 1.2.0 and is sensitive to the environment
> (configuration, cluster size); i.e., some changes to the environment make the
> error disappear.
> The following steps yield a reproduction on Amazon Elastic Map Reduce. Launch
> an EMR cluster with the following parameters (this will bootstrap Spark 1.2.0
> onto it):
> {code}
> aws emr create-cluster --region us-west-1 --no-auto-terminate \
>   --ec2-attributes KeyName=your-key-here,SubnetId=your-subnet-here \
>   --bootstrap-actions Path=s3://support.elasticmapreduce/spark/install-spark,Args='["-g","-v","1.2.0.a"]' \
>   --ami-version 3.3 \
>   --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge \
>                     InstanceGroupType=CORE,InstanceCount=3,InstanceType=r3.xlarge \
>   --name "Spark Issue Repro" \
>   --visible-to-all-users --applications Name=Ganglia
> {code}
> Next, copy the attached {{spark-defaults.conf}} to {{~/spark/conf/}}.
> Run {{~/spark/bin/spark-submit gen_test_data.py}} to generate a test data set
> on HDFS. Finally, run {{~/spark/bin/spark-submit repro.py}} to reproduce the
> error.
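> The run steps above, consolidated as a shell sketch (paths as given in the
> report; {{spark-defaults.conf}}, {{gen_test_data.py}}, and {{repro.py}} are
> the attached files, not shown here):
> {code}
> # Copy the attached Spark defaults into the cluster's Spark conf dir
> cp spark-defaults.conf ~/spark/conf/
>
> # Generate the test data set on HDFS
> ~/spark/bin/spark-submit gen_test_data.py
>
> # Run the repro job; this fails with "PairwiseRDD: unexpected value"
> ~/spark/bin/spark-submit repro.py
> {code}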
> Driver and executor logs are attached. For reference, a spark-user thread on
> the topic is here:
> http://mail-archives.us.apache.org/mod_mbox/spark-user/201501.mbox/%[email protected]%3E
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]