Xiangrui Meng created SPARK-1861:
------------------------------------
Summary: ArrayIndexOutOfBoundsException when reading bzip2 files
Key: SPARK-1861
URL: https://issues.apache.org/jira/browse/SPARK-1861
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 0.9.0, 1.0.0
Reporter: Xiangrui Meng
Hadoop uses CBZip2InputStream to decode bzip2 files. However, that
implementation is not thread-safe, and Spark may run multiple tasks concurrently
in the same JVM, which leads to this error. This is not a problem for Hadoop
MapReduce because Hadoop runs each task in a separate JVM.
A workaround is to set `SPARK_WORKER_CORES=1` in spark-env.sh for a standalone
cluster.
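
As a minimal sketch, the workaround could look like the following on each worker node. The conf/spark-env.sh location assumes a default Spark layout, and the worker daemons would most likely need a restart to pick up the change. Limiting a worker to one core means it runs only one task at a time, so this trades parallelism for correctness.

```shell
# conf/spark-env.sh (path assumed; adjust to your installation)
# Limit each standalone worker to a single core so that only one task
# at a time runs in the executor JVM and decodes bzip2 input.
export SPARK_WORKER_CORES=1
```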