Xiangrui Meng created SPARK-1861:
------------------------------------

             Summary: ArrayIndexOutOfBoundsException when reading bzip2 files
                 Key: SPARK-1861
                 URL: https://issues.apache.org/jira/browse/SPARK-1861
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 0.9.0, 1.0.0
            Reporter: Xiangrui Meng


Hadoop uses CBZip2InputStream to decode bzip2 files. However, the 
implementation is not thread-safe, and Spark may run multiple tasks 
concurrently in the same JVM, which can trigger this 
ArrayIndexOutOfBoundsException. This is not a problem for Hadoop MapReduce 
because Hadoop runs each task in a separate JVM.

A workaround for a standalone cluster is to set `SPARK_WORKER_CORES=1` in 
spark-env.sh, so that each worker runs only one task at a time.
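For reference, the workaround is a single line in spark-env.sh. The sketch below assumes the default conf/spark-env.sh location under the Spark installation on each worker node. Workers read this file when they start, so they need to be restarted before the setting takes effect.

```shell
# conf/spark-env.sh on every worker node of a standalone cluster
# (can be created from conf/spark-env.sh.template if it does not exist yet).
# Limit each worker to one core so only one task at a time uses
# CBZip2InputStream inside a given JVM.
export SPARK_WORKER_CORES=1
```

This trades away per-node parallelism for correctness, so it is best treated as a stopgap while reading bzip2 input.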



--
This message was sent by Atlassian JIRA
(v6.2#6252)
