[
https://issues.apache.org/jira/browse/SPARK-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14100248#comment-14100248
]
Patrick Wendell commented on SPARK-3008:
----------------------------------------
I believe this is a known issue: you need to compile with Java 6 in order for
PySpark to work on YARN. If you compile with Java 7, the assembly jar is
written in a zip file format that Python's zipimport cannot parse. See
SPARK-1520.
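
As a rough way to check whether a given assembly jar is affected, the sketch
below counts the entries in an archive and compares the count against the
limit of the classic zip format, whose entry count is a 16-bit field (at most
65535 entries). The helper names count_entries and needs_zip64 are
hypothetical and not part of Spark or PySpark; this is only an illustration
of the entry-count check, not a fix for the issue.

```python
import zipfile

# Largest entry count that fits in the 16-bit field of the classic
# (non-Zip64) zip end-of-central-directory record.
CLASSIC_ZIP_MAX_ENTRIES = 0xFFFF  # 65535


def count_entries(archive_path):
    """Return the number of entries in a zip or jar archive (hypothetical helper)."""
    with zipfile.ZipFile(archive_path) as archive:
        return len(archive.infolist())


def needs_zip64(entry_count):
    """Return True if this many entries cannot fit in a classic zip archive."""
    return entry_count > CLASSIC_ZIP_MAX_ENTRIES
```

For example, needs_zip64(count_entries(path_to_jar)) would be expected to
return True for a jar with the 70441 entries reported in this issue, which is
consistent with zipimport failing to load it.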
> PySpark fails due to zipimport not able to load the assembly jar
> (/usr/bin/python: No module named pyspark)
> ------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-3008
> URL: https://issues.apache.org/jira/browse/SPARK-3008
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Environment: Assembly Jar
> target/scala-2.10/spark-assembly-1.1.0-SNAPSHOT-hadoop2.2.0.jar
> jar -tf assembly/target/scala-2.10/spark-assembly-1.1.0-SNAPSHOT-hadoop2.2.0.jar | wc -l
> 70441
> git sha commit ba28a8fcbc3ba432e7ea4d6f0b535450a6ec96c6
> Reporter: Jai Kumar Singh
> Labels: pyspark
>
> PySpark is not working. It fails because zipimport is not able to import
> the assembly jar, since the jar contains more than 65536 files.
> Email threads on this topic are below:
> http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201406.mbox/%3ccamjob8kcgk0pqiogju6uokceyswcusw3xwd5wrs8ikpmgd2...@mail.gmail.com%3E
> https://mail.python.org/pipermail/python-list/2014-May/671353.html
> Is there any workaround for this issue?
--
This message was sent by Atlassian JIRA
(v6.2#6252)