[
https://issues.apache.org/jira/browse/SPARK-7009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589004#comment-14589004
]
Harry Brundage commented on SPARK-7009:
---------------------------------------
We're still experiencing this issue with PySpark on YARN with the recently
released 1.4 artifacts, and are stuck on a forked version of Spark with GitHub
pull request #5637 merged in. Doesn't this seem rather severe to you folks? You
can't use PySpark on YARN with the official artifacts unless you use
hadoop-provided, which doesn't make sense in our case because a lot of users
submit jobs locally or from driver nodes that aren't part of the Hadoop
cluster. [~joshrosen] do you have any ideas on this one?
Interestingly, merging #5637 into the current apache/spark master also doesn't
seem to produce a Python-importable JAR file, so something seems to have broken
there. I fought with the pom.xml for a while and solved an initial problem:
because the ant-run plugin is now included for other reasons, the repackage
profile was running before the assembly had actually completed. Once the two
run in the right order the repackage completes successfully, but the JAR file
still can't be imported. Another data point: my previously working artifact,
built from my previous Spark version with #5637 merged in, has 98018 files in
it, and the version with #5637 merged into master has 101473 files, which
doesn't seem like a large enough jump to break something. This seems odd to me,
and I am not super confident I am doing everything correctly.
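
For anyone wanting to compare two assembly JARs the same way, here is a rough
Python sketch that counts the entries in an archive and checks whether a zip64
end-of-central-directory locator sits directly in front of the classic
end-of-central-directory record. The function name inspect_jar and the shape of
the returned dict are made up for illustration, and the byte-level check is a
heuristic (an archive comment that happens to contain the record signature
could confuse it), not a full zip parser.

```python
import zipfile

EOCD_SIG = b"PK\x05\x06"           # classic end-of-central-directory record
ZIP64_LOCATOR_SIG = b"PK\x06\x07"  # zip64 end-of-central-directory locator
ZIP64_LOCATOR_LEN = 20             # the locator is a fixed 20 bytes
CLASSIC_MAX_ENTRIES = 0xFFFF       # 65535, the most a non-zip64 archive can count


def inspect_jar(path):
    """Return entry count and zip64 hints for the archive at `path`.

    Hypothetical helper, not part of Spark or of any build tooling.
    """
    with zipfile.ZipFile(path) as zf:
        entries = len(zf.infolist())

    # The EOCD record is 22 bytes plus an optional comment of up to 65535
    # bytes, so it (and any zip64 locator before it) lives in the file tail.
    with open(path, "rb") as fh:
        fh.seek(0, 2)
        size = fh.tell()
        tail_len = min(size, 22 + 0xFFFF + ZIP64_LOCATOR_LEN)
        fh.seek(size - tail_len)
        tail = fh.read(tail_len)

    eocd = tail.rfind(EOCD_SIG)
    has_locator = bool(
        eocd >= ZIP64_LOCATOR_LEN
        and tail[eocd - ZIP64_LOCATOR_LEN:eocd - ZIP64_LOCATOR_LEN + 4]
        == ZIP64_LOCATOR_SIG
    )
    return {
        "entries": entries,
        "over_classic_limit": entries > CLASSIC_MAX_ENTRIES,
        "has_zip64_locator": has_locator,
    }
```

Running this against both the working and the non-working artifact would show
whether they differ in anything beyond the raw file count.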
> Build assembly JAR via ant to avoid zip64 problems
> --------------------------------------------------
>
> Key: SPARK-7009
> URL: https://issues.apache.org/jira/browse/SPARK-7009
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 1.3.0
> Environment: Java 7+
> Reporter: Steve Loughran
> Original Estimate: 2h
> Remaining Estimate: 2h
>
> SPARK-1911 shows the problem that JDK7+ is using zip64 to build large JARs; a
> format incompatible with Java and pyspark.
> Provided the total number of .class files+resources is <64K, ant can be used
> to make the final JAR instead, perhaps by unzipping the maven-generated JAR
> then rezipping it with zip64=never, before publishing the artifact via maven.
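
The unzip-then-rezip idea in the description can be sketched in Python as
below. This is not the ant-based approach the issue proposes: it swaps in
Python's zipfile module with allowZip64=False purely to illustrate the
mechanics, and the function name repack_without_zip64 is hypothetical. In line
with the "<64K" precondition above, zipfile raises LargeZipFile instead of
writing the archive once the entry count would need zip64 extensions.

```python
import zipfile


def repack_without_zip64(src_path, dst_path):
    """Copy every entry of src_path into a new archive that may not use zip64.

    Hypothetical helper for illustration only. Raises zipfile.LargeZipFile if
    the result would need zip64 (for example, more than 65535 entries).
    Returns the number of entries written.
    """
    written = 0
    with zipfile.ZipFile(src_path) as zin, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED,
                            allowZip64=False) as zout:
        for info in zin.infolist():
            # Build a fresh ZipInfo so no zip64 extra fields carry over
            # from the source archive.
            fresh = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            fresh.compress_type = zipfile.ZIP_DEFLATED
            fresh.external_attr = info.external_attr
            zout.writestr(fresh, zin.read(info))
            written += 1
    return written
```

A call such as repack_without_zip64("in.jar", "out.jar") (both paths are
placeholders) either produces a classic-format archive or fails loudly, which
makes it easy to tell whether a given assembly fits under the limit at all.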