[ https://issues.apache.org/jira/browse/SPARK-24559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16513036#comment-16513036 ]

Marcelo Vanzin commented on SPARK-24559:
----------------------------------------

{{--archives}} is completely handled by YARN, so if there's anything to fix,
it will be on YARN's side.

> Some zip files passed with spark-submit --archives causing "invalid CEN 
> header" error
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-24559
>                 URL: https://issues.apache.org/jira/browse/SPARK-24559
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Submit
>    Affects Versions: 2.2.0
>            Reporter: James Porritt
>            Priority: Major
>
> I'm encountering an error when passing certain zip files to spark-submit via 
> --archives: files that are over 2 GB and have the zip64 flag set fail.
> {code}
> PYSPARK_PYTHON=./ROOT/myspark/bin/python /usr/hdp/current/spark2-client/bin/spark-submit \
>   --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./ROOT/myspark/bin/python \
>   --master=yarn \
>   --deploy-mode=cluster \
>   --driver-memory=4g \
>   --archives=myspark.zip#ROOT \
>   --num-executors=32 \
>   --packages com.databricks:spark-avro_2.11:4.0.0 \
>   foo.py
> {code}
> (As background, I'm using the trick of zipping a conda environment and 
> passing the zip file via --archives, as described at: 
> https://community.hortonworks.com/articles/58418/running-pyspark-with-conda-env.html)
> myspark.zip is a zipped conda environment. It was created with Python's 
> zipfile package. The files are stored without deflation and with the 
> zip64 flag set. foo.py is my application code. This normally works, but if 
> myspark.zip is greater than 2 GB and has the zip64 flag set I get:
> {{java.util.zip.ZipException: invalid CEN header (bad signature)}}
> There is much written on the subject, and I was able to write Java code 
> against the java.util.zip library that reproduces this error for one of the 
> problematic zip files, and other code against the same library that reads 
> the same file without error.
> Spark compile info:
> {code}
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/ '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 2.2.0.2.6.4.0-91
>       /_/
> 
> Using Scala version 2.11.8, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_112
> Branch HEAD
> Compiled by user jenkins on 2018-01-04T10:41:05Z
> Revision a24017869f5450397136ee8b11be818e7cd3facb
> Url g...@github.com:hortonworks/spark2.git
> Type --help for more information.
> {code}
> YARN logs from the console after running the above command. I've tried both 
> --deploy-mode=cluster and --deploy-mode=client.
> {code}
> 18/06/13 16:00:22 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 18/06/13 16:00:23 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
> 18/06/13 16:00:23 INFO RMProxy: Connecting to ResourceManager at myhost2.myfirm.com/10.87.11.17:8050
> 18/06/13 16:00:23 INFO Client: Requesting a new application from cluster with 6 NodeManagers
> 18/06/13 16:00:23 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (221184 MB per container)
> 18/06/13 16:00:23 INFO Client: Will allocate AM container, with 18022 MB memory including 1638 MB overhead
> 18/06/13 16:00:23 INFO Client: Setting up container launch context for our AM
> 18/06/13 16:00:23 INFO Client: Setting up the launch environment for our AM container
> 18/06/13 16:00:23 INFO Client: Preparing resources for our AM container
> 18/06/13 16:00:24 INFO Client: Use hdfs cache file as spark.yarn.archive for HDP, hdfsCacheFile:hdfs://myhost.myfirm.com:8020/hdp/apps/2.6.4.0-91/spark2/spark2-hdp-yarn-archive.tar.gz
> 18/06/13 16:00:24 INFO Client: Source and destination file systems are the same. Not copying hdfs://myhost.myfirm.com:8020/hdp/apps/2.6.4.0-91/spark2/spark2-hdp-yarn-archive.tar.gz
> 18/06/13 16:00:24 INFO Client: Uploading resource file:/home/myuser/.ivy2/jars/com.databricks_spark-avro_2.11-4.0.0.jar -> hdfs://myhost.myfirm.com:8020/user/myuser/.sparkStaging/application_1528901858967_0019/com.databricks_spark-avro_2.11-4.0.0.jar
> 18/06/13 16:00:26 INFO Client: Uploading resource file:/home/myuser/.ivy2/jars/org.slf4j_slf4j-api-1.7.5.jar -> hdfs://myhost.myfirm.com:8020/user/myuser/.sparkStaging/application_1528901858967_0019/org.slf4j_slf4j-api-1.7.5.jar
> 18/06/13 16:00:26 INFO Client: Uploading resource file:/home/myuser/.ivy2/jars/org.apache.avro_avro-1.7.6.jar -> hdfs://myhost.myfirm.com:8020/user/myuser/.sparkStaging/application_1528901858967_0019/org.apache.avro_avro-1.7.6.jar
> 18/06/13 16:00:26 INFO Client: Uploading resource file:/home/myuser/.ivy2/jars/org.codehaus.jackson_jackson-core-asl-1.9.13.jar -> hdfs://myhost.myfirm.com:8020/user/myuser/.sparkStaging/application_1528901858967_0019/org.codehaus.jackson_jackson-core-asl-1.9.13.jar
> 18/06/13 16:00:26 INFO Client: Uploading resource file:/home/myuser/.ivy2/jars/org.codehaus.jackson_jackson-mapper-asl-1.9.13.jar -> hdfs://myhost.myfirm.com:8020/user/myuser/.sparkStaging/application_1528901858967_0019/org.codehaus.jackson_jackson-mapper-asl-1.9.13.jar
> 18/06/13 16:00:26 INFO Client: Uploading resource file:/home/myuser/.ivy2/jars/com.thoughtworks.paranamer_paranamer-2.3.jar -> hdfs://myhost.myfirm.com:8020/user/myuser/.sparkStaging/application_1528901858967_0019/com.thoughtworks.paranamer_paranamer-2.3.jar
> 18/06/13 16:00:26 INFO Client: Uploading resource file:/home/myuser/.ivy2/jars/org.xerial.snappy_snappy-java-1.0.5.jar -> hdfs://myhost.myfirm.com:8020/user/myuser/.sparkStaging/application_1528901858967_0019/org.xerial.snappy_snappy-java-1.0.5.jar
> 18/06/13 16:00:26 INFO Client: Uploading resource file:/home/myuser/.ivy2/jars/org.apache.commons_commons-compress-1.4.1.jar -> hdfs://myhost.myfirm.com:8020/user/myuser/.sparkStaging/application_1528901858967_0019/org.apache.commons_commons-compress-1.4.1.jar
> 18/06/13 16:00:26 INFO Client: Uploading resource file:/home/myuser/.ivy2/jars/org.tukaani_xz-1.0.jar -> hdfs://myhost.myfirm.com:8020/user/myuser/.sparkStaging/application_1528901858967_0019/org.tukaani_xz-1.0.jar
> 18/06/13 16:00:26 INFO Client: Source and destination file systems are the same. Not copying hdfs:/user/myuser/release/alphagenspark.zip#ROOT
> 18/06/13 16:00:26 INFO Client: Uploading resource file:/my/script/dir/spark/alphagen/foo.py -> hdfs://myhost.myfirm.com:8020/user/myuser/.sparkStaging/application_1528901858967_0019/foo.py
> 18/06/13 16:00:26 INFO Client: Uploading resource file:/usr/hdp/current/spark2-client/python/lib/pyspark.zip -> hdfs://myhost.myfirm.com:8020/user/myuser/.sparkStaging/application_1528901858967_0019/pyspark.zip
> 18/06/13 16:00:26 INFO Client: Uploading resource file:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip -> hdfs://myhost.myfirm.com:8020/user/myuser/.sparkStaging/application_1528901858967_0019/py4j-0.10.4-src.zip
> 18/06/13 16:00:26 WARN Client: Same path resource file:/home/myuser/.ivy2/jars/com.databricks_spark-avro_2.11-4.0.0.jar added multiple times to distributed cache.
> 18/06/13 16:00:26 WARN Client: Same path resource file:/home/myuser/.ivy2/jars/org.slf4j_slf4j-api-1.7.5.jar added multiple times to distributed cache.
> 18/06/13 16:00:26 WARN Client: Same path resource file:/home/myuser/.ivy2/jars/org.apache.avro_avro-1.7.6.jar added multiple times to distributed cache.
> 18/06/13 16:00:26 WARN Client: Same path resource file:/home/myuser/.ivy2/jars/org.codehaus.jackson_jackson-core-asl-1.9.13.jar added multiple times to distributed cache.
> 18/06/13 16:00:26 WARN Client: Same path resource file:/home/myuser/.ivy2/jars/org.codehaus.jackson_jackson-mapper-asl-1.9.13.jar added multiple times to distributed cache.
> 18/06/13 16:00:26 WARN Client: Same path resource file:/home/myuser/.ivy2/jars/com.thoughtworks.paranamer_paranamer-2.3.jar added multiple times to distributed cache.
> 18/06/13 16:00:26 WARN Client: Same path resource file:/home/myuser/.ivy2/jars/org.xerial.snappy_snappy-java-1.0.5.jar added multiple times to distributed cache.
> 18/06/13 16:00:26 WARN Client: Same path resource file:/home/myuser/.ivy2/jars/org.apache.commons_commons-compress-1.4.1.jar added multiple times to distributed cache.
> 18/06/13 16:00:26 WARN Client: Same path resource file:/home/myuser/.ivy2/jars/org.tukaani_xz-1.0.jar added multiple times to distributed cache.
> 18/06/13 16:00:27 INFO Client: Uploading resource file:/tmp/spark-6c26ae3b-7248-488f-bc33-9766251474bb/__spark_conf__4405623606341803690.zip -> hdfs://myhost.myfirm.com:8020/user/myuser/.sparkStaging/application_1528901858967_0019/__spark_conf__.zip
> 18/06/13 16:00:27 INFO SecurityManager: Changing view acls to: myuser
> 18/06/13 16:00:27 INFO SecurityManager: Changing modify acls to: myuser
> 18/06/13 16:00:27 INFO SecurityManager: Changing view acls groups to:
> 18/06/13 16:00:27 INFO SecurityManager: Changing modify acls groups to:
> 18/06/13 16:00:27 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(myuser); groups with view permissions: Set(); users with modify permissions: Set(myuser); groups with modify permissions: Set()
> 18/06/13 16:00:27 INFO Client: Submitting application application_1528901858967_0019 to ResourceManager
> 18/06/13 16:00:27 INFO YarnClientImpl: Submitted application application_1528901858967_0019
> 18/06/13 16:00:28 INFO Client: Application report for application_1528901858967_0019 (state: ACCEPTED)
> 18/06/13 16:00:28 INFO Client:
>          client token: N/A
>          diagnostics: AM container is launched, waiting for AM container to Register with RM
>          ApplicationMaster host: N/A
>          ApplicationMaster RPC port: -1
>          queue: default
>          start time: 1528923627110
>          final status: UNDEFINED
>          tracking URL: http://myhost2.myfirm.com:8088/proxy/application_1528901858967_0019/
>          user: myuser
> 18/06/13 16:00:29 INFO Client: Application report for application_1528901858967_0019 (state: ACCEPTED)
> 18/06/13 16:00:30 INFO Client: Application report for application_1528901858967_0019 (state: ACCEPTED)
> 18/06/13 16:00:31 INFO Client: Application report for application_1528901858967_0019 (state: ACCEPTED)
> 18/06/13 16:00:32 INFO Client: Application report for application_1528901858967_0019 (state: ACCEPTED)
> 18/06/13 16:00:33 INFO Client: Application report for application_1528901858967_0019 (state: ACCEPTED)
> 18/06/13 16:00:34 INFO Client: Application report for application_1528901858967_0019 (state: ACCEPTED)
> 18/06/13 16:00:35 INFO Client: Application report for application_1528901858967_0019 (state: ACCEPTED)
> 18/06/13 16:00:36 INFO Client: Application report for application_1528901858967_0019 (state: ACCEPTED)
> 18/06/13 16:00:37 INFO Client: Application report for application_1528901858967_0019 (state: ACCEPTED)
> 18/06/13 16:00:38 INFO Client: Application report for application_1528901858967_0019 (state: ACCEPTED)
> 18/06/13 16:00:39 INFO Client: Application report for application_1528901858967_0019 (state: FAILED)
> 18/06/13 16:00:39 INFO Client:
>          client token: N/A
>          diagnostics: Application application_1528901858967_0019 failed 2 times due to AM Container for appattempt_1528901858967_0019_000002 exited with exitCode: -1000
> For more detailed output, check the application tracking page: http://myhost2.myfirm.com:8088/cluster/app/application_1528901858967_0019 Then click on links to logs of each attempt.
> Diagnostics: java.util.zip.ZipException: invalid CEN header (bad signature)
> Failing this attempt. Failing the application.
>          ApplicationMaster host: N/A
>          ApplicationMaster RPC port: -1
>          queue: default
>          start time: 1528923627110
>          final status: FAILED
>          tracking URL: http://myhost2.myfirm.com:8088/cluster/app/application_1528901858967_0019
>          user: myuser
> 18/06/13 16:00:39 INFO Client: Deleted staging directory hdfs://myhost.myfirm.com:8020/user/myuser/.sparkStaging/application_1528901858967_0019
> Exception in thread "main" org.apache.spark.SparkException: Application application_1528901858967_0019 finished with failed status
>         at org.apache.spark.deploy.yarn.Client.run(Client.scala:1187)
>         at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1233)
>         at org.apache.spark.deploy.yarn.Client.main(Client.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:782)
>         at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
>         at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> 18/06/13 16:00:39 INFO ShutdownHookManager: Shutdown hook called
> 18/06/13 16:00:39 INFO ShutdownHookManager: Deleting directory /tmp/spark-6c26ae3b-7248-488f-bc33-9766251474bb
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
