[ https://issues.apache.org/jira/browse/LIVY-750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
shanyu zhao updated LIVY-750:
-----------------------------
    Description:
On Livy Server, even if we set the pyspark archives to use local files:
{code:bash}
export PYSPARK_ARCHIVES_PATH=local:/opt/spark/python/lib/pyspark.zip,local:/opt/spark/python/lib/py4j-0.10.7-src.zip
{code}
Livy still uploads these local pyspark archives to the YARN distributed cache:

20/02/14 20:05:46 INFO utils.LineBufferedStream: 2020-02-14 20:05:46,026 INFO yarn.Client: Uploading resource file:/opt/spark/python/lib/pyspark.zip -> hdfs://mycluster/user/test1/.sparkStaging/application_1581024490249_0001/pyspark.zip
20/02/14 20:05:46 INFO utils.LineBufferedStream: 2020-02-14 20:05:46,392 INFO yarn.Client: Uploading resource file:/opt/spark/python/lib/py4j-0.10.7-src.zip -> hdfs://mycluster/user/test1/.sparkStaging/application_1581024490249_0001/py4j-0.10.7-src.zip

Note that this happens even with the SPARK-30845 fix in Spark, which stops Spark from always uploading local archives.

The root cause is that Livy adds the pyspark archives to "spark.submit.pyFiles", which Spark then adds to the YARN distributed cache. Since spark-submit already takes care of finding the pyspark archives and uploading them if they are not local, there is no need for Livy to redundantly do so.
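To make the intended behaviour concrete, here is a minimal Python sketch of the distinction the description relies on: entries with the local: scheme are expected to already exist on every node, so only the remaining entries would need to be shipped. This is an illustration only, not Livy's or Spark's actual code and not the patch for this issue; the helper name split_pyspark_archives is hypothetical.

```python
from urllib.parse import urlparse


def split_pyspark_archives(archives_path):
    """Partition a comma-separated PYSPARK_ARCHIVES_PATH value.

    Returns (local_entries, entries_to_upload). Hypothetical helper for
    illustration; it does not mirror any Livy or Spark function.
    """
    local_entries = []
    entries_to_upload = []
    for entry in archives_path.split(","):
        entry = entry.strip()
        if not entry:
            # Ignore empty items such as a trailing comma.
            continue
        if urlparse(entry).scheme == "local":
            # local: means the file is assumed to be present on each node.
            local_entries.append(entry)
        else:
            # Any other scheme (or none) is a candidate for upload.
            entries_to_upload.append(entry)
    return local_entries, entries_to_upload


if __name__ == "__main__":
    value = (
        "local:/opt/spark/python/lib/pyspark.zip,"
        "local:/opt/spark/python/lib/py4j-0.10.7-src.zip"
    )
    print(split_pyspark_archives(value))
```

With the PYSPARK_ARCHIVES_PATH value from this report, both entries land in the first list and the second list is empty, which matches the expectation that nothing should be uploaded.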
  was:
On Livy Server, even if we set the pyspark archives to use local files:
{code:bash}
export PYSPARK_ARCHIVES_PATH=local:/opt/spark/python/lib/pyspark.zip,local:/opt/spark/python/lib/py4j-0.10.7-src.zip
{code}
Livy still uploads these local pyspark archives to the YARN distributed cache:

20/02/14 20:05:46 INFO utils.LineBufferedStream: 2020-02-14 20:05:46,026 INFO yarn.Client: Uploading resource file:/opt/spark/python/lib/pyspark.zip -> hdfs://mycluster/user/test1/.sparkStaging/application_1581024490249_0001/pyspark.zip
20/02/14 20:05:46 INFO utils.LineBufferedStream: 2020-02-14 20:05:46,392 INFO yarn.Client: Uploading resource file:/opt/spark/python/lib/py4j-0.10.7-src.zip -> hdfs://mycluster/user/test1/.sparkStaging/application_1581024490249_0001/py4j-0.10.7-src.zip

Note that this happens even with the SPARK-30845 fix in Spark, which stops Spark from always uploading local archives.

The root cause is that Livy adds the pyspark archives to "spark.submit.pyFiles", which Spark then adds to the YARN distributed cache. Since spark-submit already takes care of uploading pyspark archives, there is no need for Livy to redundantly do so.
> Livy uploads local pyspark archives to Yarn distributed cache
> -------------------------------------------------------------
>
>                 Key: LIVY-750
>                 URL: https://issues.apache.org/jira/browse/LIVY-750
>             Project: Livy
>          Issue Type: Bug
>          Components: Server
>   Affects Versions: 0.6.0, 0.7.0
>            Reporter: shanyu zhao
>            Priority: Major
>         Attachments: image-2020-02-16-13-19-40-645.png, image-2020-02-16-13-19-59-591.png
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> On Livy Server, even if we set the pyspark archives to use local files:
> {code:bash}
> export PYSPARK_ARCHIVES_PATH=local:/opt/spark/python/lib/pyspark.zip,local:/opt/spark/python/lib/py4j-0.10.7-src.zip
> {code}
> Livy still uploads these local pyspark archives to the YARN distributed cache:
>
> 20/02/14 20:05:46 INFO utils.LineBufferedStream: 2020-02-14 20:05:46,026 INFO yarn.Client: Uploading resource file:/opt/spark/python/lib/pyspark.zip -> hdfs://mycluster/user/test1/.sparkStaging/application_1581024490249_0001/pyspark.zip
> 20/02/14 20:05:46 INFO utils.LineBufferedStream: 2020-02-14 20:05:46,392 INFO yarn.Client: Uploading resource file:/opt/spark/python/lib/py4j-0.10.7-src.zip -> hdfs://mycluster/user/test1/.sparkStaging/application_1581024490249_0001/py4j-0.10.7-src.zip
>
> Note that this happens even with the SPARK-30845 fix in Spark, which stops Spark from always uploading local archives.
>
> The root cause is that Livy adds the pyspark archives to "spark.submit.pyFiles", which Spark then adds to the YARN distributed cache. Since spark-submit already takes care of finding the pyspark archives and uploading them if they are not local, there is no need for Livy to redundantly do so.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)