[
https://issues.apache.org/jira/browse/LIVY-750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gyorgy Gal updated LIVY-750:
----------------------------
Fix Version/s: 0.10.0
(was: 0.9.0)
This issue has been moved to the 0.10.0 release as part of a bulk update. If
you feel it was moved inappropriately, feel free to provide justification
and reset the Fix Version to 0.9.0.
> Livy uploads local pyspark archives to Yarn distributed cache
> -------------------------------------------------------------
>
> Key: LIVY-750
> URL: https://issues.apache.org/jira/browse/LIVY-750
> Project: Livy
> Issue Type: Bug
> Components: Server
> Affects Versions: 0.6.0, 0.7.0
> Reporter: shanyu zhao
> Priority: Major
> Fix For: 0.10.0
>
> Attachments: image-2020-02-16-13-19-40-645.png,
> image-2020-02-16-13-19-59-591.png
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> On Livy Server, even if we set pyspark archives to use local files:
> {code:bash}
> export PYSPARK_ARCHIVES_PATH=local:/opt/spark/python/lib/pyspark.zip,local:/opt/spark/python/lib/py4j-0.10.7-src.zip
> {code}
> Livy still uploads these local pyspark archives to the Yarn distributed cache:
> 20/02/14 20:05:46 INFO utils.LineBufferedStream: 2020-02-14 20:05:46,026 INFO yarn.Client: Uploading resource file:/opt/spark/python/lib/pyspark.zip -> hdfs://mycluster/user/test1/.sparkStaging/application_1581024490249_0001/pyspark.zip
> 20/02/14 20:05:46 INFO utils.LineBufferedStream: 2020-02-14 20:05:46,392 INFO yarn.Client: Uploading resource file:/opt/spark/python/lib/py4j-0.10.7-src.zip -> hdfs://mycluster/user/test1/.sparkStaging/application_1581024490249_0001/py4j-0.10.7-src.zip
> Note that this still happens after we fixed the Spark code in SPARK-30845 so
> that it does not always upload local archives.
> The root cause is that Livy adds the pyspark archives to "spark.submit.pyFiles",
> and Spark adds everything in that setting to the Yarn distributed cache. Since
> spark-submit already takes care of finding the pyspark archives and uploading
> them if they are not local, there is no need for Livy to redundantly do so.
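
To illustrate the distinction the description relies on, here is a minimal sketch of how a comma-separated archive list could be partitioned by URI scheme, so that "local:" entries are kept out of anything destined for the distributed cache. The helper name split_local_archives is hypothetical and this is not Livy's actual code (Livy is written in Scala); it only assumes that "local:" marks a file already present on every node.

```python
from urllib.parse import urlparse


def split_local_archives(archives_path):
    """Partition a comma-separated archive list into (local, other) entries.

    Hypothetical helper: entries with the "local:" scheme are assumed to
    already exist on every node, so they would not need to be uploaded.
    """
    local, other = [], []
    for entry in archives_path.split(","):
        entry = entry.strip()
        if not entry:
            continue  # skip empty segments such as a trailing comma
        if urlparse(entry).scheme == "local":
            local.append(entry)
        else:
            other.append(entry)
    return local, other


# The value from the issue description: both entries use the local: scheme.
archives = (
    "local:/opt/spark/python/lib/pyspark.zip,"
    "local:/opt/spark/python/lib/py4j-0.10.7-src.zip"
)
local_entries, other_entries = split_local_archives(archives)
```

With the value from the report, both archives land in local_entries and other_entries is empty, so under this assumption nothing would need to be added to "spark.submit.pyFiles" on their behalf.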