This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new 20fc6fa  [SPARK-31308][PYSPARK] Merging pyFiles to files argument for Non-PySpark applications

20fc6fa is described below

commit 20fc6fa8398b9dc47b9ae7df52133a306f89b25f
Author: Liang-Chi Hsieh <liang...@uber.com>
AuthorDate: Tue Mar 31 18:08:55 2020 -0700

[SPARK-31308][PYSPARK] Merging pyFiles to files argument for Non-PySpark applications

### What changes were proposed in this pull request?

This PR (SPARK-31308) proposes to merge the Python dependencies specified by `pyFiles` into `files` even when the application is not a Python application.

### Why are the changes needed?

Currently, SparkSubmit merges the `pyFiles` argument into the `files` argument only for Python applications. As noted in #21420, "for some Spark applications, though they're a java program, they require not only jar dependencies, but also python dependencies." We therefore need to add `pyFiles` to `files` even when the application is not a Python application.

### Does this PR introduce any user-facing change?

Yes. After this change, for non-PySpark applications, the Python files specified by `pyFiles` are added to `files`, just as they are for PySpark applications.

### How was this patch tested?

Manually tested in a Jupyter notebook and with `spark-submit` using `--verbose`:

```
Spark config:
...
(spark.files,file:/Users/dongjoon/PRS/SPARK-PR-28077/a.py)
(spark.submit.deployMode,client)
(spark.master,local[*])
```

Closes #28077 from viirya/pyfile.
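The change routes both lists through SparkSubmit's `mergeFileLists` helper, which joins comma-separated file lists into one. As a rough, hypothetical sketch of that joining behavior (a standalone reimplementation for illustration, not Spark's actual code; the object name `FileListMerge` is made up):

```scala
// Hypothetical sketch of how comma-separated file lists are merged:
// null or blank entries are dropped, the rest are joined with commas.
object FileListMerge {
  def mergeFileLists(lists: String*): String = {
    val merged = lists.filter(s => s != null && s.trim.nonEmpty)
    if (merged.nonEmpty) merged.mkString(",") else null
  }
}
```

Under this sketch, a non-PySpark application's `--files` list and its `--py-files` list would end up combined into a single `spark.files` value, matching the `--verbose` output shown above.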
Lead-authored-by: Liang-Chi Hsieh <liang...@uber.com>
Co-authored-by: Liang-Chi Hsieh <vii...@gmail.com>
Signed-off-by: Dongjoon Hyun <dongj...@apache.org>
---
 core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
index 4d67dfa..1271a3d 100644
--- a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
@@ -474,10 +474,12 @@ private[spark] class SparkSubmit extends Logging {
         args.mainClass = "org.apache.spark.deploy.PythonRunner"
         args.childArgs = ArrayBuffer(localPrimaryResource, localPyFiles) ++ args.childArgs
       }
-      if (clusterManager != YARN) {
-        // The YARN backend handles python files differently, so don't merge the lists.
-        args.files = mergeFileLists(args.files, args.pyFiles)
-      }
+    }
+
+    // Non-PySpark applications can need Python dependencies.
+    if (deployMode == CLIENT && clusterManager != YARN) {
+      // The YARN backend handles python files differently, so don't merge the lists.
+      args.files = mergeFileLists(args.files, args.pyFiles)
+    }

     if (localPyFiles != null) {

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org