koert kuipers created SPARK-31726:
-------------------------------------
Summary: Make spark.files available in driver with cluster deploy
mode on kubernetes
Key: SPARK-31726
URL: https://issues.apache.org/jira/browse/SPARK-31726
Project: Spark
Issue Type: Improvement
Components: Kubernetes
Affects Versions: 3.0.0
Reporter: koert kuipers
currently on yarn with cluster deploy mode, --files makes the files available
to the driver and executors and also puts them on the classpath of the driver
and executors.
on k8s with cluster deploy mode, --files makes the files available on executors
but does not put them on the executor classpath. it does not make the files
available on the driver at all, so they are not on the driver classpath either.
it would be nice if the k8s behavior were consistent with yarn, or at least
made the files available on the driver. once the files are available there, a
simple workaround to get them on the classpath is
spark.driver.extraClassPath="./"
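as a rough illustration of the invocation this would enable, here is a minimal python sketch that assembles the spark-submit arguments. the master url, jar path and main class are hypothetical placeholders, and the classpath setting only helps once the shipped files actually land in the driver's working directory, which is what this issue asks for.

```python
import shlex


def build_submit_command(master, app_jar, main_class, files):
    """Assemble a spark-submit argv for cluster deploy mode.

    All argument values are placeholders supplied by the caller.
    """
    return [
        "spark-submit",
        "--master", master,
        "--deploy-mode", "cluster",
        "--class", main_class,
        # ship the listed files with the job
        "--files", ",".join(files),
        # proposed workaround: add the working directory to the driver classpath
        "--conf", "spark.driver.extraClassPath=./",
        app_jar,
    ]


cmd = build_submit_command(
    master="k8s://https://example.invalid:6443",  # hypothetical api server
    app_jar="local:///opt/app/app.jar",           # hypothetical jar location
    main_class="com.example.Main",                # hypothetical main class
    files=["application.conf", "log4j.properties"],
)

# print the command as a single shell-quoted line for inspection
print(shlex.join(cmd))
```

building the command as a list rather than a string avoids quoting problems if a file name contains spaces.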
background:
we recently started testing kubernetes for spark. our main platform is yarn, on
which we use client deploy mode. our first experience was that client deploy
mode was difficult to use on k8s (we don't launch from inside a pod), so we
switched to cluster deploy mode, which seems to behave well on k8s. but then we
realized that our programs rely on reading files from the classpath
(application.conf, log4j.properties etc.) that are on the client but are no
longer on the driver (since the driver no longer runs on the client). an easy
fix for this seems to be to ship the files using --files to make them available
on the driver, but we could not get this to work.