> You can do this manually yourself via kubectl cp, so it should be possible
> to do this programmatically, since it looks like this is just a tar piped
> into a kubectl exec. This would keep the relevant logic in the
> Kubernetes-specific client, which may or may not be desirable depending on
> whether we’re looking to fix this just for K8S or more generally. Of course
> there is probably a fair bit of complexity in making this work, but does
> that sound like something worth exploring?

Yes, kubectl cp is able to copy files from your local machine into a
container in a pod. However, the pod must be up and running for this to
work, so to use it to upload dependencies to the driver pod, the driver pod
must already have started, and by that point you may not even have a chance
to upload the dependencies.
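
For what it is worth, here is a minimal sketch of what driving that
tar-over-exec trick programmatically might look like (this assumes kubectl
is on the PATH, and the pod name, namespace and paths are hypothetical):

  import scala.sys.process._

  // Stream a tar of the local dependency directory into the driver pod,
  // mimicking what kubectl cp does under the hood.
  val localDepsDir = "/tmp/spark-deps"      // hypothetical local directory
  val namespace    = "default"
  val podName      = "spark-driver-pod"     // hypothetical driver pod name
  val targetDir    = "/opt/spark/work-dir"  // hypothetical path in the container

  val exitCode =
    (Seq("tar", "cf", "-", "-C", localDepsDir, ".") #|
     Seq("kubectl", "exec", "-i", "-n", namespace, podName, "--",
         "tar", "xf", "-", "-C", targetDir)).!

  if (exitCode != 0)
    sys.error(s"Copying dependencies into pod $podName failed ($exitCode)")

As noted above, this only works once the driver pod is already up and
running, so presumably the driver would have to wait for the upload before
starting the user code.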

On Mon, Oct 8, 2018 at 6:36 AM Rob Vesse <rve...@dotnetrdf.org> wrote:

> Folks, thanks for all the great input. Responding to various points raised:
>
>
>
> Marcelo/Yinan/Felix –
>
>
>
> Yes, client mode will work. The main JAR will be automatically
> distributed, and --jars/--files specified dependencies are also
> distributed, though for --files user code needs to use the appropriate
> Spark API, i.e. SparkFiles.get(), to resolve the actual path.
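>
> For example, something along these lines on the executor side (the file
> name here is purely illustrative):
>
>   import org.apache.spark.SparkFiles
>
>   // Submitted with: spark-submit ... --files /local/path/lookup.csv
>   // Inside the job, resolve the distributed copy rather than assuming
>   // the original local path exists on the executor:
>   val resolvedPath = SparkFiles.get("lookup.csv")
>   val lines = scala.io.Source.fromFile(resolvedPath).getLines().toList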
>
>
>
> However, client mode can be awkward if you want to mix spark-submit
> distribution with mounting dependencies via volumes, since you may need to
> ensure that dependencies appear at the same path both on the local
> submission client and when mounted into the executors. This mainly applies
> to the case where user code does not use SparkFiles.get() and simply tries
> to access the path directly.
>
>
>
> Marcelo/Stavros –
>
>
>
> Yes, I did give the other resource managers too much credit. From my past
> experience with Mesos and Standalone I had thought this wasn’t an issue,
> but going back and looking at what we did for both of those, it appears we
> were entirely reliant on a shared file system (whether HDFS, NFS or other
> POSIX-compliant filesystems, e.g. Lustre).
>
>
>
> Since connectivity back to the client is a potential stumbling block for
> cluster mode, I wonder if it would be better to think in reverse, i.e.
> rather than having the driver pull from the client, have the client push
> to the driver pod?
>
>
>
> You can do this manually yourself via kubectl cp, so it should be possible
> to do this programmatically, since it looks like this is just a tar piped
> into a kubectl exec. This would keep the relevant logic in the
> Kubernetes-specific client, which may or may not be desirable depending on
> whether we’re looking to fix this just for K8S or more generally. Of course
> there is probably a fair bit of complexity in making this work, but does
> that sound like something worth exploring?
>
>
>
> I hadn’t really considered the HA aspect; a first step would be to get the
> basics working and then look at HA. Although if the above theoretical
> approach is practical, that could simply be part of restarting the driver.
>
>
>
> Rob
>
>
>
>
>
> *From: *Felix Cheung <felixcheun...@hotmail.com>
> *Date: *Sunday, 7 October 2018 at 23:00
> *To: *Yinan Li <liyinan...@gmail.com>, Stavros Kontopoulos <
> stavros.kontopou...@lightbend.com>
> *Cc: *Rob Vesse <rve...@dotnetrdf.org>, dev <dev@spark.apache.org>
> *Subject: *Re: [DISCUSS][K8S] Local dependencies with Kubernetes
>
>
>
> Having jars and libraries only accessible locally at the driver seems
> fairly limiting? Don’t you want the same on all executors?
>
>
>
>
>
>
> ------------------------------
>
> *From:* Yinan Li <liyinan...@gmail.com>
> *Sent:* Friday, October 5, 2018 11:25 AM
> *To:* Stavros Kontopoulos
> *Cc:* rve...@dotnetrdf.org; dev
> *Subject:* Re: [DISCUSS][K8S] Local dependencies with Kubernetes
>
>
>
> > Just to be clear: in client mode things work right? (Although I'm not
> really familiar with how client mode works in k8s - never tried it.)
>
>
>
> If the driver runs on the submission client machine, yes, it should just
> work. If the driver runs in a pod, however, it faces the same problem as in
> cluster mode.
>
>
>
> Yinan
>
>
>
> On Fri, Oct 5, 2018 at 11:06 AM Stavros Kontopoulos <
> stavros.kontopou...@lightbend.com> wrote:
>
> @Marcelo is correct. Mesos does not have something similar. Only Yarn does
> due to the distributed cache thing.
>
> I have described most of the above in the JIRA; there are also some other
> options there.
>
>
>
> Best,
>
> Stavros
>
>
>
> On Fri, Oct 5, 2018 at 8:28 PM, Marcelo Vanzin <
> van...@cloudera.com.invalid> wrote:
>
> On Fri, Oct 5, 2018 at 7:54 AM Rob Vesse <rve...@dotnetrdf.org> wrote:
> > Ideally this would all just be handled automatically for users in the
> > way that all other resource managers do
>
> I think you're giving other resource managers too much credit. In
> cluster mode, only YARN really distributes local dependencies, because
> YARN has that feature (its distributed cache) and Spark just uses it.
>
> Standalone doesn't do it (see SPARK-4160) and I don't remember seeing
> anything similar on the Mesos side.
>
> There are things that could be done; e.g. if you have HDFS you could
> do a restricted version of what YARN does (upload files to HDFS, and
> change the "spark.jars" and "spark.files" URLs to point to HDFS
> instead). Or you could turn the submission client into a file server
> that the cluster-mode driver downloads files from - although that
> requires connectivity from the driver back to the client.
>
> Neither is great, but better than not having that feature.
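>
> As a rough sketch of the HDFS variant (the paths, staging directory and
> config rewrite below are hypothetical, just to illustrate the idea):
>
>   import org.apache.hadoop.conf.Configuration
>   import org.apache.hadoop.fs.{FileSystem, Path}
>
>   // Stage a local jar into a per-application HDFS directory, then point
>   // spark.jars at the uploaded copy instead of the local path.
>   val localJar = new Path("/local/path/app-deps.jar")
>   val staged   = new Path("hdfs:///user/spark/staging/app-1234/app-deps.jar")
>   val fs = FileSystem.get(staged.toUri, new Configuration())
>   fs.copyFromLocalFile(localJar, staged)
>
>   // The submission client would then rewrite the config, e.g.
>   //   spark.jars=hdfs:///user/spark/staging/app-1234/app-deps.jar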
>
> Just to be clear: in client mode things work right? (Although I'm not
> really familiar with how client mode works in k8s - never tried it.)
>
> --
> Marcelo
>
