I'd like to discuss the criteria here for graduating from experimental
status. (As a fork, we were mentioned in the documentation as
experimental.)

As I understand it, the bigger changes discussed here, like the init
containers, are more on the implementation side than user-facing or
behavioral changes - which is why it seemed okay to pursue them post-2.3
as well.

If the reasoning is mostly a lack of confidence in testing or
documentation, we are working on that, but we would love more visibility
into what we're missing, so we can prioritize and fill those gaps - and
maybe even get there by this release, or at least have a clear path to
graduating in the next one.
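For reference, the approach Marcelo describes below - hiding spark.jars / spark.files from spark-submit so its dependency-download path is never triggered - could be sketched roughly like this. This is a hypothetical illustration, not the actual k8s submission code; the helper names are made up:

```python
# Hypothetical sketch: split a Spark conf into the dependency-related keys
# (to be handled by the init container) and everything else, which can be
# passed through to spark-submit without triggering its download code.
DEPENDENCY_KEYS = {"spark.jars", "spark.files"}

def split_conf(conf):
    """Return (deps, passthrough): dependency keys vs. the remaining conf."""
    deps = {k: v for k, v in conf.items() if k in DEPENDENCY_KEYS}
    passthrough = {k: v for k, v in conf.items() if k not in DEPENDENCY_KEYS}
    return deps, passthrough

def submit_args(passthrough):
    """Render the passthrough conf as spark-submit --conf arguments."""
    args = []
    for key in sorted(passthrough):
        args += ["--conf", f"{key}={passthrough[key]}"]
    return args
```

The point is only that the split has to happen before spark-submit sees the conf, which is why keeping init containers implies extra changes in the current k8s code.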



On Jan 12, 2018 1:18 PM, "Marcelo Vanzin" <van...@cloudera.com> wrote:

> BTW I most probably will not have time to get back to this at any time
> soon, so if anyone is interested in doing some clean up, I'll leave my
> branch up.
>
> I'm seriously thinking about proposing that we document the k8s
> backend as experimental in 2.3; it seems there's still a lot to be
> cleaned up in terms of user interface (as in extensibility and
> customizability), documentation, and mainly testing, and we're pretty
> far into the 2.3 cycle for all of those to be sorted out.
>
> On Thu, Jan 11, 2018 at 8:19 AM, Anirudh Ramanathan
> <ramanath...@google.com> wrote:
> > If we can separate those concerns out, that might make sense in the short
> > term IMO.
> > There are several benefits to reusing spark-submit and spark-class as you
> > pointed out previously,
> > so, we should be looking to leverage those irrespective of how we do
> > dependency management -
> > in the interest of conformance with the other cluster managers.
> >
> > I like the idea of passing arguments through in a way that it doesn't
> > trigger the dependency management code for now.
> > In the interest of time for 2.3, if we could target just that (and
> > revisit the init containers afterwards),
> > there should be enough time to make the change, test and release with
> > confidence.
> >
> > On Wed, Jan 10, 2018 at 3:45 PM, Marcelo Vanzin <van...@cloudera.com>
> wrote:
> >>
> >> On Wed, Jan 10, 2018 at 3:00 PM, Anirudh Ramanathan
> >> <ramanath...@google.com> wrote:
> >> > We can start by getting a PR going perhaps, and start augmenting the
> >> > integration testing to ensure that there are no surprises -
> with/without
> >> > credentials, accessing GCS, S3 etc as well.
> >> > When we get enough confidence and test coverage, let's merge this in.
> >> > Does that sound like a reasonable path forward?
> >>
> >> I think it's beneficial to separate this into two separate things as
> >> far as discussion goes:
> >>
> >> - using spark-submit: the code should definitely be starting the
> >> driver using spark-submit, and potentially the executor using
> >> spark-class.
> >>
> >> - separately, we can decide on whether to keep or remove init
> containers.
> >>
> >> Unfortunately, code-wise, those are not separate. If you get rid of
> >> init containers, my current p.o.c. has most of the needed changes
> >> (only lightly tested).
> >>
> >> But if you keep init containers, you'll need to mess with the
> >> configuration so that spark-submit never sees spark.jars /
> >> spark.files, so it doesn't trigger its dependency download code. (YARN
> >> does something similar, btw.) That will surely mean different changes
> >> in the current k8s code (which I wanted to double check anyway because
> >> I remember seeing some oddities related to those configs in the logs).
> >>
> >> To comment on one point made by Andrew:
> >> > there's almost a parallel here with spark.yarn.archive, where that
> >> > configures the cluster (YARN) to do distribution pre-runtime
> >>
> >> That's more of a parallel to the docker image; spark.yarn.archive
> >> points to a jar file with Spark jars in it so that YARN can make Spark
> >> available to the driver / executors running in the cluster.
> >>
> >> Like the docker image, you could include other stuff that is not
> >> really part of standard Spark in that archive too, or even not have
> >> Spark at all there, if you want things to just fail. :-)
> >>
> >> --
> >> Marcelo
> >
> >
> >
> >
> > --
> > Anirudh Ramanathan
>
>
>
> --
> Marcelo
>
