On Thu, May 5, 2016 at 2:57 PM, Sourav Mazumder <[email protected]> wrote:
Hi Gino,

Thanks for the details.

But I'm not able to see the image; it is coming through as an inline image. Could you please send the image once more?

Regards,
Sourav

On Thu, May 5, 2016 at 12:44 PM, Gino Bustelo <[email protected]> wrote:

Sourav,

The solution will look something like this picture:

[image: Inline image 1]

There is no need for a separate Toree client if you are using Jupyter; Jupyter already knows how to talk to Toree. Now, there are other solutions that can sit on top of Toree to expose REST or WebSocket interfaces, but those are currently meant for custom client solutions. See https://github.com/jupyter/kernel_gateway.

Thanks,
Gino
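The kernel_gateway project Gino points to exposes the standard Jupyter REST API over HTTP, so a client outside the cluster can start and talk to Toree kernels without running a Jupyter server next to Spark. As a minimal sketch, assuming the gateway runs on a hypothetical host gateway-host:9999 with network access to the Spark cluster, and that Toree is registered there under the (installation-dependent) kernel spec name apache_toree_scala:

    # Editor's sketch, not from the thread. Host, port, and the kernel
    # spec name "apache_toree_scala" are assumptions; the REST endpoints
    # are the standard Jupyter ones that Kernel Gateway serves.
    import json
    import requests

    BASE = "http://gateway-host:9999"

    # List the kernel specs the gateway knows about, to confirm the name.
    specs = requests.get(BASE + "/api/kernelspecs").json()
    print(sorted(specs["kernelspecs"]))

    # Ask the gateway to start a Toree kernel on its side of the firewall.
    resp = requests.post(BASE + "/api/kernels",
                         data=json.dumps({"name": "apache_toree_scala"}))
    kernel_id = resp.json()["id"]

    # A client would now open a WebSocket to
    # BASE/api/kernels/<kernel_id>/channels and speak the Jupyter message
    # protocol over it.
    print(kernel_id)

This is the "custom client" path Gino mentions: the gateway multiplexes the kernel's ZeroMQ channels onto a single WebSocket per kernel, which is far easier to get through a firewall than five raw ZeroMQ ports.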
On Thu, May 5, 2016 at 11:46 AM, Sourav Mazumder <[email protected]> wrote:

Hi Gino,

Thanks for explaining the scope of Toree.

What I was looking for is a solution where Toree plays the role of a facade between the client application (in this case the notebook) and the underlying Spark cluster. When the client application submits a command, Toree accepts it, executes it on the underlying Spark infrastructure (standalone, on Mesos, or on YARN), and returns the result.

I do like option 2, as I think it is along the lines of my requirement, though I'm not sure I have understood it fully.

Essentially, I'm looking for a setup where Jupyter runs on each data scientist's laptop. Jupyter issues a command from the laptop, the Toree client accepts it and sends it to the Toree server running on the Spark cluster, and the Toree server runs it on Spark and returns the results.

To achieve this with option 2, could one change Jupyter (or add an extension) so that it sends requests to Toree through the provisioning layer over ZeroMQ (or another protocol such as REST)?

Regards,
Sourav

On Thu, May 5, 2016 at 6:47 AM, Gino Bustelo <[email protected]> wrote:

> Hi Gino,
>
> It does not solve the problem of running a Spark job (on YARN) remotely
> from a Jupyter notebook which is running on, say, a laptop or some other
> machine.
>
> The issue is that in yarn-client mode the laptop needs access to all the
> slave nodes where the executors would be running. In a typical
> organizational security scenario the slave nodes are behind a firewall
> and cannot be accessed from an arbitrary machine outside.
>
> Regards,
> Sourav

Sourav, I'm very much aware of the network implications of Spark (they are not exclusive to YARN). The typical ways I've seen this problem solved are:

1. You manage/host Jupyter in a privileged network space that has access to the Spark cluster. This involves no code changes to either Jupyter or Toree, but it adds the cost, for the service provider, of managing this frontend tool.

2. You create a provisioner layer in a privileged network space to manage kernels (Toree), and you modify Jupyter through extensions so that it knows how to communicate with that provisioner layer. The pro of this is that you don't have to manage the notebooks, but the service provider still needs to build that provisioning layer and proxy the kernels' communication channels.

My preference is for #2. I think that frontend tools do not need to live close to Spark, but processes like Toree should be as close to the compute cluster as possible.

Toree's scope is to be a Spark driver program that enables "interactive computing". It is not its scope to provide a full-fledged provisioning/hosting solution for accessing Spark. That is left to the implementers of Spark offerings, who can select the best way to manage Toree kernels (e.g. YARN, Mesos, Docker).

Thanks,
Gino

On Sat, Apr 30, 2016 at 9:53 PM, Gino Bustelo <[email protected]> wrote:

This is not possible without extending Jupyter. By default, Jupyter starts kernels as local processes. To be able to launch remote kernels you need to provide an extension to the KernelManager and have some sort of kernel provisioner to manage the remote kernels. It is not hard to do, but there is really nothing out there that I know of that you can use out of the box.

Gino B.
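To make the KernelManager extension Gino describes concrete, here is a minimal sketch, assuming a hypothetical provisioner service at provisioner-host:8700 that launches Toree next to the Spark cluster and replies with standard Jupyter connection info (IP, transport, signature key, and the five ZeroMQ ports). KernelManager and load_connection_info are real jupyter_client APIs; everything about the provisioner is an assumption:

    # Editor's sketch, not from the thread. The provisioner endpoint and
    # its response format are hypothetical.
    import requests
    from jupyter_client import KernelManager

    class RemoteKernelManager(KernelManager):
        """Ask a remote provisioner to start the kernel instead of
        spawning a local process."""

        provisioner_url = "http://provisioner-host:8700"  # hypothetical

        def start_kernel(self, **kwargs):
            # Assumed contract: the provisioner starts a Toree kernel on
            # the cluster and returns a Jupyter connection-info dict.
            resp = requests.post(self.provisioner_url + "/kernels",
                                 json={"kernel_name": self.kernel_name})
            # Point this manager's channels at the remote sockets. The
            # ports must be reachable from here, i.e. proxied or tunneled
            # through the firewall by the provisioner layer.
            self.load_connection_info(resp.json())

        # A real implementation would also override shutdown_kernel,
        # restart_kernel, interrupt_kernel, and is_alive to call back
        # into the provisioner.

Jupyter would be pointed at such a class with a config line like c.MultiKernelManager.kernel_manager_class = 'mypkg.RemoteKernelManager' (the module path is hypothetical); the hard part, as Gino says, is building the provisioner and proxying the kernel's five ZeroMQ channels.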
On Sat, Apr 30, 2016 at 6:25 PM, Sourav Mazumder <[email protected]> wrote:

Hi,

Is there any documentation which can be used to configure a local Jupyter process to talk to a remote Apache Toree server?

Regards,
Sourav
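For the narrow question that started the thread, one configuration-only workaround is worth noting: if a Toree kernel is already running on the cluster, a local client can attach to it by copying the kernel's connection file and making its five ZeroMQ ports reachable, for example over SSH tunnels. This runs straight into the firewall issue discussed above, so treat it as a sketch under those assumptions (the file name and port reachability are hypothetical), using jupyter_client's real BlockingKernelClient:

    # Editor's sketch, not from the thread. Assumes kernel-abc123.json was
    # copied from the remote host and that its shell, iopub, stdin,
    # control, and heartbeat ports are reachable locally (e.g. tunneled).
    from jupyter_client.blocking import BlockingKernelClient

    kc = BlockingKernelClient(connection_file="kernel-abc123.json")
    kc.load_connection_file()
    kc.start_channels()

    # Send one Scala statement to Toree and wait for the execute_reply.
    kc.execute("sc.parallelize(1 to 10).sum()")
    reply = kc.get_shell_msg(timeout=60)
    print(reply["content"]["status"])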