Sourav,

The solution will look something like this picture

[image: Inline image 1]

There is no need for a separate Toree client if you are using Jupyter.
Jupyter already knows how to talk to Toree. Now... there are other
solutions that can sit on top of Toree to expose REST or WebSocket
interfaces, but those are currently meant for custom client solutions. See
https://github.com/jupyter/kernel_gateway.
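
For readers curious what the kernel_gateway path looks like, here is a minimal sketch. It builds an execute_request message in the shape the Jupyter messaging protocol defines; the gateway host, kernel name, and Scala snippet in the comments are placeholder assumptions, not anything Toree or the gateway prescribes.

```python
import json
import uuid
from datetime import datetime, timezone

def build_execute_request(code):
    """Build a Jupyter-protocol execute_request message for the shell channel."""
    return {
        "header": {
            "msg_id": uuid.uuid4().hex,
            "username": "",
            "session": uuid.uuid4().hex,
            "msg_type": "execute_request",
            "version": "5.0",
            "date": datetime.now(timezone.utc).isoformat(),
        },
        "parent_header": {},
        "metadata": {},
        "content": {
            "code": code,
            "silent": False,
            "store_history": True,
            "user_expressions": {},
            "allow_stdin": False,
        },
        # The kernel gateway multiplexes all channels over one websocket,
        # so the target channel is carried in the message itself.
        "channel": "shell",
    }

if __name__ == "__main__":
    # Hypothetical flow against a gateway at gateway-host:8888:
    # 1) POST http://gateway-host:8888/api/kernels with {"name": "apache_toree_scala"}
    #    -> the JSON response contains the new kernel's "id"
    # 2) connect ws://gateway-host:8888/api/kernels/<id>/channels
    # 3) send the message below as JSON and read the replies
    print(json.dumps(build_execute_request("sc.parallelize(1 to 10).sum()"), indent=2))
```

The message itself can be constructed and inspected without any gateway running; only steps 1-3 in the comments need a live endpoint.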

Thanks,
Gino

On Thu, May 5, 2016 at 11:46 AM, Sourav Mazumder <
[email protected]> wrote:

> Hi Gino,
>
> Thanks for explaining the scope of Toree.
>
> What I was looking for is a solution where Toree plays the role of a
> facade between the client application (in this case the notebook) and the
> underlying Spark cluster. So if the client application submits a command,
> Toree can accept it, execute it on the underlying Spark infrastructure
> (whether standalone, on Mesos, or on YARN), and return the result.
>
> I somewhat like option 2 as well, as I think it is along similar lines to
> my requirement. However, I am not sure I have fully understood it.
>
> Essentially, what I'm looking for is a solution where Jupyter runs on
> individual data scientists' laptops. Jupyter issues the command from the
> laptop, the Toree client accepts it and sends it to the Toree server
> running on the Spark cluster, and the Toree server runs it on Spark and
> returns the results.
>
> To achieve this requirement using option 2, can one potentially change
> Jupyter (or add an extension) so that it sends the request to Toree
> running on the provisioning layer over ZeroMQ (or another protocol such
> as REST)?
>
> Regards,
> Sourav
>
> On Thu, May 5, 2016 at 6:47 AM, Gino Bustelo <[email protected]> wrote:
>
> > >>>>>>>>>>>>>>>>>>>
> > Hi Gino,
> >
> > It does not solve the problem of running a Spark job (on YARN) remotely
> > from a Jupyter notebook which is running on, say, a laptop or some other
> > machine.
> >
> > The issue is that in yarn-client mode the laptop needs access to all the
> > slave nodes where the executors would be running. In a typical
> > organizational security scenario, the slave nodes are behind a firewall
> > and cannot be accessed from an arbitrary machine outside.
> >
> > Regards,
> > Sourav
> > >>>>>>>>>>>>>>>>>>>
> >
> >
> > Sourav, I'm very much aware of the network implications of Spark (not
> > exclusive to YARN). The typical ways that I've seen this problem solved
> > are:
> >
> > 1. You manage/host Jupyter in a privileged network space that has
> > access to the Spark cluster. This involves no code changes to either
> > Jupyter or Toree, but it has the added cost for the service provider of
> > managing this frontend tool.
> >
> > 2. You create a provisioner layer in a privileged network space to
> > manage kernels (Toree) and modify Jupyter through extensions so it
> > understands how to communicate with that provisioner layer. The pro of
> > this is that you don't have to manage the notebooks, but the service
> > provider still needs to build that provisioning layer and proxy the
> > kernels' communication channels.
> >
> > My preference is for #2. I think that frontend tools do not need to live
> > close to Spark, but processes like Toree should be as close to the
> > compute cluster as possible.
> >
> > Toree's scope is to be a Spark driver program that allows "interactive
> > computing". It is not within its scope to provide a full-fledged
> > provisioning/hosting solution for accessing Spark. That is left to the
> > implementers of Spark offerings, who should select the best way to
> > manage Toree kernels (e.g. YARN, Mesos, Docker, etc.).
> >
> > Thanks,
> > Gino
> >
> > On Sat, Apr 30, 2016 at 9:53 PM, Gino Bustelo <[email protected]>
> wrote:
> >
> > > This is not possible without extending Jupyter. By default, Jupyter
> > > starts kernels as local processes. To be able to launch remote kernels
> > > you need to provide an extension to the KernelManager and have some
> > > sort of kernel provisioner to then manage the remote kernels. It is
> > > not hard to do, but there is really nothing out there that I know of
> > > that you can use out of the box.
> > >
> > > Gino B.
> > >
> > > > On Apr 30, 2016, at 6:25 PM, Sourav Mazumder <
> > > [email protected]> wrote:
> > > >
> > > > Hi,
> > > >
> > > >
> > > > Is there any documentation which can be used to configure a local
> > > > Jupyter process to talk to a remote Apache Toree server?
> > > >
> > > > Regards,
> > > > Sourav
> > >
> >
>
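
The remote-kernel provisioning idea discussed in the thread (option 2) can be sketched roughly as follows: a minimal, hypothetical client that asks a REST provisioning layer to start a kernel on the cluster and translates the reply into the standard Jupyter connection-info dictionary. The provisioner URL, its `/kernels` endpoint, and the fields of its response are all assumptions for illustration; only the connection-info keys follow the documented Jupyter format.

```python
import json
import urllib.request

# Standard Jupyter connection-info port keys a KernelManager expects,
# in the order our hypothetical provisioner is assumed to return them.
PORT_KEYS = ["shell_port", "iopub_port", "stdin_port", "control_port", "hb_port"]

def request_remote_kernel(provisioner_url, kernel_name, opener=urllib.request.urlopen):
    """Ask a hypothetical REST provisioning layer to start a kernel on the
    cluster and return Jupyter-style connection info pointing at it."""
    req = urllib.request.Request(
        provisioner_url + "/kernels",
        data=json.dumps({"name": kernel_name}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with opener(req) as resp:
        # Assumed response shape: {"ip": ..., "ports": [five ints], "key": ...}
        remote = json.load(resp)

    info = {
        "transport": "tcp",
        "ip": remote["ip"],  # cluster-side address, proxied by the provisioner layer
        "signature_scheme": "hmac-sha256",
        "key": remote["key"],
    }
    for name, port in zip(PORT_KEYS, remote["ports"]):
        info[name] = port
    return info
```

With something like this in place, a custom KernelManager extension would call `request_remote_kernel()` instead of spawning a local kernel process, and write the returned dictionary out as the kernel's connection file; the provisioner layer would proxy the ZeroMQ channels back to the laptop, as described above.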
