Re: Calling Apache Toree from a remote Jupyter instance

Gino Bustelo Thu, 05 May 2016 06:49:14 -0700

>>>>>>>>>>>>>>>>>>>
Hi Gino,

It does not solve the problem of running a Spark job (on YARN) remotely
from a Jupyter notebook that is running on, say, a laptop or some other
machine.

The issue is that in yarn-client mode the laptop needs access to all the
slave nodes where the executors would be running. In a typical
organizational security setup, the slave nodes are behind a firewall and
cannot be accessed from an arbitrary machine outside it.

Regards,
Sourav
>>>>>>>>>>>>>>>>>>>

Sourav, I'm very much aware of the network implications of Spark (they are
not exclusive to YARN). The typical ways that I've seen this problem solved
are:

1. You manage/host Jupyter in a privileged network space that has access
to the Spark cluster. This involves no code changes to either Jupyter or
Toree, but has the added cost for the service provider of managing this
frontend tool.

2. You create a provisioner layer in a privileged network space to manage
kernels (Toree) and modify Jupyter through extensions so that it knows how
to communicate with that provisioner layer. The pro of this is that you
don't have to manage the notebooks, but the service provider still needs to
build that provisioning layer and proxy the kernels' communication channels.
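
To make the proxying part of option 2 a bit more concrete, here is a
minimal sketch in Python. It relies on one thing Jupyter does define: a
kernel is reached over five ZeroMQ channels (shell, iopub, stdin, control,
heartbeat) whose address and ports are listed in a connection file.
Everything else is an assumption for illustration only: the function name,
the host names, the port numbers and the key are hypothetical, and a real
provisioner would also have to forward the traffic itself.

```python
import json

# Port fields of a Jupyter kernel connection file, one per ZeroMQ channel.
CHANNEL_PORT_KEYS = (
    "shell_port", "iopub_port", "stdin_port", "control_port", "hb_port",
)


def proxied_connection_info(kernel_info, proxy_ip, port_map):
    """Return a copy of kernel_info that points at the proxy, not the kernel.

    port_map maps each in-cluster kernel port to the port the proxy exposes.
    The signing key and scheme are kept so message signatures still verify.
    """
    rewritten = dict(kernel_info)
    rewritten["ip"] = proxy_ip
    for key in CHANNEL_PORT_KEYS:
        rewritten[key] = port_map[kernel_info[key]]
    return rewritten


# Hypothetical connection info reported by a Toree kernel inside the cluster.
remote_kernel = {
    "transport": "tcp",
    "ip": "10.0.0.12",
    "shell_port": 50001,
    "iopub_port": 50002,
    "stdin_port": 50003,
    "control_port": 50004,
    "hb_port": 50005,
    "key": "example-key",
    "signature_scheme": "hmac-sha256",
}

# Hypothetical mapping from in-cluster ports to ports opened on the proxy.
port_map = {50001: 60001, 50002: 60002, 50003: 60003,
            50004: 60004, 50005: 60005}

local_view = proxied_connection_info(remote_kernel, "proxy.example.com", port_map)
print(json.dumps(local_view, indent=2, sort_keys=True))
```

The notebook side would then connect using the rewritten information, so
only the proxy needs to be reachable from outside the firewall while the
kernel itself stays next to the cluster.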

My preference is for #2. I think that frontend tools do not need to live
close to Spark, but processes like Toree should be as close to the compute
cluster as possible.

Toree's scope is to be a Spark Driver program that allows "interactive
computing". It is not within its scope to provide a full-fledged
provisioning/hosting solution to access Spark. It is left to the
implementers of Spark offerings to select the best way to manage Toree
kernels (e.g. YARN, Mesos, Docker, etc.).

Thanks,
Gino

On Sat, Apr 30, 2016 at 9:53 PM, Gino Bustelo <[email protected]> wrote:

> This is not possible without extending Jupyter. By default, Jupyter starts
> kernels as local processes. To be able to launch remote kernels you need to
> provide an extension to the KernelManager and have some sort of kernel
> provisioner to then manage the remote kernels. It is not hard to do, but
> there is really nothing out there that I know of that you can use out of
> the box.
>
> Gino B.
>
> > On Apr 30, 2016, at 6:25 PM, Sourav Mazumder <
> [email protected]> wrote:
> >
> > Hi,
> >
> >
> > Is there any documentation that can be used to configure a local Jupyter
> > process to talk to a remote Apache Toree server?
> >
> > Regards,
> > Sourav
>
