>>>>>>>>>>>>>>>>>>> Hi Gino,
It does not solve the problem of running a Spark job (on YARN) remotely from a Jupyter notebook that is running on, say, a laptop or some other machine. The issue is that in yarn-client mode the laptop needs access to all the slave nodes where the executors would be running. In a typical organizational security setup, the slave nodes are behind a firewall and cannot be accessed from an arbitrary outside machine.

Regards,
Sourav

>>>>>>>>>>>>>>>>>>> Sourav,

I'm very much aware of the network implications of Spark (not exclusive to YARN). The typical ways that I've seen this problem solved are:

1. You manage/host Jupyter in a privileged network space that has access to the Spark cluster. This involves no code changes to either Jupyter or Toree, but has the added cost for the service provider of managing this frontend tool.

2. You create a provisioner layer in a privileged network space to manage kernels (Toree) and modify Jupyter through extensions so that it knows how to communicate with that provisioner layer. The pro of this is that you don't have to manage the notebooks, but the service provider still needs to build that provisioning layer and proxy the kernels' communication channels.

My preference is for #2. I think that frontend tools do not need to live close to Spark, but processes like Toree should be as close to the compute cluster as possible.

Toree's scope is to be a Spark driver program that allows "interactive computing". It is not its scope to provide a full-fledged provisioning/hosting solution to access Spark. That is left to the implementers of Spark offerings, who can select the best way to manage Toree kernels (e.g. YARN, Mesos, Docker, etc.).

Thanks,
Gino

On Sat, Apr 30, 2016 at 9:53 PM, Gino Bustelo <[email protected]> wrote:

> This is not possible without extending Jupyter. By default, Jupyter starts
> kernels as local processes.
> To be able to launch remote kernels you need to
> provide an extension to the KernelManager and have some sort of kernel
> provisioner to then manage the remote kernels. It is not something hard to
> do, but there is really nothing out there that I know of that you can use
> out of the box.
>
> Gino B.
>
> > On Apr 30, 2016, at 6:25 PM, Sourav Mazumder <
> > [email protected]> wrote:
> >
> > Hi,
> >
> > Is there any documentation which can be used to configure a local Jupyter
> > process to talk remotely to a remote Apache Toree server?
> >
> > Regards,
> > Sourav
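
For readers of the archive, the "KernelManager extension plus kernel provisioner" idea from the replies above can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from Jupyter or Toree: `FakeProvisioner`, `RemoteKernelManager`, the host name, the port range, and the kernel name are all hypothetical, and the sketch deliberately does not subclass the real `jupyter_client` classes. The one grounded detail is the shape of the connection information: a Jupyter kernel exposes five channels (shell, iopub, stdin, control, heartbeat), so a provisioner would need to hand back, and proxy, five endpoints per kernel.

```python
import json
import uuid
from dataclasses import asdict, dataclass


@dataclass
class ConnectionInfo:
    """Fields found in a Jupyter kernel connection file."""
    ip: str
    key: str
    shell_port: int
    iopub_port: int
    stdin_port: int
    control_port: int
    hb_port: int
    transport: str = "tcp"
    signature_scheme: str = "hmac-sha256"


class FakeProvisioner:
    """Hypothetical stand-in for a provisioner in the privileged network.

    A real one would launch Toree on a node that can reach the cluster
    and proxy the kernel's channels; this one only hands out port numbers.
    """

    def __init__(self, proxy_host, base_port=50000):
        self.proxy_host = proxy_host
        self._next_port = base_port
        self.running = {}

    def start_kernel(self, kernel_name):
        kernel_id = str(uuid.uuid4())
        # Reserve five consecutive ports, one per kernel channel.
        ports = [self._next_port + i for i in range(5)]
        self._next_port += 5
        info = ConnectionInfo(
            ip=self.proxy_host,
            key=uuid.uuid4().hex,
            shell_port=ports[0],
            iopub_port=ports[1],
            stdin_port=ports[2],
            control_port=ports[3],
            hb_port=ports[4],
        )
        self.running[kernel_id] = info
        return kernel_id, info

    def stop_kernel(self, kernel_id):
        self.running.pop(kernel_id, None)


class RemoteKernelManager:
    """Plays the role of the KernelManager extension (hypothetical).

    Instead of spawning a local process, it asks the provisioner for a
    kernel and keeps the connection info the frontend would connect with.
    """

    def __init__(self, provisioner, kernel_name="apache_toree_scala"):
        # The kernel name is illustrative; use whatever kernelspec is installed.
        self.provisioner = provisioner
        self.kernel_name = kernel_name
        self.kernel_id = None
        self.connection_info = None

    def start_kernel(self):
        self.kernel_id, self.connection_info = self.provisioner.start_kernel(
            self.kernel_name
        )
        return self.connection_info

    def shutdown_kernel(self):
        if self.kernel_id is not None:
            self.provisioner.stop_kernel(self.kernel_id)
            self.kernel_id = None
            self.connection_info = None

    def connection_file_contents(self):
        # JSON in the same shape as a kernel connection file.
        return json.dumps(asdict(self.connection_info), indent=2)
```

With this split, the notebook frontend only ever talks to the provisioner's proxy address, while the Spark driver (Toree) stays inside the network that can reach the executors.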
