On Thu, May 5, 2016 at 2:57 PM, Sourav Mazumder <[email protected]> wrote:
Hi Gino,

Thanks for the details.

But I'm not able to see the image; it is coming through as an inline image. Could you please send the image once more?

Regards,
Sourav

On Thu, May 5, 2016 at 12:44 PM, Gino Bustelo <[email protected]> wrote:

Sourav,

The solution will look something like this picture:

[image: Inline image 1]

There is no need for a separate Toree client if you are using Jupyter; Jupyter already knows how to talk to Toree. Now, there are other solutions that can sit on top of Toree to expose REST or WebSocket interfaces, but those are currently meant for custom client solutions. See https://github.com/jupyter/kernel_gateway.

Thanks,
Gino
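The kernel_gateway project Gino points to exposes the standard Jupyter REST API over HTTP, so a client outside the cluster can start and talk to Toree kernels without running a Jupyter server next to Spark. As a minimal sketch, assuming the gateway runs on a hypothetical host gateway-host:9999 with network access to the Spark cluster, and that Toree is registered there under the (installation-dependent) kernel spec name apache_toree_scala:

    # Editor's sketch, not from the thread. Host, port, and the kernel
    # spec name "apache_toree_scala" are assumptions; the REST endpoints
    # are the standard Jupyter ones that Kernel Gateway serves.
    import json
    import requests

    BASE = "http://gateway-host:9999"

    # List the kernel specs the gateway knows about, to confirm the name.
    specs = requests.get(BASE + "/api/kernelspecs").json()
    print(sorted(specs["kernelspecs"]))

    # Ask the gateway to start a Toree kernel on its side of the firewall.
    resp = requests.post(BASE + "/api/kernels",
                         data=json.dumps({"name": "apache_toree_scala"}))
    kernel_id = resp.json()["id"]

    # A client would now open a WebSocket to
    # BASE/api/kernels/<kernel_id>/channels and speak the Jupyter message
    # protocol over it.
    print(kernel_id)

This is the "custom client" path Gino mentions: the gateway multiplexes the kernel's ZeroMQ channels onto a single WebSocket per kernel, which is far easier to get through a firewall than five raw ZeroMQ ports.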
On Thu, May 5, 2016 at 11:46 AM, Sourav Mazumder <[email protected]> wrote:

Hi Gino,

Thanks for explaining the scope of Toree.

What I was looking for is a solution where Toree plays the role of a facade between the client application (in this case the notebook) and the underlying Spark cluster. When the client application submits a command, Toree accepts it, executes it on the underlying Spark infrastructure (standalone, on Mesos, or on YARN), and returns the result.

I do like option 2, as I think it is along the lines of my requirement, though I'm not sure I have understood it fully.

Essentially, I'm looking for a setup where Jupyter runs on each data scientist's laptop. Jupyter issues a command from the laptop, the Toree client accepts it and sends it to the Toree server running on the Spark cluster, and the Toree server runs it on Spark and returns the results.

To achieve this with option 2, could one change Jupyter (or add an extension) so that it sends requests to Toree through the provisioning layer over ZeroMQ (or another protocol such as REST)?

Regards,
Sourav

On Thu, May 5, 2016 at 6:47 AM, Gino Bustelo <[email protected]> wrote:

> Hi Gino,
>
> It does not solve the problem of running a Spark job (on YARN) remotely
> from a Jupyter notebook which is running on, say, a laptop or some other
> machine.
>
> The issue is that in yarn-client mode the laptop needs access to all the
> slave nodes where the executors would be running. In a typical
> organizational security scenario the slave nodes are behind a firewall
> and cannot be accessed from an arbitrary machine outside.
>
> Regards,
> Sourav

Sourav, I'm very much aware of the network implications of Spark (they are not exclusive to YARN). The typical ways I've seen this problem solved are:

1. You manage/host Jupyter in a privileged network space that has access to the Spark cluster. This involves no code changes to either Jupyter or Toree, but it adds the cost, for the service provider, of managing this frontend tool.

2. You create a provisioner layer in a privileged network space to manage kernels (Toree), and you modify Jupyter through extensions so that it knows how to communicate with that provisioner layer. The pro of this is that you don't have to manage the notebooks, but the service provider still needs to build that provisioning layer and proxy the kernels' communication channels.

My preference is for #2. I think that frontend tools do not need to live close to Spark, but processes like Toree should be as close to the compute cluster as possible.

Toree's scope is to be a Spark driver program that enables "interactive computing". It is not its scope to provide a full-fledged provisioning/hosting solution for accessing Spark. That is left to the implementers of Spark offerings, who can select the best way to manage Toree kernels (e.g. YARN, Mesos, Docker).

Thanks,
Gino

On Sat, Apr 30, 2016 at 9:53 PM, Gino Bustelo <[email protected]> wrote:

This is not possible without extending Jupyter. By default, Jupyter starts kernels as local processes. To be able to launch remote kernels you need to provide an extension to the KernelManager and have some sort of kernel provisioner to manage the remote kernels. It is not hard to do, but there is really nothing out there that I know of that you can use out of the box.

Gino B.
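To make the KernelManager extension Gino describes concrete, here is a minimal sketch, assuming a hypothetical provisioner service at provisioner-host:8700 that launches Toree next to the Spark cluster and replies with standard Jupyter connection info (IP, transport, signature key, and the five ZeroMQ ports). KernelManager and load_connection_info are real jupyter_client APIs; everything about the provisioner is an assumption:

    # Editor's sketch, not from the thread. The provisioner endpoint and
    # its response format are hypothetical.
    import requests
    from jupyter_client import KernelManager

    class RemoteKernelManager(KernelManager):
        """Ask a remote provisioner to start the kernel instead of
        spawning a local process."""

        provisioner_url = "http://provisioner-host:8700"  # hypothetical

        def start_kernel(self, **kwargs):
            # Assumed contract: the provisioner starts a Toree kernel on
            # the cluster and returns a Jupyter connection-info dict.
            resp = requests.post(self.provisioner_url + "/kernels",
                                 json={"kernel_name": self.kernel_name})
            # Point this manager's channels at the remote sockets. The
            # ports must be reachable from here, i.e. proxied or tunneled
            # through the firewall by the provisioner layer.
            self.load_connection_info(resp.json())

        # A real implementation would also override shutdown_kernel,
        # restart_kernel, interrupt_kernel, and is_alive to call back
        # into the provisioner.

Jupyter would be pointed at such a class with a config line like c.MultiKernelManager.kernel_manager_class = 'mypkg.RemoteKernelManager' (the module path is hypothetical); the hard part, as Gino says, is building the provisioner and proxying the kernel's five ZeroMQ channels.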
On Sat, Apr 30, 2016 at 6:25 PM, Sourav Mazumder <[email protected]> wrote:

Hi,

Is there any documentation which can be used to configure a local Jupyter process to talk to a remote Apache Toree server?

Regards,
Sourav
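For the narrow question that started the thread, one configuration-only workaround is worth noting: if a Toree kernel is already running on the cluster, a local client can attach to it by copying the kernel's connection file and making its five ZeroMQ ports reachable, for example over SSH tunnels. This runs straight into the firewall issue discussed above, so treat it as a sketch under those assumptions (the file name and port reachability are hypothetical), using jupyter_client's real BlockingKernelClient:

    # Editor's sketch, not from the thread. Assumes kernel-abc123.json was
    # copied from the remote host and that its shell, iopub, stdin,
    # control, and heartbeat ports are reachable locally (e.g. tunneled).
    from jupyter_client.blocking import BlockingKernelClient

    kc = BlockingKernelClient(connection_file="kernel-abc123.json")
    kc.load_connection_file()
    kc.start_channels()

    # Send one Scala statement to Toree and wait for the execute_reply.
    kc.execute("sc.parallelize(1 to 10).sum()")
    reply = kc.get_shell_msg(timeout=60)
    print(reply["content"]["status"])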