Hi Gino,

Thanks for the details.

But I'm not able to see the image; it came through as an inline image.

Could you please send the image once more?

Regards,
Sourav

On Thu, May 5, 2016 at 12:44 PM, Gino Bustelo <[email protected]> wrote:

> Sourav,
>
> The solution will look something like this picture:
>
> [image: Inline image 1]
>
> There is no need for a separate Toree client if you are using Jupyter.
> Jupyter already knows how to talk to Toree. Now... there are other
> solutions that can sit on top of Toree that expose REST or WebSocket
> interfaces, but those are currently meant for custom client solutions. See
> https://github.com/jupyter/kernel_gateway.
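>
> As a rough illustration (assuming a gateway is already running at the
> hypothetical address http://gateway-host:8888, and using the standard
> Jupyter REST endpoints the kernel gateway exposes), starting a Toree
> kernel through the gateway looks something like this:
>
> import requests
>
> GATEWAY = "http://gateway-host:8888"  # hypothetical address
>
> # Ask the gateway to start a Toree kernel via its REST API.
> resp = requests.post(GATEWAY + "/api/kernels",
>                      json={"name": "apache_toree_scala"})
> kernel = resp.json()
>
> # The kernel id is then used to open the websocket channel, e.g.
> # ws://gateway-host:8888/api/kernels/<id>/channels
> print(kernel["id"])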
>
> Thanks,
> Gino
>
> On Thu, May 5, 2016 at 11:46 AM, Sourav Mazumder <
> [email protected]> wrote:
>
>> Hi Gino,
>>
>> Thanks for explaining the scope of Toree.
>>
>> What I was looking for is a solution where Toree can play the role of a
>> facade between the client application (in this case the notebook) and the
>> underlying Spark cluster. So if the client application submits a command,
>> Toree can accept it, execute it on the underlying Spark infrastructure
>> (whether standalone, on Mesos, or on YARN), and return the result.
>>
>> I also like option 2, as I think it is along the lines of my requirement.
>> However, I'm not sure whether I have understood it fully.
>>
>> Essentially, I'm looking for a solution where Jupyter runs on individual
>> data scientists' laptops. Jupyter would issue the command from the laptop,
>> a Toree client would accept it and send it to the Toree server running on
>> the Spark cluster, and the Toree server would run it on Spark and return
>> the results.
>>
>> To achieve this with option 2, could one potentially change Jupyter (or
>> add an extension) so that it sends the request to Toree running on the
>> provisioning layer over ZeroMQ (or another protocol like REST)?
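>>
>> For example, something along these lines (purely illustrative; it assumes
>> the Toree kernel's connection file has been copied over from the cluster
>> and the ZeroMQ ports are reachable from the laptop):
>>
>> from jupyter_client import BlockingKernelClient
>>
>> # Assumption: "kernel-remote.json" is the kernel's connection file
>> # (IP, ports, signing key) fetched from the cluster.
>> client = BlockingKernelClient(connection_file="kernel-remote.json")
>> client.load_connection_file()
>> client.start_channels()
>>
>> # Send a line of Scala to Toree and wait for the execution reply.
>> msg_id = client.execute("sc.parallelize(1 to 10).sum()")
>> reply = client.get_shell_msg(timeout=60)
>> print(reply["content"]["status"])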
>>
>> Regards,
>> Sourav
>>
>> On Thu, May 5, 2016 at 6:47 AM, Gino Bustelo <[email protected]> wrote:
>>
>> > >>>>>>>>>>>>>>>>>>>
>> > Hi Gino,
>> >
>> > It does not solve the problem of running a Spark job (on YARN) remotely
>> > from a Jupyter notebook which is running on, say, a laptop or some other
>> > machine.
>> >
>> > The issue is that in yarn-client mode the laptop needs access to all the
>> > slave nodes where the executors would be running. In a typical security
>> > scenario in an organization, the slave nodes are behind a firewall and
>> > cannot be accessed from an arbitrary machine outside.
>> >
>> > Regards,
>> > Sourav
>> > >>>>>>>>>>>>>>>>>>>
>> >
>> >
>> > Sourav, I'm very much aware of the network implications of Spark (not
>> > exclusive to YARN). The typical ways that I've seen this problem solved
>> > are:
>> >
>> > 1. You manage/host Jupyter in a privileged network space that has access
>> > to the Spark cluster. This involves no code changes to either Jupyter or
>> > Toree, but has the added cost, for the service provider, of managing
>> > this frontend tool.
>> >
>> > 2. You create a provisioner layer in a privileged network space to
>> > manage kernels (Toree) and modify Jupyter through extensions so it
>> > understands how to communicate with that provisioner layer. The pro of
>> > this is that you don't have to manage the notebooks, but the service
>> > provider still needs to build that provisioning layer and proxy the
>> > kernels' communication channels. (A small sketch of the provisioner's
>> > core follows below.)
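>> >
>> > To make #2 concrete, the core of such a provisioner can be quite small;
>> > this is only a sketch, and the REST/proxy plumbing around it is exactly
>> > what the service provider would have to build:
>> >
>> > from jupyter_client import KernelManager
>> >
>> > # Launch a Toree kernel on a host inside the privileged network.
>> > # "apache_toree_scala" is the kernel spec name Toree registers.
>> > km = KernelManager(kernel_name="apache_toree_scala")
>> > km.start_kernel()
>> >
>> > # Connection info (IP, ports, signing key). A real provisioner would
>> > # hand this back to the client or proxy the ZeroMQ channels itself.
>> > info = km.get_connection_info(session=False)
>> > print(info)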
>> >
>> > My preference is for #2. I think that frontend tools do not need to live
>> > close to Spark, but processes like Toree should be as close to the
>> > compute cluster as possible.
>> >
>> > Toree's scope is to be a Spark driver program that allows "interactive
>> > computing". It is not its scope to provide a full-fledged
>> > provisioning/hosting solution for accessing Spark. That is left to the
>> > implementers of Spark offerings, who can select the best way to manage
>> > Toree kernels (e.g. YARN, Mesos, Docker, etc.).
>> >
>> > Thanks,
>> > Gino
>> >
>> > On Sat, Apr 30, 2016 at 9:53 PM, Gino Bustelo <[email protected]>
>> > wrote:
>> >
>> > > This is not possible without extending Jupyter. By default, Jupyter
>> > > starts kernels as local processes. To be able to launch remote kernels
>> > > you need to provide an extension to the KernelManager and have some
>> > > sort of kernel provisioner to manage the remote kernels. It is not
>> > > hard to do, but there is really nothing out there that I know of that
>> > > you can use out of the box.
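>> > >
>> > > A very rough sketch of the shape of such an extension (the provisioner
>> > > service and all names here are hypothetical, not an existing package):
>> > >
>> > > import requests
>> > > from jupyter_client import KernelManager
>> > >
>> > > class RemoteKernelManager(KernelManager):
>> > >     """Ask a hypothetical provisioner service to start the kernel
>> > >     remotely instead of spawning a local process."""
>> > >
>> > >     provisioner_url = "http://provisioner:9000"  # hypothetical
>> > >
>> > >     def start_kernel(self, **kwargs):
>> > >         resp = requests.post(self.provisioner_url + "/kernels",
>> > >                              json={"name": self.kernel_name})
>> > >         # Wire up the usual ZeroMQ channels by loading the remote
>> > >         # kernel's connection info (IP, ports, signing key).
>> > >         self.load_connection_info(resp.json())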
>> > >
>> > > Gino B.
>> > >
>> > > > On Apr 30, 2016, at 6:25 PM, Sourav Mazumder <
>> > > > [email protected]> wrote:
>> > > >
>> > > > Hi,
>> > > >
>> > > >
>> > > > Is there any documentation that can be used to configure a local
>> > > > Jupyter process to talk to a remote Apache Toree server?
>> > > >
>> > > > Regards,
>> > > > Sourav
>> > >
>> >
>>
>
>
