Quick update: I've started hacking on a PoC for this feature here:
https://github.com/bzz/zeppelin-multiprocess-interpreter-poc . It's at a very
early stage, but please feel free to check it out and provide any feedback.

On enabling/disabling this feature: if it turns out to be easy to provide
such a choice (which we are not sure of yet), then there is no reason not to
implement option 4).
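
To make option 4) a bit more concrete, here is a minimal sketch in Java of
"user selects, interpreter provides the default". All names in it
(RunModeResolver, RunMode, registerDefault, setUserOverride) are made up for
illustration and are not part of Zeppelin or of the PoC; falling back to
in-process when nothing is registered is also just an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not Zeppelin API: decides where an interpreter runs.
class RunModeResolver {
    enum RunMode { IN_PROCESS, SEPARATE_PROCESS }

    // Default declared by each interpreter (the "interpreter chooses" part).
    private final Map<String, RunMode> interpreterDefaults = new HashMap<>();
    // Explicit choice made by the user (the "user selects" part).
    private final Map<String, RunMode> userOverrides = new HashMap<>();

    void registerDefault(String interpreterName, RunMode mode) {
        interpreterDefaults.put(interpreterName, mode);
    }

    void setUserOverride(String interpreterName, RunMode mode) {
        userOverrides.put(interpreterName, mode);
    }

    // The user's choice wins; otherwise use the interpreter's default;
    // otherwise assume in-process (an assumption of this sketch).
    RunMode resolve(String interpreterName) {
        RunMode override = userOverrides.get(interpreterName);
        if (override != null) {
            return override;
        }
        return interpreterDefaults.getOrDefault(interpreterName, RunMode.IN_PROCESS);
    }
}
```

With this shape, options 1) to 3) become special cases: 3) is the same
resolver with no user overrides, and 2) is the same resolver with no
interpreter defaults.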


On Sun, Jan 18, 2015 at 9:46 AM, moon soo Lee <[email protected]> wrote:

> Alexander, thanks for your interest in implementing this feature.
>
> I think there are some alternatives for enabling/disabling this feature:
>
> 1) Run all interpreters in separate processes.
> 2) Let the user select which interpreters run in a separate process.
> 3) Let each interpreter choose whether it runs in a separate process.
> 4) Let the user select, with the interpreter providing a default selection.
>
> What do you guys think? To me, 4) is the best choice, as it gives the user
> flexibility as well as simplicity.
>
>
> And I can easily think of some possible improvements after the first step:
>
> a) Run the interpreter process not only on the local machine but also on a
> remote machine (would that help with anything that works with YARN?).
> b) An option to keep the separate process running when Zeppelin terminates,
> so Zeppelin can reconnect when it is restarted.
> c) Implement remote interpreters in a different language (e.g., pyspark).
>
> So I think it would be better for the future if the IPC implementation
> leaves room for RPC and support for various languages.
>
>
> Best,
> moon
>
>
>
> On Thu, Jan 15, 2015 at 8:58 PM, Alex B <[email protected]> wrote:
>
> > I think I'd like to volunteer to implement this feature.
> >
> > My perspective is: we solve 2 immediate problems and, at the end, have
> > an interpreter API mature enough to be able to add pyspark support.
> >
> > The immediate problems we solve are:
> >  - Multiple interpreters running at the same time mix their stdout/stderr
> >  - In the case of JVMs there is also a ClassLoader collision problem,
> > which does not allow SparkSQL to work with Spark 1.2
> >
> > Suggested solution:
> > Separate each interpreter into its own process.
> >
> > This means bringing into the codebase things like:
> >  - an API for managing the runtime state of that process
> >  - the IPC implementation itself (Thrift?)
> >  - basic ClassLoading for JVM-based interpreters
> >
> > Please let me know if there is something I have missed here!
> >
> > --
> > Kind regards,
> > Alexander
> >
> > > On 13 Jan 2015, at 20:26, moon soo Lee <[email protected]> wrote:
> > >
> > > Hi guys,
> > >
> > > I'm bringing issue https://github.com/NFLabs/zeppelin/issues/278 to
> > > this mailing list for discussion.
> > >
> > > Zeppelin creates each interpreter instance with a separate classloader
> > > to avoid interference (dependency conflicts, singletons, static
> > > members) with other interpreter instances. This has worked well until
> > > now, but I can see some limitations.
> > >
> > > a) When multiple interpreter instances are running concurrently, they
> > > cannot avoid interfering with each other's stdin/stdout/stderr.
> > > b) When one of an interpreter's dependencies is designed (== hardcoded)
> > > to use the Application classloader, it won't work within Zeppelin,
> > > because Zeppelin loads the interpreter's dependency jars in its thread
> > > context classloader, not the Application classloader.
> > >
> > > Running the interpreter in a separate process is the solution I can
> > > think of. In detail, because an interpreter is abstracted by its public
> > > methods, everything will be simple if we can call those methods
> > > remotely through some sort of RPC mechanism.
> > >
> > > Therefore
> > >
> > > a) A main entry point and run script to run an interpreter in a
> > > separate process
> > > b) An RPC mechanism between Zeppelin and the separate interpreter
> > > process
> > > c) An option to enable/disable this capability
> > >
> > > are the major tasks I'm thinking of.
> > >
> > > What do you guys think? Please share if you have any ideas.
> > >
> > > Best,
> > > moon
> >
>

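On the RPC point from the original proposal quoted above ("call those methods
remotely"), one possible shape is a dynamic proxy that forwards every call on
the interpreter interface to a transport. This is only a sketch under
assumptions: SketchInterpreter, CallTransport and RemoteInterpreterSketch are
hypothetical names, Zeppelin's real interpreter API is larger and different,
and the loopback transport below merely stands in for a real IPC layer (e.g.
Thrift) talking to another process.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Hypothetical, minimal interpreter interface (not Zeppelin's real API).
interface SketchInterpreter {
    String interpret(String code);
}

// Hypothetical transport: a real one would serialize the call and send it
// to the interpreter process, then deserialize the result.
interface CallTransport {
    Object call(String methodName, Object[] args) throws Exception;
}

class RemoteInterpreterSketch {
    // Client side: returns an object that looks like a local interpreter
    // but forwards each interface method call to the transport.
    static SketchInterpreter remoteProxy(CallTransport transport) {
        InvocationHandler handler = (proxy, method, args) -> {
            // Keep toString/hashCode/equals local instead of sending them.
            if (method.getDeclaringClass() == Object.class) {
                return method.invoke(transport, args);
            }
            return transport.call(method.getName(), args);
        };
        return (SketchInterpreter) Proxy.newProxyInstance(
                SketchInterpreter.class.getClassLoader(),
                new Class<?>[] { SketchInterpreter.class },
                handler);
    }

    // Stand-in for the server side: dispatches by method name on a target
    // living in the same JVM. No process boundary is crossed here.
    static CallTransport loopback(SketchInterpreter target) {
        return (name, args) -> {
            for (Method m : SketchInterpreter.class.getMethods()) {
                if (m.getName().equals(name)) {
                    return m.invoke(target, args);
                }
            }
            throw new NoSuchMethodException(name);
        };
    }

    public static void main(String[] args) {
        SketchInterpreter local = code -> "echo: " + code;
        SketchInterpreter remote = remoteProxy(loopback(local));
        System.out.println(remote.interpret("1 + 1")); // prints "echo: 1 + 1"
    }
}
```

The caller only ever sees SketchInterpreter, so swapping the loopback for a
transport that talks to a separate process would not change calling code,
which is what makes the enable/disable option cheap in this design.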