Re: [DISCUSS] Notebook serving

moon soo Lee Fri, 26 Apr 2019 09:46:47 -0700

Hi,

Although https://github.com/apache/zeppelin/pull/3356/files implements the
basic functionality of the design,
a few key features are not yet implemented and many things can be improved.
We could also expand the scope. For example:


 - Scaling (or autoscaling) of serving is not yet implemented. We need to design
how we want to scale (manually by changing the number of replicas, automatically
with a horizontal autoscaler, or maybe both?).
 - Routing table generation is based on periodic polling of services. It can
be improved by using the watch API in Kubernetes.
 - The notebook serving design takes care of testing notebooks, because we
cannot imagine serving a notebook in production without tests. But what about code
review? Do we need it? If yes, how do we want to handle it for notebooks?
 - We have a Test task and a Serving task. Do we also need a Training task for
machine learning use cases?
 - A Serving task runs at least one ZeppelinServer and one Interpreter JVM
process. Is there a way to reduce the memory footprint, for example by
using GraalVM in the container, so that hundreds or thousands of small models
can be deployed without much overhead?
 - Every component (TestTask, ServingTask, ContextStorage, MetricStorage)
is pluggable. Good! Do we need additional implementations of them?
Currently the Kubernetes environment is the default implementation for all of the
components, but how about integrating with other popular software frameworks,
like Kubeflow, TensorFlow Serving, etc.?
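
To make the polling vs. watch point in the list above a bit more concrete,
here is a minimal Python sketch. It is only an illustration: it does not use
the Kubernetes client or any code from the pull request, and the event tuple
shape, the function name apply_watch_events, and the service names and
endpoints are all hypothetical. The only thing taken from Kubernetes is that
a watch reports ADDED, MODIFIED and DELETED events. The idea is that the
routing table is updated incrementally per event instead of being rebuilt on
every poll.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical event shape for this sketch: (event_type, service_name, endpoint).
# ADDED / MODIFIED / DELETED mirror the event types a Kubernetes watch reports.
Event = Tuple[str, str, str]


def apply_watch_events(routes: Dict[str, str], events: Iterable[Event]) -> Dict[str, str]:
    """Return a new routing table with each watch event applied in order."""
    updated = dict(routes)  # copy, so the caller's table is not mutated
    for event_type, service, endpoint in events:
        if event_type in ("ADDED", "MODIFIED"):
            updated[service] = endpoint
        elif event_type == "DELETED":
            updated.pop(service, None)
        # Any other event type (e.g. BOOKMARK, ERROR) is ignored in this sketch.
    return updated


# Made-up service names and endpoints, for illustration only.
routes = apply_watch_events(
    {},
    [
        ("ADDED", "note-a", "10.0.0.1:8080"),
        ("ADDED", "note-b", "10.0.0.2:8080"),
        ("MODIFIED", "note-a", "10.0.0.3:8080"),
        ("DELETED", "note-b", ""),
    ],
)
print(routes)
```

A real implementation would presumably still need an initial list call and a
re-list when the watch connection is lost, which this sketch leaves out.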

I think there are a lot of interesting topics beyond the pull request I
made, so I hope this can be part of GSoC.

Thanks,
moon

On Fri, Apr 26, 2019 at 1:50 AM Dragos Dublea <[email protected]>
wrote:

> Hello,
>
> Happy Day!
>
> It is great to follow the improvements on this topic in this Pull Request
> <https://github.com/apache/zeppelin/pull/3356/files>.
> Is this project no longer going to be part of GSoC? Is there
> scope for FE work as part of GSoC?
>
> Thanks,
>
>
>
> On Tue, 16 Apr 2019 at 23:06, moon soo Lee <[email protected]> wrote:
>
> > Hi,
> >
> > You're right. I joined the program as a mentor.
> > Thanks again for the interest in the project and in this topic.
> >
> > Thanks,
> > moon
> >
> > On Sat, Apr 13, 2019 at 8:50 AM Dragos Dublea <[email protected]> wrote:
> >
> >> Hello,
> >>
> >> I will be very glad to take up any subtasks or participate in a
> >> discussion about this project if you get time for the discussion.
> >>
> >> As I will begin with my vacation soon, I am excited to work on this
> >> project with the Zeppelin community.
> >>
> >> Thank you
> >>
> >> On Wed, Apr 10, 2019, 4:03 PM Dragos Dublea <[email protected]> wrote:
> >>
> >>> Hello,
> >>>
> >>> Thank you so much for your reply. AFAIK, only the student signup period
> >>> is over. Mentors can still join the program. They will have to receive an
> >>> invite from the organization admins. In this case, the Apache Software
> >>> Foundation org admins will have to send the invite link to enable your
> >>> signup.
> >>>
> >>> Thanks again
> >>>
> >>>
> >>>
> >>> On Wed, 10 Apr 2019 at 13:23, moon soo Lee <[email protected]> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> Thanks for the interest in this topic. I realized the mentor sign-up period
> >>>> is finished.
> >>>> Let me see if there's a way to add myself as a GSoC mentor, or any other
> >>>> alternative.
> >>>>
> >>>> Regards,
> >>>> moon
> >>>>
> >>>> On Tue, Apr 9, 2019 at 9:55 PM Dragos Dublea <[email protected]> wrote:
> >>>>
> >>>> > Hello,
> >>>> >
> >>>> > This is a very interesting topic. I did go through the design doc.
> >>>> > Can you please mentor me to implement this? I am very much interested in
> >>>> > taking it up.
> >>>> >
> >>>> > Thanks
> >>>> >
> >>>> > On 2019/03/26 21:31:16, moon soo Lee <[email protected]> wrote:
> >>>> > > Hi,
> >>>> > >
> >>>> > > There are some challenges in bringing a model inside a notebook to a
> >>>> > > production environment.
> >>>> > > In many organizations, the most common practice I see today is
> >>>> > > something like:
> >>>> > >
> >>>> > > 1. A data scientist develops a model in a data science notebook.
> >>>> > > 2. A SW engineer rewrites the model to meet the production requirements.
> >>>> > >
> >>>> > > In other words, data scientists do not have self-service capability, and
> >>>> > > the organization spends a lot of time reimplementing the model for
> >>>> > > production.
> >>>> > >
> >>>> > > I tried to identify the gaps between the data science notebook and the
> >>>> > > production environment, and what could address them, so that models
> >>>> > > created by data scientists in the notebook can go to production with
> >>>> > > minimum effort.
> >>>> > >
> >>>> > > I made a proposal to solve this problem. Please review and comment. Any
> >>>> > > ideas and feedback are welcome. You can make modifications if needed.
> >>>> > >
> >>>> > > https://docs.google.com/document/d/1YA6q8W9yO8a88xzLDYs9zv_fKu2_cnB58rmQbakxi1I/edit?usp=sharing
> >>>> > >
> >>>> > > This document is linked from
> >>>> > > https://issues.apache.org/jira/browse/ZEPPELIN-3994
> >>>> > >
> >>>> > > Thanks,
> >>>> > > moon
> >>>> > >
> >>>> >
> >>>>
> >>>
>
