Re: Metron-265 Model as a Service

Casey Stella Thu, 07 Jul 2016 09:18:13 -0700

Yeah, I am slowly getting convinced that REST may be too much overhead, and
I am leaning toward using Thrift and communicating with the model handler
(possibly non-Java) via some IPC.
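
To put rough numbers on the overhead concern, here is a minimal sketch that compares a deliberately small HTTP/1.1 POST against a 4-byte length-prefixed frame carrying the same payload. The field names, host, and path are hypothetical, and the length-prefixed framing is only an illustration, not Thrift's actual wire format.

```python
import json
import struct

# Hypothetical scoring input; the field names are illustrative only.
payload = json.dumps({"ip_src_addr": "10.0.0.1", "bytes_out": 1234}).encode("utf-8")

# A deliberately small HTTP/1.1 request. Real clients and proxies usually add
# more headers (auth tokens, cookies, tracing ids), so treat this as a floor.
http_request = (
    b"POST /score HTTP/1.1\r\n"
    b"Host: model-host:8080\r\n"
    b"Content-Type: application/json\r\n"
    b"Accept: application/json\r\n"
    b"Content-Length: " + str(len(payload)).encode("ascii") + b"\r\n"
    b"\r\n"
) + payload

# Length-prefixed frame: 4-byte big-endian length, then the payload bytes.
framed = struct.pack(">I", len(payload)) + payload

http_overhead = len(http_request) - len(payload)
frame_overhead = len(framed) - len(payload)
print(f"payload={len(payload)}B http={http_overhead}B framed={frame_overhead}B")
```

Even this stripped-down request spends more bytes on headers than on a small payload, while the framed version adds a fixed 4 bytes per message.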


On Thu, Jul 7, 2016 at 9:15 AM, Simon Ball <[email protected]> wrote:

> Hi Casey,
>
> Just to clarify, my thought was web sockets, not raw sockets: language
> agnostic, though thrift or proton would be much better. Even with a non-JSON
> payload, REST is very heavy over HTTP. You'd be looking at probably
> 1-2kb of header overhead per packet scored, just on transport headers. Web
> socket frames carry slightly less overhead per message.
>
> Simon
>
>
> > On 7 Jul 2016, at 16:51, Casey Stella <[email protected]> wrote:
> >
> > Regarding the performance of REST:
> >
> > Yep, so everyone seems to be worried about the performance implications
> > for REST.  I made this comment on the JIRA, but I'll repeat it here for
> > broader discussion:
> >
> >> My choice of REST was mostly due to the fact that I want to support
> >> multi-language (I think that's a very important requirement) and there
> >> are REST libraries for pretty much everything. I do agree, however, that
> >> JSON transport can get chunky. How about a compromise: use REST, but
> >> encode the input and output payloads for scoring as Maps in msgpack
> >> rather than JSON? There is a msgpack library for pretty much every
> >> language out there (almost) and certainly all of the ones we'd like to
> >> target.
> >
> >
> >> The other option is to just create and expose protobuf bindings (thrift
> >> doesn't have a native client for R) for all of the languages that we
> >> want to support. I'm perfectly fine with that, but I had some worries
> >> about the maturity of the bindings.
> >
> >
> >> The final option, as you suggest, is to just use raw sockets. I think if
> >> we went that route, we might have to create a layer for each language
> >> rather than relying on model creators to create a TCP server. I thought
> >> that might be a bit onerous for an MVP.
> >
> >
> >> Given the discussion, though, what it has made me aware of is that we
> >> might not want to dictate a transport mechanism at all, but rather allow
> >> that to be pluggable and extensible (so each model would be associated
> >> with a transport mechanism handler that would know how to communicate
> >> with it). We would provide default mechanisms for msgpack over REST, JSON
> >> over REST and maybe msgpack over raw TCP. Thoughts?
> >
> >
> > Regarding PMML:
> >
> > I tend to agree with James that PMML is too restrictive as to the models
> > it can represent, and I have not had great experiences with it in
> > production.
> > Also, the open source libraries for PMML have licensing issues (jpmml
> > requires an older version to accommodate our licensing requirements).
> >
> > Regarding workflow:
> >
> > At the moment, I'd like to focus on getting a generalized infrastructure
> > for model scoring and updating put in place.  This means this
> > architecture takes up the baton from the point when a model is
> > trained/created.  Also, I have attempted to be generic in terms of output
> > of the model (a map of results) so it can fit any type of model that I
> > can think of.  If that's not the case, let me know, though.
> >
> > For instance, for clustering, you would probably emit the cluster id
> > associated with the input and that would be added to the message as it
> > passes through the storm topology.  The model is responsible for
> > processing the input and constructing properly formed output.
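
As a sketch of the clustering case described above: the model returns a map of results, and that map is merged into the message. The centroids, field names, and the `model.` key prefix are assumptions made for illustration; nothing in the proposal fixes them.

```python
def cluster_model(features: dict) -> dict:
    """Hypothetical model: assign the input to the nearest of two 1-D centroids."""
    centroids = {0: 10.0, 1: 1000.0}
    value = float(features["bytes_out"])
    cluster_id = min(centroids, key=lambda cid: abs(centroids[cid] - value))
    return {"cluster_id": cluster_id}


def enrich(message: dict, model_output: dict, prefix: str = "model.") -> dict:
    """Return a copy of the message with the model's output map merged in."""
    enriched = dict(message)
    for key, value in model_output.items():
        enriched[prefix + key] = value
    return enriched


message = {"ip_src_addr": "10.0.0.1", "bytes_out": 1234}
enriched = enrich(message, cluster_model(message))
print(enriched)
```

The enrichment step never inspects what the model returned; it only merges the map, which is what lets the same path serve classifiers, clusterers, or anything else that can emit key/value results.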
> >
> > Casey
> >
> >
> > On Tue, Jul 5, 2016 at 3:45 PM, Debo Dutta (dedutta) <[email protected]>
> > wrote:
> >
> >> Following up on the thread a little late …. Awesome start Casey. Some
> >> comments:
> >> * Model execution
> >> ** I am guessing the model execution will be on YARN only for now. This
> >> is fine, but the REST call could have an overhead - depends on the speed.
> >> * PMML: won’t we have to choose some DSL for describing models?
> >> * Model:
> >> ** workflow vs a model -  do we care about the “workflow” that leads to
> >> the models or just the “model”? For example, we might start with n
> >> features -> do feature selection to choose k (or apply a transform
> >> function) -> apply a model etc
> >> * Use cases - I can see this working for n-ary classification style
> >> models easily. Will the same mechanism be used for stuff like clustering
> >> (or intermediate steps like feature selection alone)?
> >>
> >> Thx
> >> debo
> >>
> >>
> >>
> >>
> >>> On 7/5/16, 3:24 PM, "James Sirota" <[email protected]> wrote:
> >>>
> >>> Simon,
> >>>
> >>> There are several reasons to decouple model execution from Storm:
> >>>
> >>> - Reliability: It's much easier to handle a failed service than a
> >>> failed bolt.  You can also troubleshoot without having to bring down
> >>> the topology
> >>> - Complexity: you de-couple the model logic from Storm logic and can
> >>> manage it independently of Storm
> >>> - Portability: you can swap the model guts (switch from Spark to Flink,
> >>> etc) and as long as you maintain the interface you are good to go
> >>> - Consistency: since we want to expose our models the same way we
> >>> expose threat intel then it makes sense to expose them as a service
> >>>
> >>> In our vision for Metron we want to make it easy to uptake and share
> >>> models.  I think well-defined interfaces and programmatic ways of
> >>> deployment, lifecycle management, and scoring via well-defined REST
> >>> interfaces will make this task easier.  We can do a few things to
> >>>
> >>> With respect to PMML I personally have not had much luck with it in
> >>> production.  I would prefer models as POJOs.
> >>>
> >>> Thanks,
> >>> James
> >>>
> >>> 04.07.2016, 16:07, "Simon Ball" <[email protected]>:
> >>>> Since the models' parameters and execution algorithm are likely to be
> >>>> small, why not have the model store push the model changes and scoring
> >>>> direct to the bolts and execute within storm? This negates the overhead
> >>>> of a rest call to the model server, and the need for discovery of the
> >>>> model server in zookeeper.
> >>>>
> >>>> Something like the way ranger policies are updated / cached in plugins
> >>>> would seem to make sense, so that we're distributing the model execution
> >>>> directly into the enrichment pipeline rather than collecting in a
> >>>> central service.
> >>>>
> >>>> This would work with simple models on single events, but may struggle
> >>>> with correlation based models. However, those could be handled in storm
> >>>> by pushing into a windowing trident topology or something of the sort,
> >>>> or even with a parallel spark streaming job using the same method of
> >>>> distributing models.
> >>>>
> >>>> The real challenge here would be stateful online models, which seem
> >>>> like a minority case which could be handled by a shared state store
> >>>> such as HBase.
> >>>>
> >>>> You still keep the ability to run different languages, and platforms,
> >>>> but wrap managing the parallelism in storm bolts rather than yarn
> >>>> containers.
> >>>>
> >>>> We could also consider basing the model protocol on a common model
> >>>> language like pmml, though that is likely to be highly limiting.
> >>>>
> >>>> Simon
> >>>>
> >>>>> On 4 Jul 2016, at 22:35, Casey Stella <[email protected]> wrote:
> >>>>>
> >>>>> This is great! I'll capture any requirements that anyone wants to
> >>>>> contribute and ensure that the proposed architecture accommodates
> >>>>> them. I think we should focus on a minimal set of requirements and an
> >>>>> architecture that does not preclude a larger set. I have found that
> >>>>> the best drivers of requirements are installed users. :)
> >>>>>
> >>>>> For instance, I think a lot of questions about how often to update a
> >>>>> model and such should be represented in the architecture by the
> >>>>> ability to manually update a model, so as long as we have the ability
> >>>>> to update, people can choose when and where to do it (i.e. time based
> >>>>> or some other trigger). That being said, we don't want to cause too
> >>>>> much effort for the user if we can avoid it with features.
> >>>>>
> >>>>> In terms of the questions laid out, here are the constraints from the
> >>>>> proposed architecture as I see them. It'd be great to get a sense of
> >>>>> whether these constraints are too onerous or where they're not
> >>>>> opinionated enough:
> >>>>>
> >>>>>   - Model versioning and retention
> >>>>>      - We do have the ability to update models, but the training and
> >>>>>        decision of when to update the model is left up to the user. We
> >>>>>        may want to think deeply about when and where automated model
> >>>>>        updates can fit.
> >>>>>      - Also, retention is currently manual. It might be an easier win
> >>>>>        to set up policies around when to sunset models (after newer
> >>>>>        versions are added, for instance).
> >>>>>   - Model access controls management
> >>>>>      - The architecture proposes no constraints around this. As it
> >>>>>        stands now, models are held in HDFS, so it would inherit the
> >>>>>        same security capabilities from that (user/group permissions +
> >>>>>        Ranger, etc).
> >>>>>   - Requirements around concept drift
> >>>>>      - I'd love to hear user requirements around how we could
> >>>>>        automatically address concept drift. The architecture as it's
> >>>>>        proposed lets the user decide when to update models.
> >>>>>   - Requirements around model output
> >>>>>      - The architecture as it stands just mandates a JSON map input and
> >>>>>        JSON map output, so it's up to the model what they want to pass
> >>>>>        back.
> >>>>>      - It's also up to the model to document its own output.
> >>>>>   - Any model audit and logging requirements
> >>>>>      - The architecture proposes no constraints around this. I'd love
> >>>>>        to see community guidance around this. As it stands, we just log
> >>>>>        using the same mechanism as any YARN application.
> >>>>>   - What model metrics need to be exposed
> >>>>>      - The architecture proposes no constraints around this. I'd love
> >>>>>        to see community guidance around this.
> >>>>>   - Requirements around failure modes
> >>>>>      - We briefly touch on this in the document, but it is probably not
> >>>>>        complete. Service endpoint failure will result in blacklisting
> >>>>>        from a storm bolt perspective and node failure should result in
> >>>>>        a new container being started by the Yarn application master.
> >>>>>        Beyond that, the architecture isn't explicit.
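
The endpoint-blacklisting behaviour mentioned under failure modes could be sketched roughly as below. The `EndpointPool` class, the time-based expiry of a blacklist entry, and the use of `ConnectionError` as the failure signal are all assumptions for illustration; the architecture document may handle this differently.

```python
import time


class EndpointPool:
    """Tracks model service endpoints and skips ones that recently failed."""

    def __init__(self, endpoints, blacklist_seconds=30.0, clock=time.monotonic):
        self._endpoints = list(endpoints)
        self._blacklisted_until = {}
        self._blacklist_seconds = blacklist_seconds
        self._clock = clock  # injectable so the expiry logic can be tested

    def available(self):
        """Endpoints that are not currently blacklisted, in original order."""
        now = self._clock()
        return [e for e in self._endpoints
                if self._blacklisted_until.get(e, float("-inf")) <= now]

    def report_failure(self, endpoint):
        """Blacklist an endpoint for the configured interval (an assumption)."""
        self._blacklisted_until[endpoint] = self._clock() + self._blacklist_seconds

    def call(self, request, send):
        """Try each available endpoint in turn, blacklisting any that fail."""
        for endpoint in self.available():
            try:
                return send(endpoint, request)
            except ConnectionError:
                self.report_failure(endpoint)
        raise RuntimeError("no healthy model endpoint available")


def flaky_send(endpoint, request):
    # Hypothetical transport: endpoint "a" is down, everything else answers.
    if endpoint == "a":
        raise ConnectionError("endpoint down")
    return {"endpoint": endpoint, "score": 0.5}


pool = EndpointPool(["a", "b"], blacklist_seconds=30.0)
print(pool.call({"x": 1}, flaky_send))
print(pool.available())
```

In this sketch the caller keeps the blacklist locally, which matches the "from a storm bolt perspective" wording; restarting the failed container remains the job of the YARN application master.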
> >>>>>
> >>>>>> On Mon, Jul 4, 2016 at 1:49 PM, James Sirota <[email protected]> wrote:
> >>>>>>
> >>>>>> I left a comment on the JIRA. I think your design is promising. One
> >>>>>> other thing I would suggest is for us to crowd source requirements
> >>>>>> around model management. Specifically:
> >>>>>>
> >>>>>> Model versioning and retention
> >>>>>> Model access controls management
> >>>>>> Requirements around concept drift
> >>>>>> Requirements around model output
> >>>>>> Any model audit and logging requirements
> >>>>>> What model metrics need to be exposed
> >>>>>> Requirements around failure modes
> >>>>>>
> >>>>>> 03.07.2016, 14:00, "Casey Stella" <[email protected]>:
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> I think we are at the point where we should try to tackle Model as
> >>>>>>> a service for Metron. As such, I created a JIRA and proposed an
> >>>>>>> architecture for accomplishing this within Metron.
> >>>>>>>
> >>>>>>> My inclination is to be data science language/library agnostic and
> >>>>>>> to provide a general purpose REST infrastructure for managing and
> >>>>>>> serving models trained on historical data captured from Metron. The
> >>>>>>> assumption is that we are within the hadoop ecosystem, so:
> >>>>>>>
> >>>>>>>   - Models stored on HDFS
> >>>>>>>   - REST Model Services resource-managed via Yarn
> >>>>>>>   - REST Model Services discovered via Zookeeper.
> >>>>>>>
> >>>>>>> I would really appreciate community comment on the JIRA (
> >>>>>>> https://issues.apache.org/jira/browse/METRON-265). The proposed
> >>>>>>> architecture is attached as a document to that JIRA.
> >>>>>>>
> >>>>>>> I look forward to feedback!
> >>>>>>>
> >>>>>>> Best,
> >>>>>>>
> >>>>>>> Casey
> >>>>>>
> >>>>>> -------------------
> >>>>>> Thank you,
> >>>>>>
> >>>>>> James Sirota
> >>>>>> PPMC- Apache Metron (Incubating)
> >>>>>> jsirota AT apache DOT org
> >>>
> >>> -------------------
> >>> Thank you,
> >>>
> >>> James Sirota
> >>> PPMC- Apache Metron (Incubating)
> >>> jsirota AT apache DOT org
> >>
>
