Re: Metron-265 Model as a Service

Casey Stella Thu, 07 Jul 2016 09:20:39 -0700

The "REST" model service, which I place in quotes because there is some
strong discussion about whether REST is a reasonable transport for this, is
responsible for providing the model.  The scoring/model application happens
in the model service and the results get transferred back to the storm bolt
that calls it.
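
To make that concrete, here is a minimal sketch of the scoring path, assuming the JSON-map-in, JSON-map-out contract mentioned further down the thread. The names `score`, `handle_payload`, and `ModelHandler`, the `bytes_out` field, and the threshold are all hypothetical illustrations and are not taken from the design doc on the JIRA.

```python
import json
from http.server import BaseHTTPRequestHandler


def score(features):
    """Hypothetical model: flag a message whose 'bytes_out' exceeds a threshold."""
    is_alert = features.get("bytes_out", 0) > 1000
    return {"is_alert": is_alert}


def handle_payload(raw_body):
    """Decode a JSON map, apply the model, and encode the result map as bytes."""
    features = json.loads(raw_body)
    return json.dumps(score(features)).encode("utf-8")


class ModelHandler(BaseHTTPRequestHandler):
    """Scoring happens here, inside the service, not in the calling bolt."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = handle_payload(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To serve (host and port are assumptions):
#   from http.server import HTTPServer
#   HTTPServer(("localhost", 8080), ModelHandler).serve_forever()
```

The calling bolt would POST the message fields as a map and merge the returned map into the message; swapping JSON for msgpack would only change the two encode/decode calls in `handle_payload`.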


Casey

On Thu, Jul 7, 2016 at 9:17 AM, Andrew Psaltis <[email protected]>
wrote:

> Trying to make sure I grok this thread and the word doc attached to the
> JIRA. The word doc and JIRA speak to a Model Service and say that the
> REST service will be responsible for serving up models. However, part of
> this conversation seems to suggest that the model execution will actually
> occur at the REST service; in particular, this comment from James:
>
> "There are several reasons to decouple model execution from Storm:"
>
> If the model execution is decoupled from Storm then it appears that the
> REST service will be executing the model, not just serving it up, is that
> correct?
>
> Thanks,
> Andrew
>
>
>
> On Thu, Jul 7, 2016 at 11:51 AM, Casey Stella <[email protected]> wrote:
>
> > Regarding the performance of REST:
> >
> > Yep, so everyone seems to be worried about the performance implications
> > for REST.  I made this comment on the JIRA, but I'll repeat it here for
> > broader discussion:
> >
> > > My choice of REST was mostly due to the fact that I want to support
> > > multi-language (I think that's a very important requirement) and there
> > > are REST libraries for pretty much everything. I do agree, however, that
> > > JSON transport can get chunky. How about a compromise: use REST, but
> > > with the input and output payloads for scoring as Maps encoded in
> > > msgpack rather than JSON? There is a msgpack library for pretty much
> > > every language out there (almost) and certainly all of the ones we'd
> > > like to target.
> > >
> >
> >
> > > The other option is to just create and expose protobuf bindings (thrift
> > > doesn't have a native client for R) for all of the languages that we
> > > want to support. I'm perfectly fine with that, but I had some worries
> > > about the maturity of the bindings.
> > >
> >
> >
> > > The final option, as you suggest, is to just use raw sockets. I think
> > > if we went that route, we might have to create a layer for each language
> > > rather than relying on model creators to create a TCP server. I thought
> > > that might be a bit onerous for an MVP.
> > >
> >
> >
> > > Given the discussion, though, what it has made me aware of is that we
> > > might not want to dictate a transport mechanism at all, but rather
> > > allow that to be pluggable and extensible (so each model would be
> > > associated with a transport mechanism handler that would know how to
> > > communicate with it. We would provide default mechanisms for msgpack
> > > over REST, JSON over REST and maybe msgpack over raw TCP.) Thoughts?
> >
> >
> > Regarding PMML:
> >
> > I tend to agree with James that PMML is too restrictive in the models it
> > can represent and I have not had great experiences with it in production.
> > Also, the open source libraries for PMML have licensing issues (jpmml
> > requires an older version to accommodate our licensing requirements).
> >
> > Regarding workflow:
> >
> > At the moment, I'd like to focus on getting a generalized infrastructure
> > for model scoring and updating put in place.  This means this
> > architecture takes up the baton from the point when a model is
> > trained/created.  Also, I have attempted to be generic in terms of output
> > of the model (a map of results) so it can fit any type of model that I
> > can think of.  If that's not the case, let me know, though.
> >
> > For instance, for clustering, you would probably emit the cluster id
> > associated with the input and that would be added to the message as it
> > passes through the storm topology.  The model is responsible for
> > processing the input and constructing properly formed output.
> >
> > Casey
> >
> >
> > On Tue, Jul 5, 2016 at 3:45 PM, Debo Dutta (dedutta) <[email protected]>
> > wrote:
> >
> > > Following up on the thread a little late …. Awesome start Casey. Some
> > > comments:
> > > * Model execution
> > > ** I am guessing the model execution will be on YARN only for now. This
> > > is fine, but the REST call could have an overhead - depends on the speed.
> > > * PMML: won’t we have to choose some DSL for describing models?
> > > * Model:
> > > ** workflow vs a model - do we care about the “workflow” that leads to
> > > the models or just the “model”? For example, we might start with n
> > > features —> do feature selection to choose k (or apply a transform
> > > function) —> apply a model etc
> > > * Use cases - I can see this working for n-ary classification style
> > > models easily. Will the same mechanism be used for stuff like clustering
> > > (or intermediate steps like feature selection alone)?
> > >
> > > Thx
> > > debo
> > >
> > >
> > >
> > >
> > > On 7/5/16, 3:24 PM, "James Sirota" <[email protected]> wrote:
> > >
> > > >Simon,
> > > >
> > > >There are several reasons to decouple model execution from Storm:
> > > >
> > > >- Reliability: It's much easier to handle a failed service than a
> > > >  failed bolt.  You can also troubleshoot without having to bring down
> > > >  the topology
> > > >- Complexity: you de-couple the model logic from Storm logic and can
> > > >  manage it independently of Storm
> > > >- Portability: you can swap the model guts (switch from Spark to Flink,
> > > >  etc) and as long as you maintain the interface you are good to go
> > > >- Consistency: since we want to expose our models the same way we
> > > >  expose threat intel then it makes sense to expose them as a service
> > > >
> > > >In our vision for Metron we want to make it easy to take up and share
> > > >models.  I think well-defined interfaces and programmatic ways of
> > > >deployment, lifecycle management, and scoring via well-defined REST
> > > >interfaces will make this task easier.  We can do a few things to
> > > >
> > > >With respect to PMML I personally have not had much luck with it in
> > > >production.  I would prefer models as POJOs.
> > > >
> > > >Thanks,
> > > >James
> > > >
> > > >04.07.2016, 16:07, "Simon Ball" <[email protected]>:
> > > >> Since the models' parameters and execution algorithm are likely to be
> > > >> small, why not have the model store push the model changes and
> > > >> scoring directly to the bolts and execute within storm? This negates
> > > >> the overhead of a REST call to the model server, and the need for
> > > >> discovery of the model server in zookeeper.
> > > >>
> > > >> Something like the way ranger policies are updated / cached in plugins
> > > >> would seem to make sense, so that we're distributing the model
> > > >> execution directly into the enrichment pipeline rather than collecting
> > > >> in a central service.
> > > >>
> > > >> This would work with simple models on single events, but may struggle
> > > >> with correlation based models. However, those could be handled in
> > > >> storm by pushing into a windowing trident topology or something of the
> > > >> sort, or even with a parallel spark streaming job using the same
> > > >> method of distributing models.
> > > >>
> > > >> The real challenge here would be stateful online models, which seem
> > > >> like a minority case which could be handled by a shared state store
> > > >> such as HBase.
> > > >>
> > > >> You still keep the ability to run different languages, and platforms,
> > > >> but wrap managing the parallelism in storm bolts rather than yarn
> > > >> containers.
> > > >>
> > > >> We could also consider basing the model protocol on a common model
> > > >> language like pmml, though that is likely to be highly limiting.
> > > >>
> > > >> Simon
> > > >>
> > > >>>  On 4 Jul 2016, at 22:35, Casey Stella <[email protected]> wrote:
> > > >>>
> > > >>>  This is great! I'll capture any requirements that anyone wants to
> > > >>>  contribute and ensure that the proposed architecture accommodates
> > > >>>  them. I think we should focus on a minimal set of requirements and an
> > > >>>  architecture that does not preclude a larger set. I have found that
> > > >>>  the best driver of requirements is installed users. :)
> > > >>>
> > > >>>  For instance, I think a lot of questions about how often to update a
> > > >>>  model and such should be represented in the architecture by the
> > > >>>  ability to manually update a model, so as long as we have the ability
> > > >>>  to update, people can choose when and where to do it (i.e. time based
> > > >>>  or some other trigger). That being said, we don't want to cause too
> > > >>>  much effort for the user if we can avoid it with features.
> > > >>>
> > > >>>  In terms of the questions laid out, here are the constraints from the
> > > >>>  proposed architecture as I see them. It'd be great to get a sense of
> > > >>>  whether these constraints are too onerous or where they're not
> > > >>>  opinionated enough:
> > > >>>
> > > >>>    - Model versioning and retention
> > > >>>       - We do have the ability to update models, but the training and
> > > >>>         decision of when to update the model is left up to the user.
> > > >>>         We may want to think deeply about when and where automated
> > > >>>         model updates can fit.
> > > >>>       - Also, retention is currently manual. It might be an easier
> > > >>>         win to set up policies around when to sunset models (after
> > > >>>         newer versions are added, for instance).
> > > >>>    - Model access controls management
> > > >>>       - The architecture proposes no constraints around this. As it
> > > >>>         stands now, models are held in HDFS, so it would inherit the
> > > >>>         same security capabilities from that (user/group permissions
> > > >>>         + Ranger, etc)
> > > >>>    - Requirements around concept drift
> > > >>>       - I'd love to hear user requirements around how we could
> > > >>>         automatically address concept drift. The architecture as it's
> > > >>>         proposed lets the user decide when to update models.
> > > >>>    - Requirements around model output
> > > >>>       - The architecture as it stands just mandates a JSON map input
> > > >>>         and JSON map output, so it's up to the model what they want
> > > >>>         to pass back.
> > > >>>       - It's also up to the model to document its own output.
> > > >>>    - Any model audit and logging requirements
> > > >>>       - The architecture proposes no constraints around this. I'd
> > > >>>         love to see community guidance around this. As it stands, we
> > > >>>         just log using the same mechanism as any YARN application.
> > > >>>    - What model metrics need to be exposed
> > > >>>       - The architecture proposes no constraints around this. I'd
> > > >>>         love to see community guidance around this.
> > > >>>    - Requirements around failure modes
> > > >>>       - We briefly touch on this in the document, but it is probably
> > > >>>         not complete. Service endpoint failure will result in
> > > >>>         blacklisting from a storm bolt perspective and node failure
> > > >>>         should result in a new container being started by the Yarn
> > > >>>         application master. Beyond that, the architecture isn't
> > > >>>         explicit.
> > > >>>
> > > >>>>  On Mon, Jul 4, 2016 at 1:49 PM, James Sirota <[email protected]>
> > > >>>>  wrote:
> > > >>>>
> > > >>>>  I left a comment on the JIRA. I think your design is promising. One
> > > >>>>  other thing I would suggest is for us to crowd source requirements
> > > >>>>  around model management. Specifically:
> > > >>>>
> > > >>>>  Model versioning and retention
> > > >>>>  Model access controls management
> > > >>>>  Requirements around concept drift
> > > >>>>  Requirements around model output
> > > >>>>  Any model audit and logging requirements
> > > >>>>  What model metrics need to be exposed
> > > >>>>  Requirements around failure modes
> > > >>>>
> > > >>>>  03.07.2016, 14:00, "Casey Stella" <[email protected]>:
> > > >>>>>  Hi all,
> > > >>>>>
> > > >>>>>  I think we are at the point where we should try to tackle Model as
> > > >>>>>  a service for Metron. As such, I created a JIRA and proposed an
> > > >>>>>  architecture for accomplishing this within Metron.
> > > >>>>>
> > > >>>>>  My inclination is to be data science language/library agnostic and
> > > >>>>>  to provide a general purpose REST infrastructure for managing and
> > > >>>>>  serving models trained on historical data captured from Metron. The
> > > >>>>>  assumption is that we are within the hadoop ecosystem, so:
> > > >>>>>
> > > >>>>>    - Models stored on HDFS
> > > >>>>>    - REST Model Services resource-managed via Yarn
> > > >>>>>    - REST Model Services discovered via Zookeeper.
> > > >>>>>
> > > >>>>>  I would really appreciate community comment on the JIRA (
> > > >>>>>  https://issues.apache.org/jira/browse/METRON-265). The proposed
> > > >>>>>  architecture is attached as a document to that JIRA.
> > > >>>>>
> > > >>>>>  I look forward to feedback!
> > > >>>>>
> > > >>>>>  Best,
> > > >>>>>
> > > >>>>>  Casey
> > > >>>>
> > > >>>>  -------------------
> > > >>>>  Thank you,
> > > >>>>
> > > >>>>  James Sirota
> > > >>>>  PPMC- Apache Metron (Incubating)
> > > >>>>  jsirota AT apache DOT org
> > > >
> > > >-------------------
> > > >Thank you,
> > > >
> > > >James Sirota
> > > >PPMC- Apache Metron (Incubating)
> > > >jsirota AT apache DOT org
> > >
> >
>
>
>
> --
> Thanks,
> Andrew
>
> Subscribe to my book: Streaming Data <http://manning.com/psaltis>
> <https://www.linkedin.com/pub/andrew-psaltis/1/17b/306>
> twitter: @itmdata <http://twitter.com/intent/user?screen_name=itmdata>
>
