OK, that makes sense. So the doc attached to this JIRA [1] just speaks to the model serving. Is there a doc for the model service itself? And by making this a separate service, are we saying that for every “MODEL_APPLY(model_name, param_1, param_2, …, param_n)” we are potentially going to go across the wire and have a model executed? That seems pretty expensive, no?
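Or is the expectation that the caller hides most of those round trips behind a local cache? Purely to illustrate the question, this is roughly what I am picturing the calling bolt would have to do; none of these class names come from the doc, and RemoteModelClient is made up:

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.util.Map;
import java.util.concurrent.TimeUnit;

public class CachingModelClient {

  /** Hypothetical wire client; stands in for whatever transport gets chosen. */
  public interface RemoteModelClient {
    Map<String, Object> score(Map<String, Object> input);
  }

  private final LoadingCache<Map<String, Object>, Map<String, Object>> cache;

  public CachingModelClient(final RemoteModelClient remote) {
    this.cache = CacheBuilder.newBuilder()
        .maximumSize(10_000)                   // bound memory per worker
        .expireAfterWrite(5, TimeUnit.MINUTES) // tolerate slightly stale scores
        .build(new CacheLoader<Map<String, Object>, Map<String, Object>>() {
          @Override
          public Map<String, Object> load(Map<String, Object> input) {
            return remote.score(input);        // the actual trip across the wire
          }
        });
  }

  /** The input map is used as a value key, so it should not be mutated afterwards. */
  public Map<String, Object> apply(Map<String, Object> input) {
    return cache.getUnchecked(input);          // only cache misses hit the network
  }
}

And caching only pays off when the same inputs repeat, the way they tend to for enrichment-style lookups, so I am curious what the cost looks like for genuinely per-message scoring.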

Thanks,
Andrew

[1] https://issues.apache.org/jira/browse/METRON-265

On Thu, Jul 7, 2016 at 12:20 PM, Casey Stella <[email protected]> wrote:

> The "REST" model service, which I place in quotes because there is some strong discussion about whether REST is a reasonable transport for this, is responsible for providing the model. The scoring/model application happens in the model service, and the results get transferred back to the Storm bolt that calls it.
>
> Casey
>
> On Thu, Jul 7, 2016 at 9:17 AM, Andrew Psaltis <[email protected]> wrote:
>
> > Trying to make sure I grok this thread and the Word doc attached to the JIRA. The Word doc and JIRA speak to a Model Serving Service and say that the REST service will be responsible for serving up models. However, part of this conversation seems to suggest that the model execution will actually occur at the REST service, in particular this comment from James:
> >
> > "There are several reasons to decouple model execution from Storm:"
> >
> > If the model execution is decoupled from Storm, then it appears that the REST service will be executing the model, not just serving it up. Is that correct?
> >
> > Thanks,
> > Andrew
> >
> > On Thu, Jul 7, 2016 at 11:51 AM, Casey Stella <[email protected]> wrote:
> >
> > > Regarding the performance of REST:
> > >
> > > Yep, everyone seems to be worried about the performance implications of REST. I made this comment on the JIRA, but I'll repeat it here for broader discussion:
> > >
> > > > My choice of REST was mostly due to the fact that I want to support multi-language (I think that's a very important requirement), and there are REST libraries for pretty much everything. I do agree, however, that JSON transport can get chunky. How about a compromise: use REST, but have the input and output payloads for scoring be maps encoded in msgpack rather than JSON. There is a msgpack library for pretty much every language out there (almost), and certainly for all of the ones we'd like to target.
> > > >
> > > > The other option is to just create and expose protobuf bindings (Thrift doesn't have a native client for R) for all of the languages that we want to support. I'm perfectly fine with that, but I had some worries about the maturity of the bindings.
> > > >
> > > > The final option, as you suggest, is to just use raw sockets. I think if we went that route, we might have to create a layer for each language rather than relying on model creators to create a TCP server. I thought that might be a bit onerous for an MVP.
> > > >
> > > > Given the discussion, though, what it has made me aware of is that we might not want to dictate a transport mechanism at all, but rather allow that to be pluggable and extensible (so each model would be associated with a transport mechanism handler that would know how to communicate with it; we would provide default mechanisms for msgpack over REST, JSON over REST, and maybe msgpack over raw TCP). Thoughts?
> > >
> > > Regarding PMML:
> > >
> > > I tend to agree with James that PMML is too restrictive as to the models it can represent, and I have not had great experiences with it in production. Also, the open source libraries for PMML have licensing issues (jpmml requires an older version to accommodate our licensing requirements).
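> > >
> > > Coming back to the pluggable transport idea for a second, here is a rough sketch of the kind of handler abstraction I am picturing; the interface and names below are invented purely for illustration and are not in the attached doc:
> > >
> > > import java.io.IOException;
> > > import java.net.URL;
> > > import java.util.HashMap;
> > > import java.util.Map;
> > >
> > > /** Knows how to ship a map of input fields to a model endpoint and get scored fields back. */
> > > public interface ModelTransportHandler {
> > >   Map<String, Object> score(URL endpoint, Map<String, Object> input) throws IOException;
> > > }
> > >
> > > /** Deployed models would declare a transport name; callers look the handler up here. */
> > > class TransportHandlerRegistry {
> > >   // e.g. "msgpack-rest", "json-rest", "msgpack-tcp" mapped to default implementations
> > >   private final Map<String, ModelTransportHandler> handlers = new HashMap<>();
> > >
> > >   void register(String transportName, ModelTransportHandler handler) {
> > >     handlers.put(transportName, handler);
> > >   }
> > >
> > >   ModelTransportHandler forTransport(String transportName) {
> > >     return handlers.get(transportName);
> > >   }
> > > }
> > >
> > > The three defaults (msgpack over REST, JSON over REST, msgpack over raw TCP) would be registered out of the box, and anything else would be pluggable.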
> > >
> > > Regarding workflow:
> > >
> > > At the moment, I'd like to focus on getting a generalized infrastructure for model scoring and updating put in place. That means this architecture takes up the baton from the point when a model is trained/created. Also, I have attempted to be generic in terms of the output of the model (a map of results) so it can fit any type of model that I can think of. If that's not the case, let me know.
> > >
> > > For instance, for clustering, you would probably emit the cluster id associated with the input, and that would be added to the message as it passes through the Storm topology. The model is responsible for processing the input and constructing properly formed output.
> > >
> > > Casey
> > >
> > > On Tue, Jul 5, 2016 at 3:45 PM, Debo Dutta (dedutta) <[email protected]> wrote:
> > >
> > > > Following up on the thread a little late… Awesome start, Casey. Some comments:
> > > >
> > > > * Model execution
> > > > ** I am guessing the model execution will be on YARN only for now. This is fine, but the REST call could have an overhead - depends on the speed.
> > > > * PMML: won’t we have to choose some DSL for describing models?
> > > > * Model:
> > > > ** workflow vs a model - do we care about the “workflow” that leads to the models or just the “model”? For example, we might start with n features —> do feature selection to choose k (or apply a transform function) —> apply a model, etc.
> > > > * Use cases - I can see this working for n-ary classification style models easily. Will the same mechanism be used for stuff like clustering (or intermediate steps like feature selection alone)?
> > > >
> > > > Thx
> > > > debo
> > > >
> > > > On 7/5/16, 3:24 PM, "James Sirota" <[email protected]> wrote:
> > > >
> > > > > Simon,
> > > > >
> > > > > There are several reasons to decouple model execution from Storm:
> > > > >
> > > > > - Reliability: It's much easier to handle a failed service than a failed bolt. You can also troubleshoot without having to bring down the topology.
> > > > > - Complexity: You decouple the model logic from Storm logic and can manage it independently of Storm.
> > > > > - Portability: You can swap the model guts (switch from Spark to Flink, etc.), and as long as you maintain the interface you are good to go.
> > > > > - Consistency: Since we want to expose our models the same way we expose threat intel, it makes sense to expose them as a service.
> > > > >
> > > > > In our vision for Metron we want to make it easy to uptake and share models. I think well-defined interfaces and programmatic ways of deployment, lifecycle management, and scoring via well-defined REST interfaces will make this task easier. We can do a few things to …
> > > > >
> > > > > With respect to PMML, I personally have not had much luck with it in production. I would prefer models as POJOs.
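> > > > >
> > > > > Purely as an illustration of what I mean by "models as POJOs" (nothing below exists anywhere yet; the names are made up), the contract could be as small as a map-in/map-out interface, which also lines up with the JSON-map input and output Casey describes:
> > > > >
> > > > > import java.util.Map;
> > > > >
> > > > > /** Illustrative only: the smallest contract a served model might expose. */
> > > > > public interface Model {
> > > > >   /** One-time setup, e.g. deserializing a model artifact copied down from HDFS. */
> > > > >   void open(Map<String, Object> config);
> > > > >
> > > > >   /** Scores a single message; input and output are both flat maps of fields. */
> > > > >   Map<String, Object> apply(Map<String, Object> input);
> > > > >
> > > > >   /** Releases any pooled or native resources. */
> > > > >   void close();
> > > > > }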
> > > > >
> > > > > Thanks,
> > > > > James
> > > > >
> > > > > 04.07.2016, 16:07, "Simon Ball" <[email protected]>:
> > > > >
> > > > > > Since the models' parameters and execution algorithm are likely to be small, why not have the model store push the model changes and scoring directly to the bolts and execute within Storm? This negates the overhead of a REST call to the model server, and the need for discovery of the model server in ZooKeeper.
> > > > > >
> > > > > > Something like the way Ranger policies are updated/cached in plugins would seem to make sense, so that we're distributing the model execution directly into the enrichment pipeline rather than collecting it in a central service.
> > > > > >
> > > > > > This would work with simple models on single events, but may struggle with correlation-based models. However, those could be handled in Storm by pushing into a windowing Trident topology or something of the sort, or even with a parallel Spark Streaming job using the same method of distributing models.
> > > > > >
> > > > > > The real challenge here would be stateful online models, which seem like a minority case and could be handled by a shared state store such as HBase.
> > > > > >
> > > > > > You still keep the ability to run different languages and platforms, but wrap management of the parallelism in Storm bolts rather than YARN containers.
> > > > > >
> > > > > > We could also consider basing the model protocol on a common model language like PMML, though that is likely to be highly limiting.
> > > > > >
> > > > > > Simon
> > > > > >
> > > > > > On 4 Jul 2016, at 22:35, Casey Stella <[email protected]> wrote:
> > > > > >
> > > > > > > This is great! I'll capture any requirements that anyone wants to contribute and ensure that the proposed architecture accommodates them. I think we should focus on a minimal set of requirements and an architecture that does not preclude a larger set. I have found that the best driver of requirements is installed users. :)
> > > > > > >
> > > > > > > For instance, I think a lot of questions about how often to update a model and such should be represented in the architecture by the ability to manually update a model; as long as we have the ability to update, people can choose when and where to do it (i.e. time-based or some other trigger). That being said, we don't want to cause too much effort for the user if we can avoid it with features.
> > > > > > >
> > > > > > > In terms of the questions laid out, here are the constraints from the proposed architecture as I see them. It'd be great to get a sense of whether these constraints are too onerous or where they're not opinionated enough:
> > > > > > >
> > > > > > > - Model versioning and retention
> > > > > > >   - We do have the ability to update models, but the training and the decision of when to update the model are left up to the user. We may want to think deeply about when and where automated model updates can fit.
> > > > > > >   - Also, retention is currently manual.
> > > > > > >     It might be an easier win to set up policies around when to sunset models (after newer versions are added, for instance).
> > > > > > > - Model access controls management
> > > > > > >   - The architecture proposes no constraints around this. As it stands now, models are held in HDFS, so this would inherit the same security capabilities from that (user/group permissions + Ranger, etc.).
> > > > > > > - Requirements around concept drift
> > > > > > >   - I'd love to hear user requirements around how we could automatically address concept drift. The architecture as it's proposed lets the user decide when to update models.
> > > > > > > - Requirements around model output
> > > > > > >   - The architecture as it stands just mandates a JSON map input and JSON map output, so it's up to the model what it wants to pass back.
> > > > > > >   - It's also up to the model to document its own output.
> > > > > > > - Any model audit and logging requirements
> > > > > > >   - The architecture proposes no constraints around this. I'd love to see community guidance here. As it stands, we just log using the same mechanism as any YARN application.
> > > > > > > - What model metrics need to be exposed
> > > > > > >   - The architecture proposes no constraints around this. I'd love to see community guidance here.
> > > > > > > - Requirements around failure modes
> > > > > > >   - We briefly touch on this in the document, but it is probably not complete. Service endpoint failure will result in blacklisting from a Storm bolt perspective, and node failure should result in a new container being started by the YARN application master. Beyond that, the architecture isn't explicit.
> > > > > > >
> > > > > > > On Mon, Jul 4, 2016 at 1:49 PM, James Sirota <[email protected]> wrote:
> > > > > > >
> > > > > > > > I left a comment on the JIRA. I think your design is promising. One other thing I would suggest is for us to crowdsource requirements around model management. Specifically:
> > > > > > > >
> > > > > > > > Model versioning and retention
> > > > > > > > Model access controls management
> > > > > > > > Requirements around concept drift
> > > > > > > > Requirements around model output
> > > > > > > > Any model audit and logging requirements
> > > > > > > > What model metrics need to be exposed
> > > > > > > > Requirements around failure modes
> > > > > > > >
> > > > > > > > 03.07.2016, 14:00, "Casey Stella" <[email protected]>:
> > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > I think we are at the point where we should try to tackle Model as a Service for Metron. As such, I created a JIRA and proposed an architecture for accomplishing this within Metron.
> > > > > > > > >
> > > > > > > > > My inclination is to be data science language/library agnostic and to provide a general-purpose REST infrastructure for managing and serving models trained on historical data captured from Metron.
> > > > > > > > > The assumption is that we are within the Hadoop ecosystem, so:
> > > > > > > > >
> > > > > > > > > - Models stored on HDFS
> > > > > > > > > - REST Model Services resource-managed via YARN
> > > > > > > > > - REST Model Services discovered via ZooKeeper
> > > > > > > > >
> > > > > > > > > I would really appreciate community comment on the JIRA (https://issues.apache.org/jira/browse/METRON-265). The proposed architecture is attached as a document to that JIRA.
> > > > > > > > >
> > > > > > > > > I look forward to feedback!
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > >
> > > > > > > > > Casey
> > > > > > > >
> > > > > > > > -------------------
> > > > > > > > Thank you,
> > > > > > > >
> > > > > > > > James Sirota
> > > > > > > > PPMC - Apache Metron (Incubating)
> > > > > > > > jsirota AT apache DOT org
> > > > >
> > > > > -------------------
> > > > > Thank you,
> > > > >
> > > > > James Sirota
> > > > > PPMC - Apache Metron (Incubating)
> > > > > jsirota AT apache DOT org
> >
> > --
> > Thanks,
> > Andrew
> >
> > Subscribe to my book: Streaming Data <http://manning.com/psaltis>
> > <https://www.linkedin.com/pub/andrew-psaltis/1/17b/306>
> > twitter: @itmdata <http://twitter.com/intent/user?screen_name=itmdata>

--
Thanks,
Andrew

Subscribe to my book: Streaming Data <http://manning.com/psaltis>
<https://www.linkedin.com/pub/andrew-psaltis/1/17b/306>
twitter: @itmdata <http://twitter.com/intent/user?screen_name=itmdata>
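
P.S. On the ZooKeeper discovery piece in the proposal: just to check that I am picturing the client side correctly, is the lookup roughly something like the Curator sketch below? The base path and service name here are placeholders I made up, not anything from the doc.

import java.util.Collection;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.curator.x.discovery.ServiceDiscovery;
import org.apache.curator.x.discovery.ServiceDiscoveryBuilder;
import org.apache.curator.x.discovery.ServiceInstance;

public class ModelEndpointLookup {
  public static void main(String[] args) throws Exception {
    // Connect to the same ZooKeeper quorum the topologies already use.
    CuratorFramework client = CuratorFrameworkFactory.newClient(
        "zookeeper:2181", new ExponentialBackoffRetry(1000, 3));
    client.start();

    // "/metron/models" and "dga_model_v1" are made-up placeholders.
    ServiceDiscovery<Void> discovery = ServiceDiscoveryBuilder.builder(Void.class)
        .client(client)
        .basePath("/metron/models")
        .build();
    discovery.start();

    // Assumes each running model service instance registers itself here when its container starts.
    Collection<ServiceInstance<Void>> instances = discovery.queryForInstances("dga_model_v1");
    for (ServiceInstance<Void> instance : instances) {
      System.out.println(instance.getAddress() + ":" + instance.getPort());
    }

    discovery.close();
    client.close();
  }
}

If the calling bolt resolves endpoints that way and load-balances across the instances it finds, that at least answers the "who do I call" half of my question; the per-call cost is the other half.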
