Re: Metron-265 Model as a Service

Andrew Psaltis Thu, 07 Jul 2016 12:10:21 -0700

Thanks Casey, that all makes much more sense now. I agree that using Curator is
a better route than a call out to the YARN registry.


On Thu, Jul 7, 2016 at 3:00 PM, Casey Stella <[email protected]> wrote:

> >
> > Considering both the storm bolts and the model service will be deployed
> > on Yarn, could the bolts not use the Yarn registry to identify which
> > model service to connect to before making a request?
>
>
> The bolts are definitely going to figure out which endpoints are serving
> which models, but that info will come from zookeeper and get pushed to the
> bolts on change, rather than have a separate request to the yarn registry.
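
The push-on-change flow described above can be sketched in a few lines of
Python. This is only an illustration: EndpointRegistry is an in-memory stand-in
for the ZooKeeper node (it does not talk to ZooKeeper or Curator), and every
name in it is hypothetical rather than taken from the Metron design.

```python
from typing import Callable, Dict, List

Endpoints = Dict[str, List[str]]  # model name -> list of "host:port" endpoints


class EndpointRegistry:
    """In-memory stand-in for the ZooKeeper node that lists model endpoints."""

    def __init__(self) -> None:
        self._endpoints: Endpoints = {}
        self._watchers: List[Callable[[Endpoints], None]] = []

    def snapshot(self) -> Endpoints:
        # Copy so that watchers cannot mutate the registry's own state.
        return {model: list(eps) for model, eps in self._endpoints.items()}

    def watch(self, callback: Callable[[Endpoints], None]) -> None:
        # Register a watcher and hand it the current state immediately.
        self._watchers.append(callback)
        callback(self.snapshot())

    def publish(self, model: str, endpoints: List[str]) -> None:
        # A change is pushed to every watcher; nobody has to poll.
        self._endpoints[model] = list(endpoints)
        for callback in self._watchers:
            callback(self.snapshot())


class ScoringBolt:
    """Hypothetical bolt that keeps a local copy of the endpoint map."""

    def __init__(self, registry: EndpointRegistry) -> None:
        self.endpoints: Endpoints = {}
        registry.watch(self._on_change)

    def _on_change(self, endpoints: Endpoints) -> None:
        self.endpoints = endpoints

    def endpoints_for(self, model: str) -> List[str]:
        # Lookup is purely local: no request to a registry per tuple.
        return self.endpoints.get(model, [])
```

The point of the sketch is that the lookup on the scoring path is a local
dictionary read; the registry is only involved when the set of endpoints
changes.
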
>
> > How do you scale the model service endpoints if they have a preference for
> > which model they serve?
>
>
> I'd say preference is a loose term.  We'll probably just use a weighted die
> and bias the choice toward local endpoints over remote endpoints.  Let's
> all keep in mind here that there are real reasons why you might not have a
> model executed from the same node as a storm worker.  Take, for instance, a
> TensorFlow model that *needs* GPUs; you might never run a storm worker on
> those nodes.  In that situation, the network hop will probably be dominated
> by the computation done in scoring, and it's probably not cost effective to
> scale storm along with GPU nodes.
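
To make the "weighted die" concrete, here is a minimal Python sketch. The
function name, the (host, port) tuple shape and the default weight of 4.0 are
all assumptions for illustration; the thread does not specify how strong the
local bias should be.

```python
import random
from typing import List, Optional, Tuple

Endpoint = Tuple[str, int]  # (host, port)


def choose_endpoint(
    endpoints: List[Endpoint],
    local_host: str,
    local_weight: float = 4.0,  # assumed bias; remote endpoints get weight 1.0
    rng: Optional[random.Random] = None,
) -> Endpoint:
    """Pick one endpoint at random, favouring those on the caller's own host."""
    if not endpoints:
        raise ValueError("no endpoints available for this model")
    chooser = rng if rng is not None else random
    weights = [local_weight if host == local_host else 1.0 for host, _ in endpoints]
    # random.choices performs the weighted draw; k=1 returns a one-element list.
    return chooser.choices(endpoints, weights=weights, k=1)[0]
```

With one local and two remote endpoints and the assumed weight of 4.0, the
local endpoint is picked with probability 4/6, so remote endpoints (for example
GPU-only nodes) still receive traffic.
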
>
>
> > And each is a simple REST (or another more performant protocol) service
> > as the document describes?
>
>
> Yep.
>
>
> On Thu, Jul 7, 2016 at 11:14 AM, Andrew Psaltis <[email protected]>
> wrote:
>
> > Thanks Casey, that helps.
> >
> > RE: I am talking about model execution here.  The endpoints are
> > distributed across the cluster and the storm bolt chooses a service to use
> > (with a bias toward using one that is local to that bolt) and the request
> > is made to the endpoint, which scores the input and returns the response.
> >
> > This makes sense. Depending on the volume and velocity of the data, it
> > seems like this could get expensive.
> >
> >
> > RE: Model service, if that term means what I think it means, is almost
> > entirely done inside of zookeeper.  For clarity, I'm talking about service
> > discovery (bolt discovers which endpoints serve which models) and model
> > updates
> >
> > Thanks this helps to clarify it quite a bit.  Considering both the storm
> > bolts and the model service will be deployed on Yarn, could the bolts not
> > use the Yarn registry to identify which model service to connect to before
> > making a request?
> >
> > How do you scale the model service endpoints if they have a preference for
> > which model they serve? And each is a simple REST (or another more
> > performant protocol) service as the document describes?
> >
> >
> >
> > Thanks,
> > Andrew
> >
> > On Thu, Jul 7, 2016 at 1:51 PM, Casey Stella <[email protected]> wrote:
> >
> > > Great questions Andrew.  Thanks for the interest. :)
> > >
> > > RE: "which is why there would be a caching layer set in front of it at
> > > the Storm bolt level"
> > >
> > > Right now we have a LRU caching layer in front of the HBase enrichment
> > > adapters, so it would work similarly.  You can imagine, the range of
> > > inputs is likely not perfectly random, so it's reasonable for the cache
> > > to have a non-empty working set.  Take for instance a DGA model; the input
> > > would be a domain and most organizations will have an uneven distribution
> > > of domains they access with a heavy skew toward a small number.
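
As an illustration of the caching idea, here is a small LRU cache keyed by the
model input, written in Python. It is a sketch only: the class, its hit/miss
counters and the score_fn callback (standing in for the call to the model
endpoint) are hypothetical and not the HBase enrichment cache mentioned above.

```python
from collections import OrderedDict
from typing import Callable, Dict, Hashable


class ScoreCache:
    """LRU cache in front of an expensive scoring call."""

    def __init__(self, capacity: int, score_fn: Callable[[Hashable], Dict]) -> None:
        self._capacity = capacity
        self._score_fn = score_fn  # stand-in for the request to the model service
        self._entries: "OrderedDict[Hashable, Dict]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> Dict:
        if key in self._entries:
            # Mark the entry as most recently used and serve it locally.
            self._entries.move_to_end(key)
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        value = self._score_fn(key)
        self._entries[key] = value
        if len(self._entries) > self._capacity:
            # Evict the least recently used entry.
            self._entries.popitem(last=False)
        return value
```

With a skewed input distribution such as frequently visited domains, most
lookups land on the few hot keys, so only the long tail pays for the network
hop.
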
> > >
> > > RE: In this scenario, you can at least scale out via load balancing
> > > (i.e. multiple model services serving the same model) since the models
> > > are immutable.
> > >
> > > I am talking about model execution here.  The endpoints are distributed
> > > across the cluster and the storm bolt chooses a service to use (with a
> > > bias toward using one that is local to that bolt) and the request is made
> > > to the endpoint, which scores the input and returns the response.
> > >
> > > Model service, if that term means what I think it means, is almost
> > > entirely done inside of zookeeper.  For clarity, I'm talking about service
> > > discovery (bolt discovers which endpoints serve which models) and model
> > > updates.  We are not sending the model around to any bolts or any such
> > > thing, just for clarity's sake.
> > >
> > >
> > >
> > > On Thu, Jul 7, 2016 at 9:47 AM, Andrew Psaltis <[email protected]>
> > > wrote:
> > >
> > > > Thanks Casey! Couple of quick questions.
> > > >
> > > > RE: "which is why there would be a caching layer set in front of it at
> > > > the Storm bolt level"
> > > > Hmm, would this be of the results of model execution? Would this really
> > > > work when each tuple may contain totally different data? Or is the
> > > > caching going to be smart enough that it will look at all the data passed
> > > > in and determine that an identical tuple has already been evaluated, so
> > > > serve the result out of cache?
> > > >
> > > > RE: "Also, we would prefer local instances of the service when and where
> > > > possible"
> > > > Perfect, makes sense.
> > > >
> > > > RE: Serving many models from every storm bolt is also fairly expensive.
> > > > I can see how it could be, but couldn't we make sure that not all
> > > > models live in every bolt?
> > > >
> > > > RE: In this scenario, you can at least scale out via load balancing
> > > > (i.e. multiple model services serving the same model) since the models
> > > > are immutable.
> > > > This seems to address the model serving, not the model execution service.
> > > > Having yet one more layer to scale and manage also seems like it
> > > > would further complicate things. Could we not just also scale the bolts?
> > > >
> > > > Thanks,
> > > > Andrew
> > > >
> > > >
> > > >
> > > >
> > > > On Thu, Jul 7, 2016 at 12:37 PM, Casey Stella <[email protected]> wrote:
> > > >
> > > > > So, regarding the expense of communication; I tend to agree that it is
> > > > > expensive, which is why there would be a caching layer set in front of
> > > > > it at the Storm bolt level.  Also, we would prefer local instances of
> > > > > the service when and where possible.  Serving many models from every
> > > > > storm bolt is also fairly expensive.  In this scenario, you can at least
> > > > > scale out via load balancing (i.e. multiple model services serving the
> > > > > same model) since the models are immutable.
> > > > >
> > > > > On Thu, Jul 7, 2016 at 9:24 AM, Andrew Psaltis <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > OK that makes sense. So the doc attached to this JIRA[1] just speaks
> > > > > > to the Model serving. Is there a doc for the model service? And by
> > > > > > making this a separate service we are saying that for every
> > > > > > “MODEL_APPLY(model_name, param_1, param_2, …, param_n)” we are
> > > > > > potentially going to go across the wire and have a model executed?
> > > > > > That seems pretty expensive, no?
> > > > > >
> > > > > > Thanks,
> > > > > > Andrew
> > > > > >
> > > > > > [1] https://issues.apache.org/jira/browse/METRON-265
> > > > > >
> > > > > > On Thu, Jul 7, 2016 at 12:20 PM, Casey Stella <[email protected]> wrote:
> > > > > >
> > > > > > > The "REST" model service, which I place in quotes because there is
> > > > > > > some strong discussion about whether REST is a reasonable transport
> > > > > > > for this, is responsible for providing the model.  The scoring/model
> > > > > > > application happens in the model service and the results get
> > > > > > > transferred back to the storm bolt that calls it.
> > > > > > >
> > > > > > > Casey
> > > > > > >
> > > > > > > On Thu, Jul 7, 2016 at 9:17 AM, Andrew Psaltis <[email protected]>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Trying to make sure I grok this thread and the word doc attached to
> > > > > > > > the JIRA. The word doc and JIRA speak to a Model Service and say
> > > > > > > > that the REST service will be responsible for serving up models.
> > > > > > > > However, part of this conversation seems to suggest that the model
> > > > > > > > execution will actually occur at the REST service, in particular
> > > > > > > > this comment from James:
> > > > > > > >
> > > > > > > > "There are several reasons to decouple model execution from Storm:"
> > > > > > > >
> > > > > > > > If the model execution is decoupled from Storm then it appears that
> > > > > > > > the REST service will be executing the model, not just serving it
> > > > > > > > up, is that correct?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Andrew
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, Jul 7, 2016 at 11:51 AM, Casey Stella <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Regarding the performance of REST:
> > > > > > > > >
> > > > > > > > > Yep, so everyone seems to be worried about the performance
> > > > > > > > > implications for REST.  I made this comment on the JIRA, but I'll
> > > > > > > > > repeat it here for broader discussion:
> > > > > > > > >
> > > > > > > > > > My choice of REST was mostly due to the fact that I want to support
> > > > > > > > > > multi-language (I think that's a very important requirement) and
> > > > > > > > > > there are REST libraries for pretty much everything. I do agree,
> > > > > > > > > > however, that JSON transport can get chunky. How about a compromise
> > > > > > > > > > and use REST, but the input and output payloads for scoring are Maps
> > > > > > > > > > encoded in msgpack rather than JSON. There is a msgpack library for
> > > > > > > > > > pretty much every language out there (almost) and certainly all of
> > > > > > > > > > the ones we'd like to target.
> > > > > > > > > >
> > > > > > > > > > The other option is to just create and expose protobuf bindings
> > > > > > > > > > (thrift doesn't have a native client for R) for all of the languages
> > > > > > > > > > that we want to support. I'm perfectly fine with that, but I had
> > > > > > > > > > some worries about the maturity of the bindings.
> > > > > > > > > >
> > > > > > > > > > The final option, as you suggest, is to just use raw sockets. I
> > > > > > > > > > think if we went that route, we might have to create a layer for
> > > > > > > > > > each language rather than relying on model creators to create a TCP
> > > > > > > > > > server. I thought that might be a bit onerous for a MVP.
> > > > > > > > > >
> > > > > > > > > > Given the discussion, though, what it has made me aware of is that
> > > > > > > > > > we might not want to dictate a transport mechanism at all, but
> > > > > > > > > > rather allow that to be pluggable and extensible (so each model
> > > > > > > > > > would be associated with a transport mechanism handler that would
> > > > > > > > > > know how to communicate to it. We would provide default mechanisms
> > > > > > > > > > for msgpack over REST, JSON over REST and maybe msgpack over raw
> > > > > > > > > > TCP.) Thoughts?
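
One way to picture the pluggable transport idea is a registry of payload
codecs, with each model mapped to one of them. The Python sketch below covers
only the encoding half (not the HTTP or TCP call itself), and every name in it
is an assumption for illustration. The msgpack codec is registered only if the
third-party msgpack package happens to be installed.

```python
import json
from typing import Callable, Dict, NamedTuple


class Transport(NamedTuple):
    encode: Callable[[dict], bytes]
    decode: Callable[[bytes], dict]


TRANSPORTS: Dict[str, Transport] = {}
MODEL_TRANSPORT: Dict[str, str] = {}  # hypothetical per-model configuration


def register_transport(name: str, encode, decode) -> None:
    TRANSPORTS[name] = Transport(encode, decode)


# Default codec: a map encoded as JSON.
register_transport(
    "json",
    lambda m: json.dumps(m).encode("utf-8"),
    lambda b: json.loads(b.decode("utf-8")),
)

try:
    import msgpack  # optional third-party dependency

    register_transport(
        "msgpack",
        msgpack.packb,
        lambda b: msgpack.unpackb(b, raw=False),
    )
except ImportError:
    pass  # JSON remains available as the fallback


def transport_for(model_name: str) -> Transport:
    # Fall back to JSON when a model declares no transport (an assumption).
    return TRANSPORTS[MODEL_TRANSPORT.get(model_name, "json")]
```

A caller would look up transport_for(model_name), encode the input map, send
the bytes over whatever wire protocol the handler owns, and decode the reply
with the same codec.
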
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Regarding PMML:
> > > > > > > > >
> > > > > > > > > I tend to agree with James that PMML is too restrictive as to models
> > > > > > > > > it can represent and I have not had great experiences with it in
> > > > > > > > > production.  Also, the open source libraries for PMML have licensing
> > > > > > > > > issues (jpmml requires an older version to accommodate our licensing
> > > > > > > > > requirements).
> > > > > > > > >
> > > > > > > > > Regarding workflow:
> > > > > > > > >
> > > > > > > > > At the moment, I'd like to focus on getting a generalized
> > > > > > > > > infrastructure for model scoring and updating put in place.   This
> > > > > > > > > means, this architecture takes up the baton from the point when a
> > > > > > > > > model is trained/created.  Also, I have attempted to be generic in
> > > > > > > > > terms of output of the model (a map of results) so it can fit any
> > > > > > > > > type of model that I can think of.  If that's not the case, let me
> > > > > > > > > know, though.
> > > > > > > > >
> > > > > > > > > For instance, for clustering, you would probably emit the cluster id
> > > > > > > > > associated with the input and that would be added to the message as
> > > > > > > > > it passes through the storm topology.  The model is responsible for
> > > > > > > > > processing the input and constructing properly formed output.
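
For the clustering example, adding the model's result map to the message could
look like the Python sketch below. The "<model>.<key>" field naming and the
sample field names are assumptions made for illustration; the thread only says
that the output is a map that gets added to the message.

```python
from typing import Any, Dict


def apply_model_output(
    message: Dict[str, Any], model_name: str, output: Dict[str, Any]
) -> Dict[str, Any]:
    """Return a copy of the message enriched with the model's result map."""
    enriched = dict(message)  # leave the incoming message untouched
    for key, value in output.items():
        # Prefix with the model name so two models cannot clobber each other.
        enriched[f"{model_name}.{key}"] = value
    return enriched


# Usage: a clustering model that emits only a cluster id.
message = {"ip_src_addr": "10.0.0.1"}
enriched = apply_model_output(message, "clustering", {"cluster_id": 3})
```
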
> > > > > > > > >
> > > > > > > > > Casey
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Tue, Jul 5, 2016 at 3:45 PM, Debo Dutta (dedutta) <[email protected]>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Following up on the thread a little late …. Awesome start Casey.
> > > > > > > > > > Some comments:
> > > > > > > > > > * Model execution
> > > > > > > > > > ** I am guessing the model execution will be on YARN only for now.
> > > > > > > > > >    This is fine, but the REST call could have an overhead - depends
> > > > > > > > > >    on the speed.
> > > > > > > > > > * PMML: won’t we have to choose some DSL for describing models?
> > > > > > > > > > * Model:
> > > > > > > > > > ** workflow vs a model -  do we care about the "workflow" that leads
> > > > > > > > > >    to the models or just the "model"? For example, we might start
> > > > > > > > > >    with n features -> do feature selection to choose k (or apply a
> > > > > > > > > >    transform function) -> apply a model etc
> > > > > > > > > > * Use cases - I can see this working for n-ary classification style
> > > > > > > > > >   models easily. Will the same mechanism be used for stuff like
> > > > > > > > > >   clustering (or intermediate steps like feature selection alone).
> > > > > > > > > >
> > > > > > > > > > Thx
> > > > > > > > > > debo
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On 7/5/16, 3:24 PM, "James Sirota" <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > >Simon,
> > > > > > > > > > >
> > > > > > > > > > >There are several reasons to decouple model execution from Storm:
> > > > > > > > > > >
> > > > > > > > > > >- Reliability: It's much easier to handle a failed service than a
> > > > > > > > > > >  failed bolt.  You can also troubleshoot without having to bring
> > > > > > > > > > >  down the topology
> > > > > > > > > > >- Complexity: you de-couple the model logic from Storm logic and can
> > > > > > > > > > >  manage it independently of Storm
> > > > > > > > > > >- Portability: you can swap the model guts (switch from Spark to
> > > > > > > > > > >  Flink, etc) and as long as you maintain the interface you are good
> > > > > > > > > > >  to go
> > > > > > > > > > >- Consistency: since we want to expose our models the same way we
> > > > > > > > > > >  expose threat intel then it makes sense to expose them as a service
> > > > > > > > > > >
> > > > > > > > > > >In our vision for Metron we want to make it easy to uptake and share
> > > > > > > > > > >models.  I think well-defined interfaces and programmatic ways of
> > > > > > > > > > >deployment, lifecycle management, and scoring via well-defined REST
> > > > > > > > > > >interfaces will make this task easier.  We can do a few things to
> > > > > > > > > > >
> > > > > > > > > > >With respect to PMML I personally had not had much luck with it in
> > > > > > > > > > >production.  I would prefer models as POJOs.
> > > > > > > > > > >
> > > > > > > > > > >Thanks,
> > > > > > > > > > >James
> > > > > > > > > > >
> > > > > > > > > > >04.07.2016, 16:07, "Simon Ball" <[email protected]>:
> > > > > > > > > > >> Since the models' parameters and execution algorithm are likely to
> > > > > > > > > > >> be small, why not have the model store push the model changes and
> > > > > > > > > > >> scoring direct to the bolts and execute within storm. This negates
> > > > > > > > > > >> the overhead of a rest call to the model server, and the need for
> > > > > > > > > > >> discovery of the model server in zookeeper.
> > > > > > > > > > >>
> > > > > > > > > > >> Something like the way ranger policies are updated / cached in
> > > > > > > > > > >> plugins would seem to make sense, so that we're distributing the
> > > > > > > > > > >> model execution directly into the enrichment pipeline rather than
> > > > > > > > > > >> collecting in a central service.
> > > > > > > > > > >>
> > > > > > > > > > >> This would work with simple models on single events, but may
> > > > > > > > > > >> struggle with correlation based models. However, those could be
> > > > > > > > > > >> handled in storm by pushing into a windowing trident topology or
> > > > > > > > > > >> something of the sort, or even with a parallel spark streaming job
> > > > > > > > > > >> using the same method of distributing models.
> > > > > > > > > > >>
> > > > > > > > > > >> The real challenge here would be stateful online models, which seem
> > > > > > > > > > >> like a minority case which could be handled by a shared state store
> > > > > > > > > > >> such as HBase.
> > > > > > > > > > >>
> > > > > > > > > > >> You still keep the ability to run different languages, and
> > > > > > > > > > >> platforms, but wrap managing the parallelism in storm bolts rather
> > > > > > > > > > >> than yarn containers.
> > > > > > > > > > >>
> > > > > > > > > > >> We could also consider basing the model protocol on a common model
> > > > > > > > > > >> language like PMML, though that is likely to be highly limiting.
> > > > > > > > > > >>
> > > > > > > > > > >> Simon
> > > > > > > > > > >>
> > > > > > > > > > >>>  On 4 Jul 2016, at 22:35, Casey Stella <[email protected]> wrote:
> > > > > > > > > > >>>
> > > > > > > > > > >>>  This is great! I'll capture any requirements that anyone wants to
> > > > > > > > > > >>>  contribute and ensure that the proposed architecture accommodates
> > > > > > > > > > >>>  them. I think we should focus on a minimal set of requirements and
> > > > > > > > > > >>>  an architecture that does not preclude a larger set. I have found
> > > > > > > > > > >>>  that the best driver of requirements are installed users. :)
> > > > > > > > > > >>>
> > > > > > > > > > >>>  For instance, I think a lot of questions about how often to update
> > > > > > > > > > >>>  a model and such should be represented in the architecture by the
> > > > > > > > > > >>>  ability to manually update a model, so as long as we have the
> > > > > > > > > > >>>  ability to update, people can choose when and where to do it (i.e.
> > > > > > > > > > >>>  time based or some other trigger). That being said, we don't want
> > > > > > > > > > >>>  to cause too much effort for the user if we can avoid it with
> > > > > > > > > > >>>  features.
> > > > > > > > > > >>>
> > > > > > > > > > >>>  In terms of the questions laid out, here are the constraints from
> > > > > > > > > > >>>  the proposed architecture as I see them. It'd be great to get a
> > > > > > > > > > >>>  sense of whether these constraints are too onerous or where they're
> > > > > > > > > > >>>  not opinionated enough :
> > > > > > > > > > >>>
> > > > > > > > > > >>>    - Model versioning and retention
> > > > > > > > > > >>>       - We do have the ability to update models, but the training and
> > > > > > > > > > >>>         the decision of when to update the model are left up to the
> > > > > > > > > > >>>         user. We may want to think deeply about when and where
> > > > > > > > > > >>>         automated model updates can fit.
> > > > > > > > > > >>>       - Also, retention is currently manual. It might be an easier
> > > > > > > > > > >>>         win to set up policies around when to sunset models (after
> > > > > > > > > > >>>         newer versions are added, for instance).
> > > > > > > > > > >>>    - Model access controls management
> > > > > > > > > > >>>       - The architecture proposes no constraints around this. As it
> > > > > > > > > > >>>         stands now, models are held in HDFS, so they would inherit the
> > > > > > > > > > >>>         same security capabilities from that (user/group permissions +
> > > > > > > > > > >>>         Ranger, etc.).
> > > > > > > > > > >>>    - Requirements around concept drift
> > > > > > > > > > >>>       - I'd love to hear user requirements around how we could
> > > > > > > > > > >>>         automatically address concept drift. The architecture as it's
> > > > > > > > > > >>>         proposed lets the user decide when to update models.
> > > > > > > > > > >>>    - Requirements around model output
> > > > > > > > > > >>>       - The architecture as it stands just mandates a JSON map input
> > > > > > > > > > >>>         and JSON map output, so it's up to the model what it wants to
> > > > > > > > > > >>>         pass back.
> > > > > > > > > > >>>       - It's also up to the model to document its own output.
> > > > > > > > > > >>>    - Any model audit and logging requirements
> > > > > > > > > > >>>       - The architecture proposes no constraints around this. I'd
> > > > > > > > > > >>>         love to see community guidance around this. As it stands, we
> > > > > > > > > > >>>         just log using the same mechanism as any YARN application.
> > > > > > > > > > >>>    - What model metrics need to be exposed
> > > > > > > > > > >>>       - The architecture proposes no constraints around this. I'd
> > > > > > > > > > >>>         love to see community guidance around this.
> > > > > > > > > > >>>    - Requirements around failure modes
> > > > > > > > > > >>>       - We briefly touch on this in the document, but it is probably
> > > > > > > > > > >>>         not complete. Service endpoint failure will result in
> > > > > > > > > > >>>         blacklisting from a storm bolt perspective and node failure
> > > > > > > > > > >>>         should result in a new container being started by the Yarn
> > > > > > > > > > >>>         application master. Beyond that, the architecture isn't
> > > > > > > > > > >>>         explicit.
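
The bolt-side blacklisting mentioned in the failure-mode item could be sketched
as below in Python. Note the assumptions: the thread does not say whether a
blacklisted endpoint is ever retried, so the time-to-live here, along with the
class and method names, is purely illustrative.

```python
import time
from typing import Callable, Dict, List


class EndpointBlacklist:
    """Tracks endpoints that recently failed so a bolt can skip them."""

    def __init__(
        self,
        ttl_seconds: float = 60.0,  # assumed cool-off period, not from the design
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._blocked_until: Dict[str, float] = {}

    def report_failure(self, endpoint: str) -> None:
        # Called by the bolt when a scoring request to this endpoint fails.
        self._blocked_until[endpoint] = self._clock() + self._ttl

    def usable(self, endpoints: List[str]) -> List[str]:
        # Keep endpoints that never failed or whose cool-off has expired.
        now = self._clock()
        return [
            e
            for e in endpoints
            if e not in self._blocked_until or self._blocked_until[e] <= now
        ]
```

The bolt would filter its endpoint list through usable() before choosing one,
while replacement of a failed node is left to the Yarn application master as
described above.
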
> > > > > > > > > > >>>
> > > > > > > > > > >>>>  On Mon, Jul 4, 2016 at 1:49 PM, James Sirota <[email protected]> wrote:
> > > > > > > > > > >>>>
> > > > > > > > > > >>>>  I left a comment on the JIRA. I think your design is promising.
> > > > > > > > > > >>>>  One other thing I would suggest is for us to crowd source
> > > > > > > > > > >>>>  requirements around model management. Specifically:
> > > > > > > > > > >>>>
> > > > > > > > > > >>>>  Model versioning and retention
> > > > > > > > > > >>>>  Model access controls management
> > > > > > > > > > >>>>  Requirements around concept drift
> > > > > > > > > > >>>>  Requirements around model output
> > > > > > > > > > >>>>  Any model audit and logging requirements
> > > > > > > > > > >>>>  What model metrics need to be exposed
> > > > > > > > > > >>>>  Requirements around failure modes
> > > > > > > > > > >>>>
> > > > > > > > > > >>>>  03.07.2016, 14:00, "Casey Stella" <[email protected]>:
> > > > > > > > > > >>>>>  Hi all,
> > > > > > > > > > >>>>>
> > > > > > > > > > >>>>>  I think we are at the point where we should try to tackle Model
> > > > > > > > > > >>>>>  as a service for Metron. As such, I created a JIRA and proposed
> > > > > > > > > > >>>>>  an architecture for accomplishing this within Metron.
> > > > > > > > > > >>>>>
> > > > > > > > > > >>>>>  My inclination is to be data science language/library agnostic
> > > > > > > > > > >>>>>  and to provide a general purpose REST infrastructure for managing
> > > > > > > > > > >>>>>  and serving models trained on historical data captured from
> > > > > > > > > > >>>>>  Metron. The assumption is that we are within the hadoop
> > > > > > > > > > >>>>>  ecosystem, so:
> > > > > > > > > > >>>>>
> > > > > > > > > > >>>>>    - Models stored on HDFS
> > > > > > > > > > >>>>>    - REST Model Services resource-managed via Yarn
> > > > > > > > > > >>>>>    - REST Model Services discovered via Zookeeper.
> > > > > > > > > > >>>>>
> > > > > > > > > > >>>>>  I would really appreciate community comment on the JIRA
> > > > > > > > > > >>>>>  (https://issues.apache.org/jira/browse/METRON-265). The proposed
> > > > > > > > > > >>>>>  architecture is attached as a document to that JIRA.
> > > > > > > > > > >>>>>
> > > > > > > > > > >>>>>  I look forward to feedback!
> > > > > > > > > > >>>>>
> > > > > > > > > > >>>>>  Best,
> > > > > > > > > > >>>>>
> > > > > > > > > > >>>>>  Casey
> > > > > > > > > > >>>>
> > > > > > > > > > >>>>  -------------------
> > > > > > > > > > >>>>  Thank you,
> > > > > > > > > > >>>>
> > > > > > > > > > >>>>  James Sirota
> > > > > > > > > > >>>>  PPMC- Apache Metron (Incubating)
> > > > > > > > > > >>>>  jsirota AT apache DOT org
> > > > > > > > > > >
> > > > > > > > > > >-------------------
> > > > > > > > > > >Thank you,
> > > > > > > > > > >
> > > > > > > > > > >James Sirota
> > > > > > > > > > >PPMC- Apache Metron (Incubating)
> > > > > > > > > > >jsirota AT apache DOT org
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Thanks,
> > > > > > > > Andrew
> > > > > > > >
> > > > > > > > Subscribe to my book: Streaming Data <http://manning.com/psaltis>
> > > > > > > > <https://www.linkedin.com/pub/andrew-psaltis/1/17b/306>
> > > > > > > > twitter: @itmdata <http://twitter.com/intent/user?screen_name=itmdata>



-- 
Thanks,
Andrew

Subscribe to my book: Streaming Data <http://manning.com/psaltis>
<https://www.linkedin.com/pub/andrew-psaltis/1/17b/306>
twitter: @itmdata <http://twitter.com/intent/user?screen_name=itmdata>
