OK, that makes sense. So the doc attached to this JIRA [1] just speaks to the model serving. Is there a doc for the model service itself? And by making this a separate service, are we saying that for every “MODEL_APPLY(model_name, param_1, param_2, …, param_n)” we are potentially going to go across the wire and have a model executed? That seems pretty expensive, no?
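Or is the expectation that the caller hides most of those round trips behind a local cache? Purely to illustrate the question, this is roughly what I am picturing the calling bolt would have to do; none of these class names come from the doc, and RemoteModelClient is made up:

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.util.Map;
import java.util.concurrent.TimeUnit;

public class CachingModelClient {

  /** Hypothetical wire client; stands in for whatever transport gets chosen. */
  public interface RemoteModelClient {
    Map<String, Object> score(Map<String, Object> input);
  }

  private final LoadingCache<Map<String, Object>, Map<String, Object>> cache;

  public CachingModelClient(final RemoteModelClient remote) {
    this.cache = CacheBuilder.newBuilder()
        .maximumSize(10_000)                   // bound memory per worker
        .expireAfterWrite(5, TimeUnit.MINUTES) // tolerate slightly stale scores
        .build(new CacheLoader<Map<String, Object>, Map<String, Object>>() {
          @Override
          public Map<String, Object> load(Map<String, Object> input) {
            return remote.score(input);        // the actual trip across the wire
          }
        });
  }

  /** The input map is used as a value key, so it should not be mutated afterwards. */
  public Map<String, Object> apply(Map<String, Object> input) {
    return cache.getUnchecked(input);          // only cache misses hit the network
  }
}

And caching only pays off when the same inputs repeat, the way they tend to for enrichment-style lookups, so I am curious what the cost looks like for genuinely per-message scoring.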

Thanks,
Andrew

[1] https://issues.apache.org/jira/browse/METRON-265

On Thu, Jul 7, 2016 at 12:20 PM, Casey Stella <[email protected]> wrote:

> The "REST" model service, which I place in quotes because there is some strong discussion about whether REST is a reasonable transport for this, is responsible for providing the model. The scoring/model application happens in the model service, and the results get transferred back to the Storm bolt that calls it.
>
> Casey
>
> On Thu, Jul 7, 2016 at 9:17 AM, Andrew Psaltis <[email protected]> wrote:
>
> > Trying to make sure I grok this thread and the Word doc attached to the JIRA. The Word doc and JIRA speak to a Model Serving Service and say that the REST service will be responsible for serving up models. However, part of this conversation seems to suggest that the model execution will actually occur at the REST service, in particular this comment from James:
> >
> > "There are several reasons to decouple model execution from Storm:"
> >
> > If the model execution is decoupled from Storm, then it appears that the REST service will be executing the model, not just serving it up. Is that correct?
> >
> > Thanks,
> > Andrew
> >
> > On Thu, Jul 7, 2016 at 11:51 AM, Casey Stella <[email protected]> wrote:
> >
> > > Regarding the performance of REST:
> > >
> > > Yep, everyone seems to be worried about the performance implications of REST. I made this comment on the JIRA, but I'll repeat it here for broader discussion:
> > >
> > > > My choice of REST was mostly due to the fact that I want to support multi-language (I think that's a very important requirement), and there are REST libraries for pretty much everything. I do agree, however, that JSON transport can get chunky. How about a compromise: use REST, but have the input and output payloads for scoring be maps encoded in msgpack rather than JSON. There is a msgpack library for pretty much every language out there (almost), and certainly for all of the ones we'd like to target.
> > > >
> > > > The other option is to just create and expose protobuf bindings (Thrift doesn't have a native client for R) for all of the languages that we want to support. I'm perfectly fine with that, but I had some worries about the maturity of the bindings.
> > > >
> > > > The final option, as you suggest, is to just use raw sockets. I think if we went that route, we might have to create a layer for each language rather than relying on model creators to create a TCP server. I thought that might be a bit onerous for an MVP.
> > > >
> > > > Given the discussion, though, what it has made me aware of is that we might not want to dictate a transport mechanism at all, but rather allow that to be pluggable and extensible (so each model would be associated with a transport mechanism handler that would know how to communicate with it; we would provide default mechanisms for msgpack over REST, JSON over REST, and maybe msgpack over raw TCP). Thoughts?
> > >
> > > Regarding PMML:
> > >
> > > I tend to agree with James that PMML is too restrictive as to the models it can represent, and I have not had great experiences with it in production. Also, the open source libraries for PMML have licensing issues (jpmml requires an older version to accommodate our licensing requirements).
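> > >
> > > Coming back to the pluggable transport idea for a second, here is a rough sketch of the kind of handler abstraction I am picturing; the interface and names below are invented purely for illustration and are not in the attached doc:
> > >
> > > import java.io.IOException;
> > > import java.net.URL;
> > > import java.util.HashMap;
> > > import java.util.Map;
> > >
> > > /** Knows how to ship a map of input fields to a model endpoint and get scored fields back. */
> > > public interface ModelTransportHandler {
> > >   Map<String, Object> score(URL endpoint, Map<String, Object> input) throws IOException;
> > > }
> > >
> > > /** Deployed models would declare a transport name; callers look the handler up here. */
> > > class TransportHandlerRegistry {
> > >   // e.g. "msgpack-rest", "json-rest", "msgpack-tcp" mapped to default implementations
> > >   private final Map<String, ModelTransportHandler> handlers = new HashMap<>();
> > >
> > >   void register(String transportName, ModelTransportHandler handler) {
> > >     handlers.put(transportName, handler);
> > >   }
> > >
> > >   ModelTransportHandler forTransport(String transportName) {
> > >     return handlers.get(transportName);
> > >   }
> > > }
> > >
> > > The three defaults (msgpack over REST, JSON over REST, msgpack over raw TCP) would be registered out of the box, and anything else would be pluggable.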
> > >
> > > Regarding workflow:
> > >
> > > At the moment, I'd like to focus on getting a generalized infrastructure for model scoring and updating put in place. That means this architecture takes up the baton from the point when a model is trained/created. Also, I have attempted to be generic in terms of the output of the model (a map of results) so it can fit any type of model that I can think of. If that's not the case, let me know.
> > >
> > > For instance, for clustering, you would probably emit the cluster id associated with the input, and that would be added to the message as it passes through the Storm topology. The model is responsible for processing the input and constructing properly formed output.
> > >
> > > Casey
> > >
> > > On Tue, Jul 5, 2016 at 3:45 PM, Debo Dutta (dedutta) <[email protected]> wrote:
> > >
> > > > Following up on the thread a little late… Awesome start, Casey. Some comments:
> > > >
> > > > * Model execution
> > > > ** I am guessing the model execution will be on YARN only for now. This is fine, but the REST call could have an overhead - depends on the speed.
> > > > * PMML: won’t we have to choose some DSL for describing models?
> > > > * Model:
> > > > ** workflow vs a model - do we care about the “workflow” that leads to the models or just the “model”? For example, we might start with n features —> do feature selection to choose k (or apply a transform function) —> apply a model, etc.
> > > > * Use cases - I can see this working for n-ary classification style models easily. Will the same mechanism be used for stuff like clustering (or intermediate steps like feature selection alone)?
> > > >
> > > > Thx
> > > > debo
> > > >
> > > > On 7/5/16, 3:24 PM, "James Sirota" <[email protected]> wrote:
> > > >
> > > > > Simon,
> > > > >
> > > > > There are several reasons to decouple model execution from Storm:
> > > > >
> > > > > - Reliability: It's much easier to handle a failed service than a failed bolt. You can also troubleshoot without having to bring down the topology.
> > > > > - Complexity: You decouple the model logic from Storm logic and can manage it independently of Storm.
> > > > > - Portability: You can swap the model guts (switch from Spark to Flink, etc.), and as long as you maintain the interface you are good to go.
> > > > > - Consistency: Since we want to expose our models the same way we expose threat intel, it makes sense to expose them as a service.
> > > > >
> > > > > In our vision for Metron we want to make it easy to uptake and share models. I think well-defined interfaces and programmatic ways of deployment, lifecycle management, and scoring via well-defined REST interfaces will make this task easier. We can do a few things to …
> > > > >
> > > > > With respect to PMML, I personally have not had much luck with it in production. I would prefer models as POJOs.
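> > > > >
> > > > > Purely as an illustration of what I mean by "models as POJOs" (nothing below exists anywhere yet; the names are made up), the contract could be as small as a map-in/map-out interface, which also lines up with the JSON-map input and output Casey describes:
> > > > >
> > > > > import java.util.Map;
> > > > >
> > > > > /** Illustrative only: the smallest contract a served model might expose. */
> > > > > public interface Model {
> > > > >   /** One-time setup, e.g. deserializing a model artifact copied down from HDFS. */
> > > > >   void open(Map<String, Object> config);
> > > > >
> > > > >   /** Scores a single message; input and output are both flat maps of fields. */
> > > > >   Map<String, Object> apply(Map<String, Object> input);
> > > > >
> > > > >   /** Releases any pooled or native resources. */
> > > > >   void close();
> > > > > }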
> > > > >
> > > > > Thanks,
> > > > > James
> > > > >
> > > > > 04.07.2016, 16:07, "Simon Ball" <[email protected]>:
> > > > >
> > > > > > Since the models' parameters and execution algorithm are likely to be small, why not have the model store push the model changes and scoring directly to the bolts and execute within Storm? This negates the overhead of a REST call to the model server, and the need for discovery of the model server in ZooKeeper.
> > > > > >
> > > > > > Something like the way Ranger policies are updated/cached in plugins would seem to make sense, so that we're distributing the model execution directly into the enrichment pipeline rather than collecting it in a central service.
> > > > > >
> > > > > > This would work with simple models on single events, but may struggle with correlation-based models. However, those could be handled in Storm by pushing into a windowing Trident topology or something of the sort, or even with a parallel Spark Streaming job using the same method of distributing models.
> > > > > >
> > > > > > The real challenge here would be stateful online models, which seem like a minority case and could be handled by a shared state store such as HBase.
> > > > > >
> > > > > > You still keep the ability to run different languages and platforms, but wrap management of the parallelism in Storm bolts rather than YARN containers.
> > > > > >
> > > > > > We could also consider basing the model protocol on a common model language like PMML, though that is likely to be highly limiting.
> > > > > >
> > > > > > Simon
> > > > > >
> > > > > > On 4 Jul 2016, at 22:35, Casey Stella <[email protected]> wrote:
> > > > > >
> > > > > > > This is great! I'll capture any requirements that anyone wants to contribute and ensure that the proposed architecture accommodates them. I think we should focus on a minimal set of requirements and an architecture that does not preclude a larger set. I have found that the best driver of requirements is installed users. :)
> > > > > > >
> > > > > > > For instance, I think a lot of questions about how often to update a model and such should be represented in the architecture by the ability to manually update a model; as long as we have the ability to update, people can choose when and where to do it (i.e. time-based or some other trigger). That being said, we don't want to cause too much effort for the user if we can avoid it with features.
> > > > > > >
> > > > > > > In terms of the questions laid out, here are the constraints from the proposed architecture as I see them. It'd be great to get a sense of whether these constraints are too onerous or where they're not opinionated enough:
> > > > > > >
> > > > > > > - Model versioning and retention
> > > > > > >   - We do have the ability to update models, but the training and the decision of when to update the model are left up to the user. We may want to think deeply about when and where automated model updates can fit.
> > > > > > >   - Also, retention is currently manual.
> > > > > > >     It might be an easier win to set up policies around when to sunset models (after newer versions are added, for instance).
> > > > > > > - Model access controls management
> > > > > > >   - The architecture proposes no constraints around this. As it stands now, models are held in HDFS, so this would inherit the same security capabilities from that (user/group permissions + Ranger, etc.).
> > > > > > > - Requirements around concept drift
> > > > > > >   - I'd love to hear user requirements around how we could automatically address concept drift. The architecture as it's proposed lets the user decide when to update models.
> > > > > > > - Requirements around model output
> > > > > > >   - The architecture as it stands just mandates a JSON map input and JSON map output, so it's up to the model what it wants to pass back.
> > > > > > >   - It's also up to the model to document its own output.
> > > > > > > - Any model audit and logging requirements
> > > > > > >   - The architecture proposes no constraints around this. I'd love to see community guidance here. As it stands, we just log using the same mechanism as any YARN application.
> > > > > > > - What model metrics need to be exposed
> > > > > > >   - The architecture proposes no constraints around this. I'd love to see community guidance here.
> > > > > > > - Requirements around failure modes
> > > > > > >   - We briefly touch on this in the document, but it is probably not complete. Service endpoint failure will result in blacklisting from a Storm bolt perspective, and node failure should result in a new container being started by the YARN application master. Beyond that, the architecture isn't explicit.
> > > > > > >
> > > > > > > On Mon, Jul 4, 2016 at 1:49 PM, James Sirota <[email protected]> wrote:
> > > > > > >
> > > > > > > > I left a comment on the JIRA. I think your design is promising. One other thing I would suggest is for us to crowdsource requirements around model management. Specifically:
> > > > > > > >
> > > > > > > > Model versioning and retention
> > > > > > > > Model access controls management
> > > > > > > > Requirements around concept drift
> > > > > > > > Requirements around model output
> > > > > > > > Any model audit and logging requirements
> > > > > > > > What model metrics need to be exposed
> > > > > > > > Requirements around failure modes
> > > > > > > >
> > > > > > > > 03.07.2016, 14:00, "Casey Stella" <[email protected]>:
> > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > I think we are at the point where we should try to tackle Model as a Service for Metron. As such, I created a JIRA and proposed an architecture for accomplishing this within Metron.
> > > > > > > > >
> > > > > > > > > My inclination is to be data science language/library agnostic and to provide a general-purpose REST infrastructure for managing and serving models trained on historical data captured from Metron.
> > > > > > > > > The assumption is that we are within the Hadoop ecosystem, so:
> > > > > > > > >
> > > > > > > > > - Models stored on HDFS
> > > > > > > > > - REST Model Services resource-managed via YARN
> > > > > > > > > - REST Model Services discovered via ZooKeeper
> > > > > > > > >
> > > > > > > > > I would really appreciate community comment on the JIRA (https://issues.apache.org/jira/browse/METRON-265). The proposed architecture is attached as a document to that JIRA.
> > > > > > > > >
> > > > > > > > > I look forward to feedback!
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > >
> > > > > > > > > Casey
> > > > > > > >
> > > > > > > > -------------------
> > > > > > > > Thank you,
> > > > > > > >
> > > > > > > > James Sirota
> > > > > > > > PPMC - Apache Metron (Incubating)
> > > > > > > > jsirota AT apache DOT org
> > > > >
> > > > > -------------------
> > > > > Thank you,
> > > > >
> > > > > James Sirota
> > > > > PPMC - Apache Metron (Incubating)
> > > > > jsirota AT apache DOT org
> >
> > --
> > Thanks,
> > Andrew
> >
> > Subscribe to my book: Streaming Data <http://manning.com/psaltis>
> > <https://www.linkedin.com/pub/andrew-psaltis/1/17b/306>
> > twitter: @itmdata <http://twitter.com/intent/user?screen_name=itmdata>

--
Thanks,
Andrew

Subscribe to my book: Streaming Data <http://manning.com/psaltis>
<https://www.linkedin.com/pub/andrew-psaltis/1/17b/306>
twitter: @itmdata <http://twitter.com/intent/user?screen_name=itmdata>
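
P.S. On the ZooKeeper discovery piece in the proposal: just to check that I am picturing the client side correctly, is the lookup roughly something like the Curator sketch below? The base path and service name here are placeholders I made up, not anything from the doc.

import java.util.Collection;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.curator.x.discovery.ServiceDiscovery;
import org.apache.curator.x.discovery.ServiceDiscoveryBuilder;
import org.apache.curator.x.discovery.ServiceInstance;

public class ModelEndpointLookup {
  public static void main(String[] args) throws Exception {
    // Connect to the same ZooKeeper quorum the topologies already use.
    CuratorFramework client = CuratorFrameworkFactory.newClient(
        "zookeeper:2181", new ExponentialBackoffRetry(1000, 3));
    client.start();

    // "/metron/models" and "dga_model_v1" are made-up placeholders.
    ServiceDiscovery<Void> discovery = ServiceDiscoveryBuilder.builder(Void.class)
        .client(client)
        .basePath("/metron/models")
        .build();
    discovery.start();

    // Assumes each running model service instance registers itself here when its container starts.
    Collection<ServiceInstance<Void>> instances = discovery.queryForInstances("dga_model_v1");
    for (ServiceInstance<Void> instance : instances) {
      System.out.println(instance.getAddress() + ":" + instance.getPort());
    }

    discovery.close();
    client.close();
  }
}

If the calling bolt resolves endpoints that way and load-balances across the instances it finds, that at least answers the "who do I call" half of my question; the per-call cost is the other half.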
