Thanks Casey, that all makes more sense now. I agree that using Curator is a better route than a call out to the Yarn registry.
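
To check my understanding of the bolt-side behavior discussed in the thread below, here is a minimal sketch of the "weighted die" endpoint choice with a bias toward local endpoints, plus an LRU cache in front of the scoring call. It is only an illustration under stated assumptions: `Endpoint`, `choose_endpoint`, `local_weight` and `LRUCache` are hypothetical names, the endpoint list is assumed to have already been pushed to the bolt from zookeeper, and none of this is taken from the Metron code base or the design document.

```python
import random
from collections import OrderedDict
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical shape of a model service endpoint as a bolt might see it."""
    host: str
    port: int
    model: str


def choose_endpoint(endpoints, model, local_host, local_weight=4.0, rng=random):
    """Roll a weighted die over the endpoints serving `model`.

    Endpoints on `local_host` get weight `local_weight`, remote ones get 1.0,
    so local endpoints are preferred but remote ones can still be chosen.
    The value 4.0 is an arbitrary assumption, not a recommended setting.
    """
    candidates = [e for e in endpoints if e.model == model]
    if not candidates:
        raise LookupError(f"no endpoint serves model {model!r}")
    weights = [local_weight if e.host == local_host else 1.0 for e in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


class LRUCache:
    """Small least-recently-used cache keyed on the model input."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get_or_compute(self, key, compute):
        if key in self._items:
            # Cache hit: mark the key as most recently used.
            self._items.move_to_end(key)
            return self._items[key]
        # Cache miss: this is where the remote scoring call would happen.
        value = compute(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            # Evict the least recently used entry.
            self._items.popitem(last=False)
        return value
```

With a skewed input distribution (the DGA domain example), most lookups would hit the cache and never reach `choose_endpoint` at all, which is the point of putting the cache at the bolt level.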

On Thu, Jul 7, 2016 at 3:00 PM, Casey Stella <[email protected]> wrote:

> > Considering both the storm bolts and the model service will be deployed on Yarn, could the bolts not use the Yarn registry to identify which model service to connect to before making a request?
>
> The bolts are definitely going to figure out which endpoints are serving which models, but that info will come from zookeeper and get pushed to the bolts on change, rather than have a separate request to the yarn registry.
>
> > How do you scale the model service endpoints if they have a preference for which model they serve?
>
> I'd say preference is a loose term. We'll probably just use a weighted die and bias the choice toward local endpoints over remote endpoints. Let's all keep in mind here that there are real reasons why you might not have a model executed from the same node as a storm worker. Take for instance a tensorflow model that *needs* GPUs, you might never run a storm worker on those nodes. In that situation, the network hop will probably be dominated by the computation done in scoring and it's probably not cost effective to scale storm along with GPU nodes.
>
> > And each is a simple REST (or another more performant protocol) service as the document describes?
>
> Yep.
>
> On Thu, Jul 7, 2016 at 11:14 AM, Andrew Psaltis <[email protected]> wrote:
>
> > Thanks Casey, that helps.
> >
> > RE: I am talking about model execution here. The endpoints are distributed across the cluster and the storm bolt chooses a service to use (with a bias toward using one that is local to that bolt) and the request is made to the endpoint, which scores the input and returns the response.
> >
> > This makes sense. Depending on volume and velocity of data seems like this could get expensive.
> >
> > RE: Model service, if that term means what I think it means, is almost entirely done inside of zookeeper.
> > For clarity, I'm talking about service discovery (bolt discovers which endpoints serve which models) and model updates
> >
> > Thanks, this helps to clarify it quite a bit. Considering both the storm bolts and the model service will be deployed on Yarn, could the bolts not use the Yarn registry to identify which model service to connect to before making a request?
> >
> > How do you scale the model service endpoints if they have a preference for which model they serve? And each is a simple REST (or another more performant protocol) service as the document describes?
> >
> > Thanks,
> > Andrew
> >
> > On Thu, Jul 7, 2016 at 1:51 PM, Casey Stella <[email protected]> wrote:
> >
> > > Great questions Andrew. Thanks for the interest. :)
> > >
> > > RE: "which is why there would be a caching layer set in front of it at the Storm bolt level"
> > >
> > > Right now we have a LRU caching layer in front of the HBase enrichment adapters, so it would work similarly. You can imagine, the range of inputs is likely not perfectly random, so it's reasonable for the cache to have a non-empty working set. Take for instance a DGA model; the input would be a domain and most organizations will have an uneven distribution of domains they access with a heavy skew toward a small number.
> > >
> > > RE: In this scenario, you can at least scale out via load balancing (i.e. multiple model services serving the same model) since the models are immutable.
> > >
> > > I am talking about model execution here. The endpoints are distributed across the cluster and the storm bolt chooses a service to use (with a bias toward using one that is local to that bolt) and the request is made to the endpoint, which scores the input and returns the response.
> > >
> > > Model service, if that term means what I think it means, is almost entirely done inside of zookeeper. For clarity, I'm talking about service discovery (bolt discovers which endpoints serve which models) and model updates. We are not sending the model around to any bolts or any such thing, just for clarity's sake.
> > >
> > > On Thu, Jul 7, 2016 at 9:47 AM, Andrew Psaltis <[email protected]> wrote:
> > >
> > > > Thanks Casey! Couple of quick questions.
> > > >
> > > > RE: "which is why there would be a caching layer set in front of it at the Storm bolt level"
> > > > Hmm, would this be of the results of model execution? Would this really work when each tuple may contain totally different data? Or is the caching going to be smart enough that it will look at all the data passed in and determine that an identical tuple has already been evaluated so serve the result out of cache?
> > > >
> > > > RE: "Also, we would prefer local instances of the service when and where possible"
> > > > Perfect, makes sense.
> > > >
> > > > RE: Serving many models from every storm bolt is also fairly expensive.
> > > > I can see how it could be, but couldn't we make sure that not all models live in every bolt?
> > > >
> > > > RE: In this scenario, you can at least scale out via load balancing (i.e. multiple model services serving the same model) since the models are immutable.
> > > > This seems to address the model serving, not model execution service. Having yet one more layer to scale and manage also seems like it would further complicate things. Could we not just also scale the bolts?
> > > >
> > > > Thanks,
> > > > Andrew
> > > >
> > > > On Thu, Jul 7, 2016 at 12:37 PM, Casey Stella <[email protected]> wrote:
> > > >
> > > > > So, regarding the expense of communication; I tend to agree that it is expensive, which is why there would be a caching layer set in front of it at the Storm bolt level. Also, we would prefer local instances of the service when and where possible. Serving many models from every storm bolt is also fairly expensive. In this scenario, you can at least scale out via load balancing (i.e. multiple model services serving the same model) since the models are immutable.
> > > > >
> > > > > On Thu, Jul 7, 2016 at 9:24 AM, Andrew Psaltis <[email protected]> wrote:
> > > > >
> > > > > > OK that makes sense. So the doc attached to this JIRA[1] just speaks to the Model serving. Is there a doc for the model service? And by making this a separate service we are saying that for every “MODEL_APPLY(model_name, param_1, param_2, …, param_n)” we are potentially going to go across the wire and have a model executed? That seems pretty expensive, no?
> > > > > >
> > > > > > Thanks,
> > > > > > Andrew
> > > > > >
> > > > > > [1] https://issues.apache.org/jira/browse/METRON-265
> > > > > >
> > > > > > On Thu, Jul 7, 2016 at 12:20 PM, Casey Stella <[email protected]> wrote:
> > > > > >
> > > > > > > The "REST" model service, which I place in quotes because there is some strong discussion about whether REST is a reasonable transport for this, is responsible for providing the model.
> > > > > > > The scoring/model application happens in the model service and the results get transferred back to the storm bolt that calls it.
> > > > > > >
> > > > > > > Casey
> > > > > > >
> > > > > > > On Thu, Jul 7, 2016 at 9:17 AM, Andrew Psaltis <[email protected]> wrote:
> > > > > > >
> > > > > > > > Trying to make sure I grok this thread and the word doc attached to the JIRA. The word doc and JIRA speak to a Model Service Service and that the REST service will be responsible for serving up models. However, part of this conversation seems to suggest that the model execution will actually occur at the REST service .. in particular this comment from James:
> > > > > > > >
> > > > > > > > "There are several reasons to decouple model execution from Storm:"
> > > > > > > >
> > > > > > > > If the model execution is decoupled from Storm then it appears that the REST service will be executing the model, not just serving it up, is that correct?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Andrew
> > > > > > > >
> > > > > > > > On Thu, Jul 7, 2016 at 11:51 AM, Casey Stella <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Regarding the performance of REST:
> > > > > > > > >
> > > > > > > > > Yep, so everyone seems to be worried about the performance implications for REST.
> > > > > > > > > I made this comment on the JIRA, but I'll repeat it here for broader discussion:
> > > > > > > > >
> > > > > > > > > > My choice of REST was mostly due to the fact that I want to support multi-language (I think that's a very important requirement) and there are REST libraries for pretty much everything. I do agree, however, that JSON transport can get chunky. How about a compromise and use REST, but the input and output payloads for scoring are Maps encoded in msgpack rather than JSON. There is a msgpack library for pretty much every language out there (almost) and certainly all of the ones we'd like to target.
> > > > > > > > > >
> > > > > > > > > > The other option is to just create and expose protobuf bindings (thrift doesn't have a native client for R) for all of the languages that we want to support. I'm perfectly fine with that, but I had some worries about the maturity of the bindings.
> > > > > > > > > >
> > > > > > > > > > The final option, as you suggest, is to just use raw sockets. I think if we went that route, we might have to create a layer for each language rather than relying on model creators to create a TCP server. I thought that might be a bit onerous for a MVP.
> > > > > > > > > >
> > > > > > > > > > Given the discussion, though, what it has made me aware of is that we might not want to dictate a transport mechanism at all, but rather allow that to be pluggable and extensible (so each model would be associated with a transport mechanism handler that would know how to communicate to it. We would provide default mechanisms for msgpack over REST, JSON over REST and maybe msgpack over raw TCP.) Thoughts?
> > > > > > > > >
> > > > > > > > > Regarding PMML:
> > > > > > > > >
> > > > > > > > > I tend to agree with James that PMML is too restrictive as to models it can represent and I have not had great experiences with it in production. Also, the open source libraries for PMML have licensing issues (jpmml requires an older version to accommodate our licensing requirements).
> > > > > > > > >
> > > > > > > > > Regarding workflow:
> > > > > > > > >
> > > > > > > > > At the moment, I'd like to focus on getting a generalized infrastructure for model scoring and updating put in place. This means, this architecture takes up the baton from the point when a model is trained/created. Also, I have attempted to be generic in terms of output of the model (a map of results) so it can fit any type of model that I can think of. If that's not the case, let me know, though.
> > > > > > > > >
> > > > > > > > > For instance, for clustering, you would probably emit the cluster id associated with the input and that would be added to the message as it passes through the storm topology. The model is responsible for processing the input and constructing properly formed output.
> > > > > > > > >
> > > > > > > > > Casey
> > > > > > > > >
> > > > > > > > > On Tue, Jul 5, 2016 at 3:45 PM, Debo Dutta (dedutta) <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > Following up on the thread a little late …. Awesome start Casey. Some comments:
> > > > > > > > > > * Model execution
> > > > > > > > > > ** I am guessing the model execution will be on YARN only for now. This is fine, but the REST call could have an overhead - depends on the speed.
> > > > > > > > > > * PMML: won’t we have to choose some DSL for describing models?
> > > > > > > > > > * Model:
> > > > > > > > > > ** workflow vs a model - do we care about the "workflow" that leads to the models or just the "model"? For example, we might start with n features -> do feature selection to choose k (or apply a transform function) -> apply a model etc
> > > > > > > > > > * Use cases - I can see this working for n-ary classification style models easily. Will the same mechanism be used for stuff like clustering (or intermediate steps like feature selection alone).
> > > > > > > > > >
> > > > > > > > > > Thx
> > > > > > > > > > debo
> > > > > > > > > >
> > > > > > > > > > On 7/5/16, 3:24 PM, "James Sirota" <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > >Simon,
> > > > > > > > > > >
> > > > > > > > > > >There are several reasons to decouple model execution from Storm:
> > > > > > > > > > >
> > > > > > > > > > >- Reliability: It's much easier to handle a failed service than a failed bolt. You can also troubleshoot without having to bring down the topology
> > > > > > > > > > >- Complexity: you de-couple the model logic from Storm logic and can manage it independently of Storm
> > > > > > > > > > >- Portability: you can swap the model guts (switch from Spark to Flink, etc) and as long as you maintain the interface you are good to go
> > > > > > > > > > >- Consistency: since we want to expose our models the same way we expose threat intel then it makes sense to expose them as a service
> > > > > > > > > > >
> > > > > > > > > > >In our vision for Metron we want to make it easy to uptake and share models. I think well-defined interfaces and programmatic ways of deployment, lifecycle management, and scoring via well-defined REST interfaces will make this task easier. We can do a few things to
> > > > > > > > > > >
> > > > > > > > > > >With respect to PMML I personally had not had much luck with it in production. I would prefer models as POJOs.
> > > > > > > > > > >
> > > > > > > > > > >Thanks,
> > > > > > > > > > >James
> > > > > > > > > > >
> > > > > > > > > > >04.07.2016, 16:07, "Simon Ball" <[email protected]>:
> > > > > > > > > > >> Since the models' parameters and execution algorithm are likely to be small, why not have the model store push the model changes and scoring direct to the bolts and execute within storm. This negates the overhead of a rest call to the model server, and the need for discovery of the model server in zookeeper.
> > > > > > > > > > >>
> > > > > > > > > > >> Something like the way ranger policies are updated / cached in plugins would seem to make sense, so that we're distributing the model execution directly into the enrichment pipeline rather than collecting in a central service.
> > > > > > > > > > >>
> > > > > > > > > > >> This would work with simple models on single events, but may struggle with correlation based models. However, those could be handled in storm by pushing into a windowing trident topology or something of the sort, or even with a parallel spark streaming job using the same method of distributing models.
> > > > > > > > > > >>
> > > > > > > > > > >> The real challenge here would be stateful online models, which seem like a minority case which could be handled by a shared state store such as HBase.
> > > > > > > > > > >>
> > > > > > > > > > >> You still keep the ability to run different languages, and platforms, but wrap managing the parallelism in storm bolts rather than yarn containers.
> > > > > > > > > > >>
> > > > > > > > > > >> We could also consider basing the model protocol on a common model language like pmml, though that is likely to be highly limiting.
> > > > > > > > > > >>
> > > > > > > > > > >> Simon
> > > > > > > > > > >>
> > > > > > > > > > >>> On 4 Jul 2016, at 22:35, Casey Stella <[email protected]> wrote:
> > > > > > > > > > >>>
> > > > > > > > > > >>> This is great! I'll capture any requirements that anyone wants to contribute and ensure that the proposed architecture accommodates them. I think we should focus on a minimal set of requirements and an architecture that does not preclude a larger set. I have found that the best driver of requirements are installed users.
> > > > > > > > > > >>> :)
> > > > > > > > > > >>>
> > > > > > > > > > >>> For instance, I think a lot of questions about how often to update a model and such should be represented in the architecture by the ability to manually update a model, so as long as we have the ability to update, people can choose when and where to do it (i.e. time based or some other trigger). That being said, we don't want to cause too much effort for the user if we can avoid it with features.
> > > > > > > > > > >>>
> > > > > > > > > > >>> In terms of the questions laid out, here are the constraints from the proposed architecture as I see them. It'd be great to get a sense of whether these constraints are too onerous or where they're not opinionated enough:
> > > > > > > > > > >>>
> > > > > > > > > > >>> - Model versioning and retention
> > > > > > > > > > >>>    - We do have the ability to update models, but the training and decision of when to update the model is left up to the user. We may want to think deeply about when and where automated model updates can fit
> > > > > > > > > > >>>    - Also, retention is currently manual.
> > > > > > > > > > >>>      It might be an easier win to set up policies around when to sunset models (after newer versions are added, for instance).
> > > > > > > > > > >>> - Model access controls management
> > > > > > > > > > >>>    - The architecture proposes no constraints around this. As it stands now, models are held in HDFS, so it would inherit the same security capabilities from that (user/group permissions + Ranger, etc)
> > > > > > > > > > >>> - Requirements around concept drift
> > > > > > > > > > >>>    - I'd love to hear user requirements around how we could automatically address concept drift. The architecture as it's proposed lets the user decide when to update models.
> > > > > > > > > > >>> - Requirements around model output
> > > > > > > > > > >>>    - The architecture as it stands just mandates a JSON map input and JSON map output, so it's up to the model what they want to pass back.
> > > > > > > > > > >>>    - It's also up to the model to document its own output.
> > > > > > > > > > >>> - Any model audit and logging requirements
> > > > > > > > > > >>>    - The architecture proposes no constraints around this. I'd love to see community guidance around this. As it stands, we just log using the same mechanism as any YARN application.
> > > > > > > > > > >>> - What model metrics need to be exposed
> > > > > > > > > > >>>    - The architecture proposes no constraints around this. I'd love to see community guidance around this.
> > > > > > > > > > >>> - Requirements around failure modes
> > > > > > > > > > >>>    - We briefly touch on this in the document, but it is probably not complete. Service endpoint failure will result in blacklisting from a storm bolt perspective and node failure should result in a new container being started by the Yarn application master. Beyond that, the architecture isn't explicit.
> > > > > > > > > > >>>
> > > > > > > > > > >>>> On Mon, Jul 4, 2016 at 1:49 PM, James Sirota <[email protected]> wrote:
> > > > > > > > > > >>>>
> > > > > > > > > > >>>> I left a comment on the JIRA. I think your design is promising. One other thing I would suggest is for us to crowd source requirements around model management.
> > > > > > > > > > >>>> Specifically:
> > > > > > > > > > >>>>
> > > > > > > > > > >>>> Model versioning and retention
> > > > > > > > > > >>>> Model access controls management
> > > > > > > > > > >>>> Requirements around concept drift
> > > > > > > > > > >>>> Requirements around model output
> > > > > > > > > > >>>> Any model audit and logging requirements
> > > > > > > > > > >>>> What model metrics need to be exposed
> > > > > > > > > > >>>> Requirements around failure modes
> > > > > > > > > > >>>>
> > > > > > > > > > >>>> 03.07.2016, 14:00, "Casey Stella" <[email protected]>:
> > > > > > > > > > >>>>> Hi all,
> > > > > > > > > > >>>>>
> > > > > > > > > > >>>>> I think we are at the point where we should try to tackle Model as a service for Metron. As such, I created a JIRA and proposed an architecture for accomplishing this within Metron.
> > > > > > > > > > >>>>>
> > > > > > > > > > >>>>> My inclination is to be data science language/library agnostic and to provide a general purpose REST infrastructure for managing and serving models trained on historical data captured from Metron. The assumption is that we are within the hadoop ecosystem, so:
> > > > > > > > > > >>>>>
> > > > > > > > > > >>>>> - Models stored on HDFS
> > > > > > > > > > >>>>> - REST Model Services resource-managed via Yarn
> > > > > > > > > > >>>>> - REST Model Services discovered via Zookeeper.
> > > > > > > > > > >>>>>
> > > > > > > > > > >>>>> I would really appreciate community comment on the JIRA (https://issues.apache.org/jira/browse/METRON-265).
> > > > > > > > > > >>>>> The proposed architecture is attached as a document to that JIRA.
> > > > > > > > > > >>>>>
> > > > > > > > > > >>>>> I look forward to feedback!
> > > > > > > > > > >>>>>
> > > > > > > > > > >>>>> Best,
> > > > > > > > > > >>>>>
> > > > > > > > > > >>>>> Casey
> > > > > > > > > > >>>>
> > > > > > > > > > >>>> -------------------
> > > > > > > > > > >>>> Thank you,
> > > > > > > > > > >>>>
> > > > > > > > > > >>>> James Sirota
> > > > > > > > > > >>>> PPMC- Apache Metron (Incubating)
> > > > > > > > > > >>>> jsirota AT apache DOT org
> > > > > > > >
> > > > > > > > --
> > > > > > > > Thanks,
> > > > > > > > Andrew
> > > > > > > >
> > > > > > > > Subscribe to my book: Streaming Data <http://manning.com/psaltis>
> > > > > > > > <https://www.linkedin.com/pub/andrew-psaltis/1/17b/306>
> > > > > > > > twitter: @itmdata <http://twitter.com/intent/user?screen_name=itmdata>

-- 
Thanks,
Andrew

Subscribe to my book: Streaming Data <http://manning.com/psaltis>
<https://www.linkedin.com/pub/andrew-psaltis/1/17b/306>
twitter: @itmdata <http://twitter.com/intent/user?screen_name=itmdata>
