Following up on the thread a little late.... Awesome start, Casey. Some comments:

* Model execution
** I am guessing the model execution will be on YARN only for now. This is fine, but the REST call could have an overhead - depends on the speed.
* PMML: won't we have to choose some DSL for describing models?
* Model:
** Workflow vs. a model - do we care about the "workflow" that leads to the model or just the "model"? For example, we might start with n features -> do feature selection to choose k (or apply a transform function) -> apply a model, etc.
* Use cases - I can see this working for n-ary classification style models easily. Will the same mechanism be used for stuff like clustering (or intermediate steps like feature selection alone)?

Thx
debo
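One way to read the "workflow vs. model" question against the proposal later in the thread is that a multi-step workflow can simply be packaged behind the same single scoring interface as a plain model. A minimal sketch, assuming a map-in/map-out contract like the one Casey describes below; every class, method, and feature name here is hypothetical and not part of the proposal:

import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a multi-step "workflow" (feature selection followed by a
// classifier) packaged behind a single map-in/map-out model interface.
public class WorkflowAsModel {

  /** The only contract the serving layer would need to know about. */
  public interface Model {
    Map<String, Object> score(Map<String, Object> input);
  }

  /** Toy feature-selection step: keep only the features the "training" chose. */
  static Map<String, Object> selectFeatures(Map<String, Object> input) {
    Map<String, Object> selected = new HashMap<>();
    for (String feature : new String[] {"length", "entropy"}) {
      if (input.containsKey(feature)) {
        selected.put(feature, input.get(feature));
      }
    }
    return selected;
  }

  /** Toy classifier: threshold on a single selected feature. */
  static Map<String, Object> classify(Map<String, Object> features) {
    double entropy = ((Number) features.getOrDefault("entropy", 0.0)).doubleValue();
    Map<String, Object> output = new HashMap<>();
    output.put("is_malicious", entropy > 3.5);
    output.put("score", entropy);
    return output;
  }

  public static void main(String[] args) {
    // The whole workflow is exposed as one Model; the caller never sees the steps.
    Model workflow = input -> classify(selectFeatures(input));

    Map<String, Object> message = new HashMap<>();
    message.put("length", 23);
    message.put("entropy", 4.2);
    System.out.println(workflow.score(message)); // e.g. {score=4.2, is_malicious=true}
  }
}

Whether the intermediate steps (feature selection, transforms) also need to be individually addressable is exactly the open question Debo raises above.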
On 7/5/16, 3:24 PM, "James Sirota" <[email protected]> wrote:

>Simon,
>
>There are several reasons to decouple model execution from Storm:
>
>- Reliability: It's much easier to handle a failed service than a failed bolt.
>  You can also troubleshoot without having to bring down the topology.
>- Complexity: You decouple the model logic from Storm logic and can manage it
>  independently of Storm.
>- Portability: You can swap the model guts (switch from Spark to Flink, etc.)
>  and as long as you maintain the interface you are good to go.
>- Consistency: Since we want to expose our models the same way we expose
>  threat intel, it makes sense to expose them as a service.
>
>In our vision for Metron we want to make it easy to uptake and share models.
>I think well-defined interfaces and programmatic ways of deployment, lifecycle
>management, and scoring via well-defined REST interfaces will make this task
>easier. We can do a few things to
>
>With respect to PMML, I personally have not had much luck with it in
>production. I would prefer models as POJOs.
>
>Thanks,
>James
>
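James's combination of a well-defined REST interface with models as POJOs could be as small as the sketch below: a plain-Java model fronted by a tiny HTTP endpoint, independent of Storm. This is only an illustration, not the proposed Metron implementation; it uses the JDK's built-in HttpServer (Java 9+ for readAllBytes) to stay self-contained, and the port, /score path, and response fields are invented:

import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical "model as a service" sketch: a POJO model exposed over REST.
public class ModelServiceSketch {

  // Stand-in for a real model: returns a fixed JSON score so the example
  // stays dependency-free (no JSON library needed).
  static String score(String requestJson) {
    return "{\"is_malicious\": false, \"score\": 0.12}";
  }

  public static void main(String[] args) throws Exception {
    HttpServer server = HttpServer.create(new InetSocketAddress(8765), 0);
    server.createContext("/score", exchange -> {
      try (InputStream in = exchange.getRequestBody()) {
        String request = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        byte[] response = score(request).getBytes(StandardCharsets.UTF_8);
        exchange.getResponseHeaders().set("Content-Type", "application/json");
        exchange.sendResponseHeaders(200, response.length);
        try (OutputStream out = exchange.getResponseBody()) {
          out.write(response);
        }
      }
    });
    server.start();
    System.out.println("Model service listening on http://localhost:8765/score");
  }
}

A caller (a Storm bolt, a script, a test) would then POST a JSON map and get a JSON map back, e.g. curl -X POST -d '{"domain":"example.com"}' http://localhost:8765/score.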
>04.07.2016, 16:07, "Simon Ball" <[email protected]>:
>> Since the models' parameters and execution algorithm are likely to be small,
>> why not have the model store push the model changes and scoring directly to
>> the bolts and execute within Storm? This negates the overhead of a REST call
>> to the model server, and the need for discovery of the model server in
>> ZooKeeper.
>>
>> Something like the way Ranger policies are updated / cached in plugins would
>> seem to make sense, so that we're distributing the model execution directly
>> into the enrichment pipeline rather than collecting in a central service.
>>
>> This would work with simple models on single events, but may struggle with
>> correlation-based models. However, those could be handled in Storm by
>> pushing into a windowing Trident topology or something of the sort, or even
>> with a parallel Spark Streaming job using the same method of distributing
>> models.
>>
>> The real challenge here would be stateful online models, which seem like a
>> minority case that could be handled by a shared state store such as HBase.
>>
>> You still keep the ability to run different languages and platforms, but
>> wrap managing the parallelism in Storm bolts rather than YARN containers.
>>
>> We could also consider basing the model protocol on a common model
>> language like PMML, though that is likely to be highly limiting.
>>
>> Simon
>>
>>> On 4 Jul 2016, at 22:35, Casey Stella <[email protected]> wrote:
>>>
>>> This is great! I'll capture any requirements that anyone wants to
>>> contribute and ensure that the proposed architecture accommodates them. I
>>> think we should focus on a minimal set of requirements and an architecture
>>> that does not preclude a larger set. I have found that the best driver of
>>> requirements is installed users. :)
>>>
>>> For instance, I think a lot of questions about how often to update a model
>>> and such should be represented in the architecture by the ability to
>>> manually update a model, so, as long as we have the ability to update,
>>> people can choose when and where to do it (i.e. time-based or some other
>>> trigger). That being said, we don't want to cause too much effort for the
>>> user if we can avoid it with features.
>>>
>>> In terms of the questions laid out, here are the constraints from the
>>> proposed architecture as I see them. It'd be great to get a sense of
>>> whether these constraints are too onerous or where they're not opinionated
>>> enough:
>>>
>>> - Model versioning and retention
>>>   - We do have the ability to update models, but the training and decision
>>>     of when to update the model is left up to the user. We may want to
>>>     think deeply about when and where automated model updates can fit.
>>>   - Also, retention is currently manual. It might be an easier win to
>>>     set up policies around when to sunset models (after newer versions are
>>>     added, for instance).
>>> - Model access controls management
>>>   - The architecture proposes no constraints around this. As it stands
>>>     now, models are held in HDFS, so it would inherit the same security
>>>     capabilities from that (user/group permissions + Ranger, etc.).
>>> - Requirements around concept drift
>>>   - I'd love to hear user requirements around how we could automatically
>>>     address concept drift. The architecture as it's proposed lets the
>>>     user decide when to update models.
>>> - Requirements around model output
>>>   - The architecture as it stands just mandates a JSON map input and JSON
>>>     map output, so it's up to the model what it wants to pass back.
>>>   - It's also up to the model to document its own output.
>>> - Any model audit and logging requirements
>>>   - The architecture proposes no constraints around this. I'd love to see
>>>     community guidance around this. As it stands, we just log using the
>>>     same mechanism as any YARN application.
>>> - What model metrics need to be exposed
>>>   - The architecture proposes no constraints around this. I'd love to see
>>>     community guidance around this.
>>> - Requirements around failure modes
>>>   - We briefly touch on this in the document, but it is probably not
>>>     complete. Service endpoint failure will result in blacklisting from a
>>>     Storm bolt perspective, and node failure should result in a new
>>>     container being started by the YARN application master. Beyond that,
>>>     the architecture isn't explicit.
>>>
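On the failure-mode point above, the caller-side blacklisting Casey mentions could look something like the sketch below. The cool-down period, class shape, and endpoint URLs are invented for illustration; the document only states that failed endpoints are blacklisted from the bolt's perspective and that the YARN application master restarts failed containers:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical caller-side blacklist: failed endpoints are skipped for a
// cool-down period and the caller rotates to the next known instance.
public class EndpointBlacklistSketch {

  private final List<String> endpoints = new ArrayList<>();
  private final Map<String, Long> blacklistedUntil = new HashMap<>();
  private final long blacklistMillis;

  public EndpointBlacklistSketch(List<String> endpoints, long blacklistMillis) {
    this.endpoints.addAll(endpoints);
    this.blacklistMillis = blacklistMillis;
  }

  /** Report a failed call; the endpoint is skipped until the cool-down expires. */
  public void markFailed(String endpoint) {
    blacklistedUntil.put(endpoint, System.currentTimeMillis() + blacklistMillis);
  }

  /** Return the first endpoint that is not currently blacklisted, or null. */
  public String pick() {
    long now = System.currentTimeMillis();
    for (String endpoint : endpoints) {
      Long until = blacklistedUntil.get(endpoint);
      if (until == null || until <= now) {
        return endpoint;
      }
    }
    return null; // all endpoints blacklisted; caller decides whether to wait or error
  }

  public static void main(String[] args) {
    EndpointBlacklistSketch picker = new EndpointBlacklistSketch(
        List.of("http://host-1:8765/score", "http://host-2:8765/score"), 30_000);
    String first = picker.pick();       // http://host-1:8765/score
    picker.markFailed(first);           // e.g. after a connection failure
    System.out.println(picker.pick());  // http://host-2:8765/score
  }
}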
>>>> On Mon, Jul 4, 2016 at 1:49 PM, James Sirota <[email protected]> wrote:
>>>>
>>>> I left a comment on the JIRA. I think your design is promising. One
>>>> other thing I would suggest is for us to crowd-source requirements around
>>>> model management. Specifically:
>>>>
>>>> Model versioning and retention
>>>> Model access controls management
>>>> Requirements around concept drift
>>>> Requirements around model output
>>>> Any model audit and logging requirements
>>>> What model metrics need to be exposed
>>>> Requirements around failure modes
>>>>
>>>> 03.07.2016, 14:00, "Casey Stella" <[email protected]>:
>>>>> Hi all,
>>>>>
>>>>> I think we are at the point where we should try to tackle Model as a
>>>>> service for Metron. As such, I created a JIRA and proposed an
>>>>> architecture for accomplishing this within Metron.
>>>>>
>>>>> My inclination is to be data science language/library agnostic and to
>>>>> provide a general-purpose REST infrastructure for managing and serving
>>>>> models trained on historical data captured from Metron. The assumption
>>>>> is that we are within the Hadoop ecosystem, so:
>>>>>
>>>>> - Models stored on HDFS
>>>>> - REST Model Services resource-managed via YARN
>>>>> - REST Model Services discovered via ZooKeeper
>>>>>
>>>>> I would really appreciate community comment on the JIRA (
>>>>> https://issues.apache.org/jira/browse/METRON-265). The proposed
>>>>> architecture is attached as a document to that JIRA.
>>>>>
>>>>> I look forward to feedback!
>>>>>
>>>>> Best,
>>>>>
>>>>> Casey
>>>>
>>>> -------------------
>>>> Thank you,
>>>>
>>>> James Sirota
>>>> PPMC - Apache Metron (Incubating)
>>>> jsirota AT apache DOT org
>
>-------------------
>Thank you,
>
>James Sirota
>PPMC - Apache Metron (Incubating)
>jsirota AT apache DOT org
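To make the last bullet of Casey's original proposal concrete, one possible shape for "REST Model Services discovered via ZooKeeper" is sketched below: each service instance registers an ephemeral znode containing its scoring URL, and callers list the registered instances. The znode paths and payload format are invented, and a real implementation might well use a higher-level library such as Curator instead of the raw ZooKeeper client:

import java.nio.charset.StandardCharsets;
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Hypothetical discovery sketch: register a model endpoint in ZooKeeper and
// list the live instances from the caller side.
public class ModelDiscoverySketch {

  public static void main(String[] args) throws Exception {
    ZooKeeper zk = new ZooKeeper("localhost:2181", 5000, event -> { });

    // Service side: advertise this instance (parent path assumed to already exist).
    zk.create("/metron/maas/dga-model/instance-",
        "http://host-1:8765/score".getBytes(StandardCharsets.UTF_8),
        ZooDefs.Ids.OPEN_ACL_UNSAFE,
        CreateMode.EPHEMERAL_SEQUENTIAL);

    // Caller side (e.g. an enrichment bolt): discover the registered endpoints.
    List<String> instances = zk.getChildren("/metron/maas/dga-model", false);
    for (String instance : instances) {
      byte[] url = zk.getData("/metron/maas/dga-model/" + instance, false, null);
      System.out.println("scoring endpoint: " + new String(url, StandardCharsets.UTF_8));
    }

    zk.close();
  }
}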
