Simon,

There are several reasons to decouple model execution from Storm:
- Reliability: it's much easier to handle a failed service than a failed bolt. You can also troubleshoot without having to bring down the topology.
- Complexity: you decouple the model logic from the Storm logic and can manage it independently of Storm.
- Portability: you can swap out the model internals (switch from Spark to Flink, etc.), and as long as you maintain the interface you are good to go.
- Consistency: since we want to expose our models the same way we expose threat intel, it makes sense to expose them as a service.

In our vision for Metron we want to make it easy to adopt and share models. I think well-defined interfaces and programmatic ways of handling deployment, lifecycle management, and scoring via well-defined REST interfaces will make this task easier. With respect to PMML, I personally have not had much luck with it in production. I would prefer models as POJOs.

Thanks,
James

04.07.2016, 16:07, "Simon Ball" <[email protected]>:
> Since the models' parameters and execution algorithm are likely to be small,
> why not have the model store push the model changes and scoring directly to
> the bolts and execute within Storm? This negates the overhead of a REST call
> to the model server, and the need for discovery of the model server in
> ZooKeeper.
>
> Something like the way Ranger policies are updated/cached in plugins would
> seem to make sense, so that we're distributing the model execution directly
> into the enrichment pipeline rather than collecting it in a central service.
>
> This would work with simple models on single events, but may struggle with
> correlation-based models. However, those could be handled in Storm by pushing
> into a windowing Trident topology or something of the sort, or even with a
> parallel Spark Streaming job using the same method of distributing models.
>
> The real challenge here would be stateful online models, which seem like a
> minority case that could be handled by a shared state store such as HBase.
>
> You still keep the ability to run different languages and platforms, but
> wrap managing the parallelism in Storm bolts rather than YARN containers.
>
> We could also consider basing the model protocol on a common model language
> like PMML, though that is likely to be highly limiting.
>
> Simon
>
>> On 4 Jul 2016, at 22:35, Casey Stella <[email protected]> wrote:
>>
>> This is great! I'll capture any requirements that anyone wants to
>> contribute and ensure that the proposed architecture accommodates them. I
>> think we should focus on a minimal set of requirements and an architecture
>> that does not preclude a larger set. I have found that the best driver of
>> requirements is installed users. :)
>>
>> For instance, I think a lot of questions, such as how often to update a
>> model, should be addressed in the architecture by the ability to update a
>> model manually; as long as we have the ability to update, people can
>> choose when and where to do it (i.e. time-based or some other trigger).
>> That being said, we don't want to cause too much effort for the user if we
>> can avoid it with features.
>>
>> In terms of the questions laid out, here are the constraints from the
>> proposed architecture as I see them. It'd be great to get a sense of
>> whether these constraints are too onerous or where they're not opinionated
>> enough:
>>
>> - Model versioning and retention
>>   - We do have the ability to update models, but the training and the
>>     decision of when to update a model are left up to the user. We may
>>     want to think deeply about when and where automated model updates
>>     can fit.
>>   - Also, retention is currently manual. It might be an easy win to set
>>     up policies around when to sunset models (after newer versions are
>>     added, for instance).
>> - Model access controls management
>>   - The architecture proposes no constraints around this.
>>     As it stands now, models are held in HDFS, so they would inherit the
>>     same security capabilities from that (user/group permissions plus
>>     Ranger, etc.).
>> - Requirements around concept drift
>>   - I'd love to hear user requirements around how we could automatically
>>     address concept drift. The architecture as proposed lets the user
>>     decide when to update models.
>> - Requirements around model output
>>   - The architecture as it stands just mandates a JSON map as input and a
>>     JSON map as output, so it's up to the model what it wants to pass back.
>>   - It's also up to the model to document its own output.
>> - Any model audit and logging requirements
>>   - The architecture proposes no constraints around this. I'd love to see
>>     community guidance here. As it stands, we just log using the same
>>     mechanism as any YARN application.
>> - What model metrics need to be exposed
>>   - The architecture proposes no constraints around this. I'd love to see
>>     community guidance here.
>> - Requirements around failure modes
>>   - We briefly touch on this in the document, but it is probably not
>>     complete. Service endpoint failure will result in blacklisting from
>>     the Storm bolt's perspective, and node failure should result in a new
>>     container being started by the YARN application master. Beyond that,
>>     the architecture isn't explicit.
>>
>>> On Mon, Jul 4, 2016 at 1:49 PM, James Sirota <[email protected]> wrote:
>>>
>>> I left a comment on the JIRA. I think your design is promising. One
>>> other thing I would suggest is for us to crowd-source requirements around
>>> model management.
>>> Specifically:
>>>
>>> Model versioning and retention
>>> Model access controls management
>>> Requirements around concept drift
>>> Requirements around model output
>>> Any model audit and logging requirements
>>> What model metrics need to be exposed
>>> Requirements around failure modes
>>>
>>> 03.07.2016, 14:00, "Casey Stella" <[email protected]>:
>>>> Hi all,
>>>>
>>>> I think we are at the point where we should try to tackle model as a
>>>> service for Metron. As such, I created a JIRA and proposed an
>>>> architecture for accomplishing this within Metron.
>>>>
>>>> My inclination is to be data science language/library agnostic and to
>>>> provide a general-purpose REST infrastructure for managing and serving
>>>> models trained on historical data captured from Metron. The assumption
>>>> is that we are within the Hadoop ecosystem, so:
>>>>
>>>> - Models stored on HDFS
>>>> - REST model services resource-managed via YARN
>>>> - REST model services discovered via ZooKeeper
>>>>
>>>> I would really appreciate community comment on the JIRA (
>>>> https://issues.apache.org/jira/browse/METRON-265). The proposed
>>>> architecture is attached as a document to that JIRA.
>>>>
>>>> I look forward to feedback!
>>>>
>>>> Best,
>>>>
>>>> Casey
>>>
>>> -------------------
>>> Thank you,
>>>
>>> James Sirota
>>> PPMC - Apache Metron (Incubating)
>>> jsirota AT apache DOT org

-------------------
Thank you,

James Sirota
PPMC - Apache Metron (Incubating)
jsirota AT apache DOT org
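For concreteness, the "JSON map in, JSON map out" contract Casey describes, combined with James's preference for models as POJOs, could look roughly like the sketch below. All names here (ScoringModel, ThresholdModel, the is_alert field) are hypothetical illustrations for the discussion, not actual Metron APIs:

```java
import java.util.HashMap;
import java.util.Map;

// A minimal sketch of the proposed scoring contract: the caller hands the
// model a JSON-style map and gets a JSON-style map back, so any model that
// implements this POJO interface can be served behind the REST endpoint.
interface ScoringModel {
    Map<String, Object> score(Map<String, Object> message);
}

// Example model: flags any message whose numeric "length" field exceeds a
// fixed threshold. The model alone decides what keys it emits, matching the
// "output is up to the model" constraint in the thread.
class ThresholdModel implements ScoringModel {
    private final double threshold;

    ThresholdModel(double threshold) {
        this.threshold = threshold;
    }

    @Override
    public Map<String, Object> score(Map<String, Object> message) {
        double value = ((Number) message.getOrDefault("length", 0)).doubleValue();
        Map<String, Object> result = new HashMap<>();
        result.put("is_alert", value > threshold);
        return result;
    }
}

public class ScoringSketch {
    public static void main(String[] args) {
        ScoringModel model = new ThresholdModel(1024.0);
        Map<String, Object> message = new HashMap<>();
        message.put("length", 4096);
        System.out.println(model.score(message)); // {is_alert=true}
    }
}
```

Because the interface only passes maps, swapping the model internals (Spark, Flink, a plain POJO) leaves callers untouched, which is exactly the portability argument made at the top of the thread.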
