I want to start a discussion about the current state of the ODM. Because of
the changes that were in progress when we started truly working on it, and
because of various miscommunications, I think the idea drifted and we ended
up with something that works but isn't fundamentally scalable or workable
for operational security. I'm starting this thread to lay out my best
understanding of my own concerns and to facilitate a conversation.

1. Columnar implementation lacking true columnar architecture:

Most of the early attempts to operationalize Spot ended up leveraging
Impala and Parquet. The ease of table creation and the approachability of
SQL made this appealing, but it injected drift into the ODM. Part of the
reason for creating the ODM was to formalize nouns that represent fields in
a single store, so that "ip" would mean the same thing wherever you saw it.
The use of SQL ultimately led to a slow death of that idea, and we ended up
with fields like "alert_ip".

In my head I can hear some of you asking, "OK, well, as long as the model
is formalized, why does this matter?"

The reason is that searches at scale would require scanning multiple
fields to produce complete answers. For example, a query like "What are
all of the IP addresses that have communicated with le...@apache.org?"
would need to stitch together one or more queries that possibly join
multiple tables and consider multiple fields. The benefit of a truly
columnar architecture is that you can simply request the single field from
the primary operational source and let it loose.
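
To make the difference concrete, here is a minimal sketch in Python. The
table names, the "client_ip" and "contact" fields, and the example.org
addresses are all made up for illustration; only "ip" and "alert_ip" come
from the discussion above. It is not how Spot stores data today, just the
shape of the problem.

```python
# Drifted schema: each table names its IP field differently, so every
# query has to know every variant (hypothetical tables and fields).
DRIFTED = {
    "mail_alerts": [{"alert_ip": "10.0.0.5", "contact": "a@example.org"}],
    "proxy": [
        {"client_ip": "10.0.0.7", "contact": "a@example.org"},
        {"client_ip": "10.0.0.8", "contact": "b@example.org"},
    ],
}
IP_FIELDS = {"mail_alerts": "alert_ip", "proxy": "client_ip"}


def ips_drifted(contact):
    """Stitch the answer together table by table, field by field."""
    found = set()
    for table, rows in DRIFTED.items():
        field = IP_FIELDS[table]  # per-table knowledge the analyst must carry
        for row in rows:
            if row["contact"] == contact:
                found.add(row[field])
    return found


# Unified store: the noun "ip" means the same thing in every row.
UNIFIED = [
    {"ip": "10.0.0.5", "contact": "a@example.org"},
    {"ip": "10.0.0.7", "contact": "a@example.org"},
    {"ip": "10.0.0.8", "contact": "b@example.org"},
]


def ips_unified(contact):
    """Request the single field and let it loose."""
    return {row["ip"] for row in UNIFIED if row["contact"] == contact}
```

Both functions return the same set, but the first one breaks every time a
new table shows up with yet another name for the same noun.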


2. Modeling considers sources but not enrichment and objects:

To me, one of the dream benefits of using Apache big data tools for
security is the ability to constantly crawl data and enrich it with new
data as it lands. In 2015, when I started participating in this project, I
had a hard time articulating why I felt enrichment would be so valuable.
Having spent the last 2 years on various security projects that used Kafka
queues to enrich and update other data, I now have a pretty clear
explanation.

The ability to understand how different sources fit together has always
been a crucial skill for security operators, but in reality this has only
been the case because inline enrichment carried a computational and
storage expense that made it illogical. Today the ever-dropping cost of
storage and the ever-improving performance of tools like Spark make this
skill unnecessary, because we can automate tasks like joining the current
user of a machine into a row as that data becomes available.
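
As a rough sketch of what I mean by joining the current user into a row,
here is the idea in plain Python. The event shapes ("kind", "host", "user",
"dst") and the sample values are assumptions for illustration; in practice
this logic would sit in a streaming job rather than a loop over a list.

```python
def enrich(events):
    """Attach the last known user of a host to each non-login event.

    Events are assumed to arrive in time order. Login events update the
    host -> user mapping; every other event is copied and enriched.
    """
    current_user = {}  # host -> most recently seen user
    enriched = []
    for event in events:
        if event["kind"] == "login":
            current_user[event["host"]] = event["user"]
        else:
            row = dict(event)  # do not mutate the original event
            row["user"] = current_user.get(event["host"])  # None if unknown
            enriched.append(row)
    return enriched


# Hypothetical sample stream.
EVENTS = [
    {"kind": "login", "host": "ws-12", "user": "alice"},
    {"kind": "flow", "host": "ws-12", "dst": "203.0.113.9"},
    {"kind": "login", "host": "ws-12", "user": "bob"},
    {"kind": "flow", "host": "ws-12", "dst": "203.0.113.10"},
    {"kind": "flow", "host": "ws-99", "dst": "203.0.113.9"},
]
ROWS = enrich(EVENTS)
```

The operator never has to remember which login table to join against; the
user is already in the row by the time anyone looks at it.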


3. The ODM was supposed to make setting up your operational store turnkey:

Documenting the ODM has certainly made using Spot easier, but I always
hoped it would make it idiot-proof or, more precisely, me-proof. Currently
the ODM is a guide more than it is a model. Originally we hoped that the
ODM would turn into configuration as code, with nouns defined in JSON,
similar to the way that Solr approaches field definitions.

    {
      "name": "ip",
      "display": {
        "title": "Host Name",
        "min_len": "8"
      },
      "type": "string"
    },
    {
      "name": "src",
      "display": {
        "title": "Source IP",
        "min_len": "8"
      },
      "type": "ip"
    },
    {
      "name": "dst",
      "display": {
        "title": "Destination IP",
        "min_len": "8"
      },
      "type": "ip"
    }

These fields could then be built into sources:

    {
      "device": {
        "manufacturer": "cisco",
        "model": 1354684
      },
      "messages": [
        {
          "title": "alert",
          "information": {
            "nouns": {
              "host": {
                "stored": true,
                "required": "yes",
                "extract": "some regex"
              }
            }
          }
        },
        {
          "title": "inform",
          "information": {
            "nouns": {
              "host": {
                "stored": true,
                "required": "yes",
                "extract": "some regex"
              }
            }
          }
        }
      ]
    }

Building these configurations as part of the repositories would create an
institutional memory around source ingest, as well as a way to clearly
articulate what the various fields are, for use in some forward-looking UI
updates.
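
To show how such definitions could drive ingest, here is a minimal Python
sketch. Everything in it is an assumption for illustration: the loader and
parser functions, the exact JSON layout, the "min_len" value, and the
"host=" regex standing in for "some regex". It only shows that a source
can be checked against the defined nouns before any data is parsed.

```python
import json
import re

# Hypothetical noun and source definitions, following the shape above.
NOUNS_JSON = """
[
  {"name": "host",
   "display": {"title": "Host Name", "min_len": "3"},
   "type": "string"}
]
"""
SOURCE_JSON = """
{
  "device": {"manufacturer": "cisco"},
  "messages": [
    {"title": "alert",
     "information": {"nouns": {"host": {"stored": true,
                                        "required": "yes",
                                        "extract": "host=([a-z0-9.-]+)"}}}}
  ]
}
"""


def load_model(nouns_json, source_json):
    """Compile extraction rules, rejecting nouns the model does not define."""
    nouns = {n["name"]: n for n in json.loads(nouns_json)}
    source = json.loads(source_json)
    rules = {}
    for message in source["messages"]:
        fields = {}
        for noun, spec in message["information"]["nouns"].items():
            if noun not in nouns:
                raise KeyError(f"undefined noun: {noun}")
            fields[noun] = (re.compile(spec["extract"]),
                            spec["required"] == "yes")
        rules[message["title"]] = fields
    return rules


def parse(rules, title, line):
    """Extract every noun configured for this message type from a raw line."""
    row = {}
    for noun, (pattern, required) in rules[title].items():
        match = pattern.search(line)
        if match:
            row[noun] = match.group(1)
        elif required:
            raise ValueError(f"{title}: required noun {noun!r} not found")
    return row


RULES = load_model(NOUNS_JSON, SOURCE_JSON)
```

With this in place, parse(RULES, "alert", line) yields a row keyed by the
formal noun names, and a source that refers to an undefined noun fails at
load time instead of quietly adding another "alert_ip".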

We are going to create an epic and a branch for this, but I wanted to open
up the discussion here.

Thanks,
Austin

PS: It's been great and exciting to see certain people become active in the
project. Keep it up; we still believe in this.
