Hi, can we use Slack and start the discussion there?
Thoughts?
Regards.

On Tue, Feb 18, 2020 at 5:37, Tadd Wood (<[email protected]>)
wrote:

> I always felt the ODM was rushed in its early design, and to your earlier
> point, it's not surprising that the ODM's flat structure was driven
> strongly by the desire to have Impala or other SQL interpreters as a
> front end to the data. Although I can appreciate the approachability of
> SQL and understand the desire for it to drive the data model design, many
> real-world use cases required complex join logic or nested subqueries
> that didn't scale very well.
>
> Restructuring the design around nouns would make it much easier to pivot
> when trying to express complex and evolving queries. It would be crucial to
> also allow these nouns to be created with more complex/nested structures,
> providing space for further enrichment of events during ingestion or later
> on.
>
> I'm excited to be moving the ODM conversation forward, and hope others
> chime in as well as we start to work through the planning process.
>
> Thank you,
> Tadd Wood
>
> On Fri, Feb 14, 2020 at 5:35 PM Austin Leahy <[email protected]> wrote:
>
> > I want to start a discussion about the current state of the ODM. Because
> > of various changes that were in progress when we started truly working on
> > it, and various miscommunications, the idea drifted, and we ended up with
> > something that works but isn't fundamentally scalable or workable for
> > operational security. I'm starting this thread to lay out my best
> > understanding of my own concerns and to facilitate a conversation.
> >
> > 1. Columnar implementation lacking true columnar architecture:
> >
> > Most of the early attempts to operationalize Spot ended up leveraging
> > Impala and Parquet. Because of the ease of table creation and the
> > approachability of SQL this seemed appealing, but it injected drift into
> > the ODM. Part of the motivation for the ODM was to formalize nouns that
> > represent fields in a single store, so that "ip" would mean the same thing
> > wherever you saw it. The use of SQL ultimately led to a slow death of that
> > idea, and we ended up with fields like "alert_ip".
> >
> > In my head I can hear some of you asking "ok well as long as the model is
> > formalized why does this matter?"
> >
> > The reason is that searches at scale would require scanning multiple
> > fields to produce complete answers. For example, answering "What are all
> > of the IP addresses that have communicated with [email protected]?" would
> > require stitching together one or more queries that may join multiple
> > tables and must consider multiple fields. The benefit of a truly columnar
> > architecture is that you can simply request the single field from the
> > primary operational source and let it loose.
> >
> >
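To make the contrast concrete, here is a minimal in-memory sketch in Python (not Impala and not a real columnar store). The records, the user "alice", the `src_ip`/`dst_ip` column names, and the function names are hypothetical; only `alert_ip` comes from the example above. In the flat model, whoever writes the query must enumerate every IP-bearing column. In the noun model, a single `ip` field answers the question.

```python
from typing import Iterable

# Flat, SQL-style rows (hypothetical): one concept ("an IP") spread over columns.
FLAT_ROWS = [
    {"user": "alice", "src_ip": "10.0.0.5", "dst_ip": "192.168.1.9"},
    {"user": "bob", "src_ip": "10.0.0.7", "dst_ip": "10.0.0.1"},
    {"user": "alice", "alert_ip": "172.16.0.3"},
]
# Every query author must know, and keep updating, this list of columns.
FLAT_IP_FIELDS = ("src_ip", "dst_ip", "alert_ip")

# Noun-based rows (hypothetical): every IP value is filed under one "ip" noun.
NOUN_ROWS = [
    {"user": "alice", "ip": ["10.0.0.5", "192.168.1.9"]},
    {"user": "bob", "ip": ["10.0.0.7", "10.0.0.1"]},
    {"user": "alice", "ip": ["172.16.0.3"]},
]


def ips_flat(rows: Iterable[dict], user: str) -> set[str]:
    """Flat model: check every known IP-like column of each matching row."""
    return {
        row[f]
        for row in rows
        if row.get("user") == user
        for f in FLAT_IP_FIELDS
        if f in row
    }


def ips_noun(rows: Iterable[dict], user: str) -> set[str]:
    """Noun model: read a single field; new IP sources need no query changes."""
    return {ip for row in rows if row.get("user") == user for ip in row.get("ip", [])}


print(sorted(ips_noun(NOUN_ROWS, "alice")))  # ['10.0.0.5', '172.16.0.3', '192.168.1.9']
```

Both functions return the same three addresses for "alice", but only `ips_flat` has to be edited whenever a new source introduces another IP column.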
> > 2. Modeling considers sources but not enrichment and objects:
> >
> > To me, one of the dream benefits of using Apache big data tools for
> > security is the ability to constantly crawl data and enrich it with new
> > data as it lands. When I started participating in this project in 2015, I
> > had a hard time articulating why I felt enrichment would be so valuable,
> > but having worked on various security projects over the last 2 years that
> > used Kafka queues to enrich and update other data, I have a pretty clear
> > explanation.
> >
> > The ability to understand how different sources fit together has always
> > been a crucial skill for security operators, but that has only been true
> > because inline enrichment had a computational and storage expense that
> > made it impractical. Today the ever-dropping cost of storage and the
> > ever-improving performance of tools like Spark make this skill
> > unnecessary, because we can automate tasks like joining the current user
> > of a machine into a row as that data becomes available.
> >
> >
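A rough sketch of this kind of enrichment, assuming an in-memory lookup in place of Kafka or Spark: login events update a host-to-user map, and other events are stamped with the current user on arrival, or back-filled once the login data lands. The `Enricher` class, its methods, and the field names are illustrative, not part of Spot.

```python
from dataclasses import dataclass, field


@dataclass
class Enricher:
    """Hypothetical in-memory enricher: tracks the current user of each host
    and stamps it onto events, back-filling events that arrived first."""

    current_user: dict[str, str] = field(default_factory=dict)
    pending: dict[str, list[dict]] = field(default_factory=dict)

    def on_login(self, host: str, user: str) -> list[dict]:
        """Record a login; enrich and return any events waiting on this host."""
        self.current_user[host] = user
        backfilled = self.pending.pop(host, [])
        for event in backfilled:
            event["user"] = user
        return backfilled

    def on_event(self, event: dict) -> dict:
        """Enrich an event inline if its host's user is known, else park it."""
        user = self.current_user.get(event["host"])
        if user is not None:
            event["user"] = user
        else:
            self.pending.setdefault(event["host"], []).append(event)
        return event


enricher = Enricher()
early = enricher.on_event({"host": "ws-42", "dst": "10.0.0.9"})  # user unknown yet
enricher.on_login("ws-42", "alice")  # back-fills `early` in place
late = enricher.on_event({"host": "ws-42", "dst": "10.0.0.10"})  # enriched inline
print(early["user"], late["user"])  # alice alice
```

A production version would need time-bounded sessions (logouts, shared hosts) rather than a single "current user" per host; the sketch only shows the join-on-arrival idea.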
> > 3. The ODM was supposed to make setting up your operational store turnkey:
> >
> > Documenting the ODM has certainly made using Spot easier, but I always
> > hoped it would make it idiot-proof, or more precisely, me-proof. Currently
> > the ODM is more of a guide than a model. Originally we hoped the ODM would
> > turn into code as configuration: nouns defined in JSON, similar to the way
> > Solr approaches field definitions.
> >
> >   {
> >     "name": "ip",
> >     "display": {
> >       "title": "Host Name",
> >       "min_len": "8"
> >     },
> >     "type": "string"
> >   }
> >
> >   {
> >     "name": "src",
> >     "display": {
> >       "title": "Source IP",
> >       "min_len": "8"
> >     },
> >     "type": "ip"
> >   }
> >
> >   {
> >     "name": "dst",
> >     "display": {
> >       "title": "Destination IP",
> >       "min_len": "8"
> >     },
> >     "type": "ip"
> >   }
> >
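As an illustration of how noun definitions like these could drive code rather than documentation, the Python sketch below loads them and validates values by their declared `type`. Collecting the definitions into one JSON array is an assumption, as are the type-to-checker mapping and the functions `load_nouns` and `validate`. `min_len` is loaded but not enforced, because its intended meaning isn't specified.

```python
import ipaddress
import json

# Noun definitions in the shape sketched above, gathered into an array (assumed).
NOUNS_JSON = """
[
  {"name": "ip",  "display": {"title": "Host Name", "min_len": "8"}, "type": "string"},
  {"name": "src", "display": {"title": "Source IP", "min_len": "8"}, "type": "ip"},
  {"name": "dst", "display": {"title": "Destination IP", "min_len": "8"}, "type": "ip"}
]
"""


def _is_ip(value: str) -> bool:
    """True if `value` parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


# Hypothetical mapping from each "type" string to a value checker.
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "ip": lambda v: isinstance(v, str) and _is_ip(v),
}


def load_nouns(text: str) -> dict[str, dict]:
    """Index noun definitions by name, rejecting duplicates and unknown types."""
    nouns: dict[str, dict] = {}
    for noun in json.loads(text):
        if noun["name"] in nouns:
            raise ValueError(f"duplicate noun: {noun['name']}")
        if noun["type"] not in TYPE_CHECKS:
            raise ValueError(f"unknown type: {noun['type']}")
        nouns[noun["name"]] = noun
    return nouns


def validate(nouns: dict[str, dict], name: str, value) -> bool:
    """True if `value` is acceptable for the noun called `name`."""
    return TYPE_CHECKS[nouns[name]["type"]](value)


NOUNS = load_nouns(NOUNS_JSON)
print(validate(NOUNS, "src", "10.0.0.5"), validate(NOUNS, "dst", "not-an-ip"))  # True False
```

Keeping definitions like these in the repository means a typo in a type name fails at load time instead of surfacing later as silently inconsistent data.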
> > These fields could then be built into sources:
> >
> >   "device": {
> >     "manufacturer": "cisco",
> >     "model": 1354684,
> >     "messages": [
> >       {
> >         "title": "alert",
> >         "information": {
> >           "nouns": {
> >             "host": {
> >               "stored": true,
> >               "required": "yes",
> >               "extract": "some regex"
> >             }
> >           }
> >         }
> >       },
> >       {
> >         "title": "inform",
> >         "information": {
> >           "nouns": {
> >             "host": {
> >               "stored": true,
> >               "required": "yes",
> >               "extract": "some regex"
> >             }
> >           }
> >         }
> >       }
> >     ]
> >   }
> >
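Similarly, a source definition could drive parsing directly. In this sketch the source config is held as a Python dict that mirrors the structure above. The two regexes are hypothetical stand-ins for the "some regex" placeholders (not real Cisco log formats), and `parse` is an illustrative function name. A noun marked `"required": "yes"` that fails to match raises an error, and only nouns marked `stored` are kept.

```python
import re

# Hypothetical source definition; the regexes are illustrative placeholders.
SOURCE = {
    "device": {
        "manufacturer": "cisco",
        "model": 1354684,
        "messages": [
            {"title": "alert",
             "information": {"nouns": {"host": {
                 "stored": True, "required": "yes", "extract": r"host=(\S+)"}}}},
            {"title": "inform",
             "information": {"nouns": {"host": {
                 "stored": True, "required": "yes", "extract": r"from (\S+)"}}}},
        ],
    }
}


def parse(source: dict, title: str, line: str) -> dict[str, str]:
    """Extract the nouns defined for message type `title` from one raw log line."""
    for message in source["device"]["messages"]:
        if message["title"] != title:
            continue
        result: dict[str, str] = {}
        for noun, spec in message["information"]["nouns"].items():
            match = re.search(spec["extract"], line)
            if match:
                if spec.get("stored", True):
                    result[noun] = match.group(1)  # first capture group holds the value
            elif spec.get("required") == "yes":
                raise ValueError(f"required noun {noun!r} missing from {title!r} message")
        return result
    raise KeyError(f"no message type {title!r} defined for this source")


print(parse(SOURCE, "alert", "ALERT host=fw-01 level=3"))  # {'host': 'fw-01'}
```

Because the same noun name (`host`) is reused across message types, everything this source emits lands in one field, which is the single-field query pattern described earlier in the thread.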
> > Building these configurations as part of the repositories would create
> > institutional memory around source ingest, and would also make it possible
> > to clearly articulate what the various fields are for, in support of some
> > forward-looking UI updates.
> >
> > We are going to create an epic and a branch for this, but I wanted to open
> > up the discussion here.
> >
> > Thanks,
> > Austin
> >
> > PS: It's been great and exciting to see certain people become active in
> > the project. Keep it up; we still believe in this.
> >
>
