Hi,

Can we use Slack and start the discussion there? Thoughts?

Regards.

On Tue, Feb 18, 2020 at 5:37, Tadd Wood (<[email protected]>) wrote:
> I always felt the ODM was rushed in its early design, and to your earlier
> point, it's not surprising that the ODM's flat structure was driven
> strongly by the desire to have Impala or other SQL interpreters as a
> front-end to the data. Although I can appreciate the approachability of SQL
> and understand the desire for that to drive the data model design, many
> use-cases that were being executed in reality required complex join-logic
> or nested subqueries that didn't scale very well.
>
> Restructuring the design around nouns would make it much easier to pivot
> when trying to express complex and evolving queries. It would be crucial to
> also allow these nouns to be created with more complex/nested structures,
> providing space for further enrichment of events during ingestion or later
> on.
>
> I'm excited to be moving the ODM conversation forward, and hope others
> chime in as well as we start to work through the planning process.
>
> Thank you,
> Tadd Wood
>
> On Fri, Feb 14, 2020 at 5:35 PM Austin Leahy <[email protected]> wrote:
>
> > I want to start a discussion about the current state of the ODM. I think
> > that, because of different changes that were in progress at the time we
> > started truly working on it and different miscommunications, the idea
> > kind of drifted off and we ended up with something that works but isn't
> > something fundamentally scalable and workable for operational security.
> > I'm starting this thread to put forward my best understanding of my own
> > concerns about this to facilitate a conversation.
> >
> > 1. Columnar implementation lacking true columnar architecture:
> >
> > Most of the attempts to operationalize Spot in the early days ended up
> > leveraging Impala and Parquet. Because of the ease of table creation and
> > SQL approachability this seemed appealing but it injected drift into the
> > ODM.
> > Part of the desire to create the ODM was a desire to formalize nouns
> > to represent fields in a single store so that "ip" would mean the same
> > thing wherever you saw it. Because of the use of SQL this ultimately led
> > to a slow death of that idea and we ended up with fields like "alert_ip".
> >
> > In my head I can hear some of you asking "ok well as long as the model is
> > formalized why does this matter?"
> >
> > The reason is that searches at scale would require scanning multiple
> > fields to produce complete answers. For example, a desire to query "What
> > are all of the IP addresses that have communicated with [email protected]?"
> > would need to stitch together one or more queries that possibly join
> > multiple tables and need to consider multiple fields. The benefit of a
> > truly columnar architecture is to simply request the single field from
> > the primary operational source and let it loose.
> >
> >
> > 2. Modeling considers sources but not enrichment and objects:
> >
> > To me one of the dream benefits of using Apache big data tools to do
> > security is the ability to constantly crawl data and enrich it with new
> > data that lands. In 2015 when I started participating in this project I
> > had a hard time articulating why I felt that enrichment would be so
> > valuable, but having participated in various security projects that used
> > Kafka queues to enrich and update other data in the last 2 years I have a
> > pretty clear explanation.
> >
> > The ability to understand how different sources fit together has always
> > been a crucial skill for security operators but the reality is that this
> > has only been the case because inline enrichment had a computational and
> > storage expense that made it illogical.
> > Today the ever-dropping cost of storage and the ever-improving
> > performance of tools like Spark make this skill unnecessary because we
> > can automate tasks like joining the current user of a machine into a row
> > as that data becomes available.
> >
> >
> > 3. The ODM was supposed to make setting up your operational store
> > turnkey:
> >
> > Documenting the ODM has certainly made using Spot easier but I always
> > hoped it would make it idiot-proof, or more precisely me-proof. Currently
> > the ODM is a guide more than it is a model. Originally we hoped that the
> > ODM would turn into code as configuration: nouns defined in JSON, similar
> > to the way that Solr approaches field definitions.
> >
> > "name":"ip",
> > "display":
> > "title": "Host Name",
> > "min_len": "8",
> > "type":"string"
> >
> > "name":"src",
> > "display":
> > "title": "Source IP",
> > "min_len": "8",
> > "type":"ip"
> >
> > "name":"dst",
> > "display":
> > "title": "Destination IP",
> > "min_len": "8",
> > "type":"ip"
> >
> > These fields could then be built into sources
> >
> > "device":
> > "manufacturer": "cisco"
> > "model": 1354684
> > "messages":
> > "title": "alert"
> > "information":
> > "nouns":
> > "host":
> > "stored":true,
> > "required":"yes"
> > "extract":"some regex "
> > "title": "inform"
> > "information":
> > "nouns":
> > "host":
> > "stored":true,
> > "required":"yes"
> > "extract":"some regex"
> >
> > Building these configurations as part of the repositories would
> > facilitate an institutional memory around source ingest as well as an
> > ability to clearly articulate what various fields are for some
> > forward-looking UI updates.
> >
> > We are going to create an epic and branch for this but I wanted to open up
> > discussions here.
> >
> > Thanks Austin
> >
> > PS it's been great and exciting to see certain people become active in
> > the project. Keep it up, we still believe in this.
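
To make the "code as configuration" idea in point 3 concrete, here is a minimal sketch of how the quoted noun and source definitions might look as valid JSON, plus a check that every noun a source uses is actually defined. The nesting (for example `title` and `min_len` under `display`, and `messages` as a list), the constant names, and the helper `undefined_nouns` are all assumptions made for illustration; none of this is an existing ODM or Spot format.

```python
import json

# Hypothetical JSON rendering of the noun definitions quoted in the thread.
# Assumption: "title" and "min_len" nest under "display"; "type" is top-level.
NOUNS_JSON = """
[
  {"name": "ip",  "display": {"title": "Host Name", "min_len": "8"}, "type": "string"},
  {"name": "src", "display": {"title": "Source IP", "min_len": "8"}, "type": "ip"},
  {"name": "dst", "display": {"title": "Destination IP", "min_len": "8"}, "type": "ip"}
]
"""

# Hypothetical JSON rendering of the source definition quoted in the thread.
# Assumption: "messages" is a list with one entry per message title.
SOURCE_JSON = """
{
  "device": {"manufacturer": "cisco", "model": 1354684},
  "messages": [
    {"title": "alert",
     "information": {"nouns": {"host": {"stored": true, "required": "yes",
                                        "extract": "some regex"}}}},
    {"title": "inform",
     "information": {"nouns": {"host": {"stored": true, "required": "yes",
                                        "extract": "some regex"}}}}
  ]
}
"""


def undefined_nouns(nouns, source):
    """Return the noun names a source references that have no definition."""
    defined = {noun["name"] for noun in nouns}
    used = {
        name
        for message in source["messages"]
        for name in message["information"]["nouns"]
    }
    return used - defined


nouns = json.loads(NOUNS_JSON)
source = json.loads(SOURCE_JSON)
print(sorted(undefined_nouns(nouns, source)))  # prints ['host']
```

With the definitions exactly as quoted, the check reports `host`: the sources reference a `host` noun, while the defined nouns are `ip`, `src`, and `dst`. Whether `host` was meant to be a separate noun is not clear from the thread, but this is the kind of drift a machine-checked model could catch at ingest time.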
