I wanted to bring up some stuff on the backend of our UI and get thoughts (plus things I overlooked, etc.). There are also a couple of points at the end that merit discussion, since they get into how we handle our ES templates (we generally want to aggregate on raw fields, not analyzed ones).
To set the use case a bit: when we're looking through alerts in the UI, we're going to want to be able to start pivoting and grouping. For example, given a list of alerts, we may want to follow an ordering of groupings like so:

All Alerts
--> Bucketed by User
----> Then further by Destination IP
------> Then further by Severity

The stuff I expect we'll want to be able to do:
* Pivot through multiple layers (as in the example above).
* Get counts within each bucket (Do we have a lot of high severity alerts? Mostly medium? etc.)
* Get a subset of fields (I assume we don't want every entire doc that comes back in the bucket).
* Pagination (if I have > X docs, show me X and let me retrieve more as needed).
* Sorting within a bucket (I may want to sort by time, by userid, etc.)
* Filtering (be able to do all of the above while only showing high severity alerts).

In terms of actually implementing this, to the best of my limited knowledge (and from playing around with ES while looking into this), this seems pretty doable out of the box. See: https://www.elastic.co/guide/en/elasticsearch/reference/2.4/search-aggregations-bucket-terms-aggregation.html

There are two main pain points I see in this:
* Actually constructing these queries. I don't know that we've explicitly said we want a layer of abstraction between the UI and the real-time store, but I strongly suggest we have one. Theoretically, we should be able to support (at least) Solr and ES in the UI, not just one. Unfortunately, since they don't share a query syntax, this means two implementations, and I'd personally like to see an abstraction that delegates appropriately.
* Aggregations in ES operate post-analysis. This means we'll typically want the raw field value to be available to aggregate on. In the ES implementation, this means a "not_analyzed" field.
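
To make the first pain point concrete, here is a rough sketch of what such an abstraction might look like, with an ES-backed implementation that builds nested terms aggregations. Everything named here is hypothetical and only for illustration: the class names, the `build` signature, and the field names (`username`, `ip_dst_addr`, `severity`, `timestamp`). It assumes the ES 2.x query DSL and is not a proposal for the actual API.

```python
import json
from abc import ABC, abstractmethod


class GroupQueryBuilder(ABC):
    """Backend-neutral entry point the UI would call (hypothetical interface)."""

    @abstractmethod
    def build(self, group_fields, source_fields, filters=None,
              sort_field="timestamp", page_from=0, page_size=10):
        """Return a backend-specific query for the requested grouping."""


class ElasticsearchGroupQueryBuilder(GroupQueryBuilder):
    """Builds an ES 2.x search body using nested terms aggregations."""

    def build(self, group_fields, source_fields, filters=None,
              sort_field="timestamp", page_from=0, page_size=10):
        # Innermost level: top_hits returns a sorted, paginated slice of the
        # docs in each bucket, restricted to a subset of fields.
        aggs = {
            "alerts": {
                "top_hits": {
                    "from": page_from,
                    "size": page_size,
                    "sort": [{sort_field: {"order": "desc"}}],
                    "_source": {"include": list(source_fields)},
                }
            }
        }
        # Wrap from the last grouping field outward so that the first
        # grouping field ends up as the outermost bucket.
        for field in reversed(group_fields):
            aggs = {"by_" + field: {"terms": {"field": field}, "aggs": aggs}}

        # size=0 suppresses top-level hits; we only want the buckets.
        body = {"size": 0, "aggs": aggs}
        if filters:
            # Exact-match filters, e.g. only high severity alerts.
            body["query"] = {
                "bool": {
                    "filter": [{"term": {name: value}}
                               for name, value in filters.items()]
                }
            }
        return body


# Example: All Alerts --> User --> Destination IP, high severity only.
builder = ElasticsearchGroupQueryBuilder()
example_body = builder.build(
    group_fields=["username", "ip_dst_addr"],
    source_fields=["timestamp", "username", "ip_dst_addr"],
    filters={"severity": "high"},
)
print(json.dumps(example_body, indent=2))
```

Each terms bucket comes back with a doc_count, which covers the per-bucket counts. A Solr implementation would be a second subclass of the same interface, so the UI never sees backend-specific syntax.
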
Glancing (incredibly) briefly through our templates, we do have some string values that are analyzed (I have no idea whether they're generally relevant to this UI; I just didn't look). I'm also assuming Stellar enrichments are analyzed right now, and I'm unsure what happens to metadata (https://github.com/apache/metron/pull/621). Essentially the question is: "How do we handle this, particularly since we're a pretty dynamic system?"
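
As one possible starting point for that discussion (a sketch, not a recommendation), an ES 2.x index template can use a dynamic template so that any string field it hasn't seen before, such as a new enrichment, is mapped as "not_analyzed" by default. The function name and the index pattern below are made up for illustration, and this assumes the ES 2.x mapping syntax (`"type": "string"` with `"index": "not_analyzed"`).

```python
import json


def raw_string_template(index_pattern):
    """Build an ES 2.x index template body that maps dynamically added
    string fields as not_analyzed, so they can be aggregated on as-is."""
    return {
        # Index name pattern the template applies to (hypothetical value
        # supplied by the caller).
        "template": index_pattern,
        "mappings": {
            # _default_ applies to every document type in matching indices.
            "_default_": {
                "dynamic_templates": [
                    {
                        "strings_not_analyzed": {
                            # Only fields ES detects as strings.
                            "match_mapping_type": "string",
                            "mapping": {
                                "type": "string",
                                "index": "not_analyzed",
                            },
                        }
                    }
                ]
            }
        },
    }


# Hypothetical index pattern, for illustration only.
example_template = raw_string_template("alerts_index_*")
print(json.dumps(example_template, indent=2))
```

The tradeoff is that fields mapped this way lose full-text search. A multi-field mapping (analyzed main field plus a not_analyzed sub-field) would keep both, at the cost of extra index size.
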
