UI pivoting / aggregation backend

Justin Leet Thu, 06 Jul 2017 06:12:19 -0700

I wanted to bring up some things about the backend of our UI and get
thoughts (+ things I overlooked, etc.). There are also a couple of points at
the end that merit discussion, since they get into how we handle our ES
templates (we generally want to aggregate on raw fields, not analyzed
ones).

To set the use case a bit: when we're looking through alerts in the UI,
we're going to want to be able to start pivoting and grouping them.

For example, given a list of alerts, we may want to follow an ordering of
groupings like so:

All Alerts
--> Bucketed by User
----> Then further by Destination IP
------> Then further by Severity
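In ES terms, each level of that pivot maps onto a terms aggregation nested inside the previous one. A rough sketch of building that request body, assuming the field names "user", "ip_dst_addr" and "severity" (placeholders, not checked against our templates) and a hypothetical helper name build_pivot_aggs:

```python
def build_pivot_aggs(fields, size=10):
    """Nest one ES terms aggregation per field, outermost pivot first.

    Each aggregation is named after the field it buckets on; that naming is
    an assumption for readability, any unique name would work.
    """
    aggs = {}
    # Build from the innermost pivot outward so each level wraps the next.
    for field in reversed(fields):
        agg = {"terms": {"field": field, "size": size}}
        if aggs:
            agg["aggs"] = aggs
        aggs = {field: agg}
    return aggs


# All Alerts -> by user -> by destination IP -> by severity
# (field names are placeholders, not taken from our templates).
query = {
    "size": 0,  # skip top-level hits; we only want the bucket tree
    "aggs": build_pivot_aggs(["user", "ip_dst_addr", "severity"]),
}
```

Adding or reordering a pivot is then just a change to the list of fields.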

The stuff I expect we'll want to be able to do:
* Pivot through multiple layers (as in the example above).
* Get counts within each bucket (Do we have a lot of high severity alerts?
Mostly medium? etc.)
* Get a subset of fields (I assume we don't want the entire doc for every
hit that comes back in the bucket)
* Pagination (if I have > X docs, show me X and let me retrieve more as
needed)
* Sorting within a bucket (I may want to sort by time, by userid, etc.)
* Filtering (Be able to do this stuff while only showing high severity
alerts)
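Most of that list fits into a single ES request: a bool filter for the filtering, the doc_count that every terms bucket already reports for the counts, and a top_hits sub-aggregation for field subsets, sorting and paging within a bucket. A sketch for one grouping level, where the helper name build_bucket_query and the "timestamp" sort field are assumptions:

```python
def build_bucket_query(group_field, filters, source_fields,
                       page=0, page_size=20, sort_field="timestamp", order="desc"):
    """Group hits by one field and return a sorted, paged slice of each bucket.

    filters maps a field name to the list of values it must match.
    (Hypothetical helper; field names are placeholders.)
    """
    return {
        "size": 0,  # only the aggregation results are needed
        "query": {
            "bool": {
                # Filtering: non-scoring filter clauses (e.g. only high severity).
                "filter": [{"terms": {f: vals}} for f, vals in sorted(filters.items())]
            }
        },
        "aggs": {
            group_field: {
                # Counts: every terms bucket reports its own doc_count.
                "terms": {"field": group_field},
                "aggs": {
                    "docs": {
                        "top_hits": {
                            # Pagination: skip earlier pages within the bucket.
                            "from": page * page_size,
                            "size": page_size,
                            # Sorting within the bucket.
                            "sort": [{sort_field: {"order": order}}],
                            # Subset of fields instead of the whole document.
                            "_source": source_fields,
                        }
                    }
                },
            }
        },
    }
```

Note that this pages the documents inside each bucket; it doesn't page the list of buckets itself.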

In terms of actually implementing this, to the best of my limited knowledge
(and from playing around with ES while looking into this), this seems like
pretty doable stuff out of the box. See:
https://www.elastic.co/guide/en/elasticsearch/reference/2.4/search-aggregations-bucket-terms-aggregation.html
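On the response side, nested terms aggregations come back as buckets that each carry a "key", a "doc_count", and their sub-aggregation under the same name it was requested with. A small sketch that walks that tree into (path, count) pairs the UI could render, assuming the aggregations are named after their fields (the function name bucket_counts is hypothetical):

```python
def bucket_counts(aggs, fields, path=()):
    """Yield (key_path, doc_count) for every bucket at every pivot level.

    aggs is the "aggregations" object of a search response whose
    aggregations are named after their fields, as in a nested terms request.
    """
    if not fields:
        return
    field, rest = fields[0], fields[1:]
    for bucket in aggs[field]["buckets"]:
        key_path = path + (bucket["key"],)
        yield key_path, bucket["doc_count"]
        # Sub-aggregations appear inside each bucket under their own name.
        yield from bucket_counts(bucket, rest, key_path)
```

Each parent is emitted before its children, which maps naturally onto an expandable tree in the UI.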

There are two main pain points I see in this:
* Actually constructing these queries. I don't know that we've explicitly
said we want a layer of abstraction between the UI and the real-time store,
but I strongly suggest we have one. Theoretically, we should be able to
support (at least) Solr and ES in the UI, not just one. Unfortunately,
since they don't share a query syntax, this means we have two impls, and I'd
personally like to see an abstraction that delegates to the appropriate one.
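One possible shape for that abstraction: the UI builds a store-agnostic request and a per-store translator turns it into native syntax (nested terms aggregations for ES, the facet.pivot parameter for Solr). All class and function names here (GroupRequest, GroupQueryTranslator, build_group_query) are hypothetical, and the Solr filter string does no escaping:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GroupRequest:
    """Hypothetical store-agnostic description of a pivot request."""
    group_by: List[str]
    filters: Dict[str, List[str]] = field(default_factory=dict)


class GroupQueryTranslator(ABC):
    @abstractmethod
    def translate(self, request: GroupRequest) -> dict:
        """Turn a GroupRequest into the store's native query."""


class ElasticsearchTranslator(GroupQueryTranslator):
    def translate(self, request):
        # Nested terms aggregations, innermost pivot built first.
        aggs = {}
        for name in reversed(request.group_by):
            agg = {"terms": {"field": name}}
            if aggs:
                agg["aggs"] = aggs
            aggs = {name: agg}
        clauses = [{"terms": {k: v}} for k, v in sorted(request.filters.items())]
        return {"size": 0, "query": {"bool": {"filter": clauses}}, "aggs": aggs}


class SolrTranslator(GroupQueryTranslator):
    def translate(self, request):
        # Solr pivot faceting takes a comma-separated list of fields.
        params = {"q": "*:*", "rows": 0, "facet": "true",
                  "facet.pivot": ",".join(request.group_by)}
        # One fq per filtered field; values are NOT escaped in this sketch.
        params["fq"] = [f"{k}:({' OR '.join(v)})" for k, v in sorted(request.filters.items())]
        return params


TRANSLATORS = {"elasticsearch": ElasticsearchTranslator(), "solr": SolrTranslator()}


def build_group_query(store: str, request: GroupRequest) -> dict:
    """Delegate to whichever backing store is configured."""
    return TRANSLATORS[store].translate(request)
```

Only the translators know about ES or Solr syntax, so the UI-facing code stays the same whichever store is configured.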

* Aggregations in ES operate on the indexed (post-analysis) terms, not on
the original values. This means we'll typically want the raw field value
available to aggregate on, which in ES means a "not_analyzed" field.
Glancing (incredibly) briefly through our templates, we do have some string
values that are analyzed (I have no idea whether they're generally relevant
to this UI; I just didn't look). I'm also assuming Stellar enrichments are
analyzed right now, and I'm unsure what happens to metadata
(https://github.com/apache/metron/pull/621). Essentially the question is:
"How do we handle this, particularly since we're a pretty dynamic system?"
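For reference, one option ES 2.x offers for dynamic fields is a dynamic template in an index template that gives every new string field a not_analyzed ".raw" sub-field, keeping the analyzed version for search. A sketch of that template body, where the index pattern, template name and helper names are assumptions:

```python
def raw_subfield_template(index_pattern):
    """ES 2.x index template: dynamically mapped strings get a ".raw" sub-field.

    The analyzed field stays searchable; "<name>.raw" holds the exact value
    for aggregations. (Hypothetical helper and template name.)
    """
    return {
        "template": index_pattern,  # ES 2.x key for the index name pattern
        "mappings": {
            "_default_": {
                "dynamic_templates": [
                    {
                        "strings_with_raw": {
                            "match_mapping_type": "string",
                            "mapping": {
                                "type": "string",
                                "fields": {
                                    "raw": {"type": "string", "index": "not_analyzed"}
                                },
                            },
                        }
                    }
                ]
            }
        },
    }


def aggregation_field(name):
    """Field to aggregate on: the not_analyzed ".raw" sub-field."""
    return name + ".raw"
```

The body would be sent with PUT _template/<name>. Dynamic templates only affect fields mapped after the template exists, so existing indices would keep their current mappings.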
