Since some people prefer Solr and others prefer Elasticsearch, an abstraction layer over both would be really valuable. I haven't seen a framework that provides the required level of search abstraction on top of Solr and Elasticsearch, though I'd guess one exists somewhere: something like Apache Calcite, but specific to search queries. Without one, there is too much for us to implement ourselves.
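
To make the idea a bit more concrete, here is a rough sketch of what a thin layer like that might look like for the grouping use case described below. It is only an illustration, not Metron code: GroupRequest, to_elasticsearch, to_solr and the field names (user, ip_dst_addr, severity) are all made up for the example. It assumes nested terms aggregations on the ES side and the JSON Facet API on the Solr side, and it skips query escaping, sorting, and pagination entirely.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class GroupRequest:
    """Backend-neutral grouping request (hypothetical type, not Metron code)."""
    group_by: List[str]                                     # outermost grouping first
    filters: Dict[str, str] = field(default_factory=dict)   # exact-match filters
    bucket_size: int = 10                                   # max buckets per level


def _nest(req: GroupRequest,
          make_node: Callable[[str], Dict[str, Any]],
          child_key: str) -> Dict[str, Any]:
    """Wrap the groupings from the innermost level outward."""
    nested: Dict[str, Any] = {}
    for name in reversed(req.group_by):
        node = make_node(name)
        if nested:
            node[child_key] = nested
        nested = {"by_" + name: node}
    return nested


def to_elasticsearch(req: GroupRequest) -> Dict[str, Any]:
    """Build a search body that uses nested terms aggregations."""
    aggs = _nest(
        req,
        lambda f: {"terms": {"field": f, "size": req.bucket_size}},
        "aggs",
    )
    clauses = [{"term": {k: v}} for k, v in req.filters.items()]
    query = {"bool": {"filter": clauses}} if clauses else {"match_all": {}}
    # size 0: return only the buckets, not the matching documents
    return {"size": 0, "query": query, "aggs": aggs}


def to_solr(req: GroupRequest) -> Dict[str, Any]:
    """Build a JSON request body that uses nested terms facets."""
    facet = _nest(
        req,
        lambda f: {"type": "terms", "field": f, "limit": req.bucket_size},
        "facet",
    )
    # NOTE: filter values are not escaped here; real code would have to do that
    filters = [f"{k}:{v}" for k, v in req.filters.items()]
    return {"query": "*:*", "filter": filters, "limit": 0, "facet": facet}


# Example: alerts bucketed by user, then destination IP, then severity
req = GroupRequest(["user", "ip_dst_addr", "severity"], {"severity": "high"})
es_body = to_elasticsearch(req)
solr_body = to_solr(req)
```

The UI would only ever build a GroupRequest, and each backend would get its own translator. On the ES side this still depends on the grouped fields being not_analyzed, which is the template concern raised in the original mail.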

On Fri, Jul 7, 2017 at 6:48 PM, Casey Stella <[email protected]> wrote:

> I just want to chime in and support the notion of an abstraction layer
> between the UI and the indexed stores. I think that having an API that
> people can conform to is going to be important as people want to plug in
> their own backing indices in the future.
>
> Casey
>
> On Thu, Jul 6, 2017 at 2:11 PM, Justin Leet <[email protected]> wrote:
>
> > I wanted to bring up some stuff on the backend of our UI, and get
> > thoughts (+ things I overlooked, etc.). There's also a couple points at
> > the end that merit discussion about how we handle things, since it gets
> > into how we handle our ES templates (since we generally want to aggregate
> > on raw fields, not analyzed ones).
> >
> > To set the use case a bit, when we're looking through alerts in the UI,
> > we're going to want to be able to start pivoting and grouping in the UI.
> >
> > For example, given a list of alerts, we may want to follow an ordering of
> > groupings like so:
> >
> > All Alerts
> > --> Bucketed by User
> > ----> Then further by Destination IP
> > ------> Then further by Severity
> >
> > The stuff I expect we'll want to be able to do:
> > * Pivot through multiple layers (as in the example above).
> > * Get counts within each bucket (Do we have a lot of high severity
> >   alerts? Mostly medium? etc?)
> > * Get a subset of fields (I assume we don't want every entire doc that
> >   comes back in the bucket)
> > * Pagination (if I have > X docs, show me X and let me retrieve more as
> >   needed)
> > * Sorting within a bucket (I may want to sort by time, by userid, etc.)
> > * Filtering (Be able to do this stuff while only showing high severity
> >   alerts)
> >
> > In terms of actually implementing this, to the best of my limited
> > knowledge (and playing around with ES looking into this), this seems like
> > pretty doable stuff, out of the box. See:
> > https://www.elastic.co/guide/en/elasticsearch/reference/2.4/search-aggregations-bucket-terms-aggregation.html
> >
> > There are two main pain points I see in this:
> > * Actually constructing these queries. I don't know that we've
> >   explicitly said we want a layer of abstraction between the UI and the
> >   real time store, but I strongly suggest we have one. Theoretically, we
> >   should be able to support (at least) Solr and ES in the UI, not just
> >   one. Unfortunately, since they aren't the same syntax, this means we
> >   have two impls, and I'd personally like to see an abstraction that
> >   delegates appropriately.
> >
> > * Aggregations in ES function post analysis. This means that we'll
> >   typically want to be able to aggregate on the raw field value. In the
> >   ES implementation, this means a "not_analyzed" field. Glancing
> >   (incredibly) briefly through our templates, we do have some string
> >   values that are analyzed (and I have no idea if they're generally
> >   relevant to this UI or not, I just didn't look). I'm also assuming
> >   Stellar enrichments are analyzed right now. I'm also unsure what
> >   happens to metadata (https://github.com/apache/metron/pull/621).
> >   Essentially the question is: "How do we handle this, particularly
> >   since we're a pretty dynamic system?"
> >
>

--
A.Nazemian
