Hi,

Aggregations ROCK!

But.... 
Imagine, we have a document with a number (n) in the range of -10 to 10. 
And a field with a couple of terms.

Now I want to create a view with two columns, the first for the number of 
documents having a positive value, the second for documents having a 
negative value. This is easily done with an aggregation like this:

{
  "size": 0,
  "aggs": {
    "label": {
      "terms": {
        "field": "somefield"
      },
      "aggs": {
        "positive": {
          "filter": {
            "range": {
              "n": { "gt": 0 }
            }
          }
        },
        "negative": {
          "filter": {
            "range": {
              "n": { "lt": 0 }
            }
          }
        }
      }
    }
  }
}


The counts of the positive and negative documents can be found in 
'positive.doc_count' and the 'negative.doc_count'. Everything is fine.

Now you want to sort you aggregation on the label with the most positive 
documents, so we add the following to the 'terms':
...
{
  "terms": {
    "field": "somefield",
    "order": { "positive": "desc" }
  },
  ...
}
...


We get an error back saying:
SearchPhaseExecutionException[Failed to execute phase [query], all shards 
failed; shardFailures {[RwghWCxzQ-S9SyjTAX119A][XXXXXXXXXXXXX][8]: 
AggregationExecutionException[terms aggregation [label] is configured to 
order by sub-aggregation [positive] which is is not a metrics aggregation. 
Terms aggregation order can only refer to metrics aggregations]}

This error tells me that I can't order on the doc_count of a filter because 
filter is not a metrics aggregation. It would be really help full to be 
able to sort on filtered aggregations as well. I would even go as far as 
sorting on sub aggregations of filters! That way you could sum the value of 
the positive documents and sort on the sum as well (although I would not 
know what that number is supposed to represent now).

This might not be trivial to implement, but I think it is worth looking in 
to.

At least I found a way to get around this problem in the short term by 
summing the value of a script which does the heavy lifting in checking if 
the document should be counted, but we have some configuration files now 
where we have 100+ columns representing counts of documents with according 
filters which you cannot really port automatically to a value_script of a 
sum.

If people are interesting in the workaround this is the query:
{
  "size": 0,
  "aggs": {
    "label": {
      "terms": {
        "field": "somefield",
        "order": { "positive": "desc" }
      },
      "aggs": {
        "positive": {
          "sum": {
            "field": "n",
            "script": "_value>0?1:0"
          }
        },
        "negative": {
          "sum": {
            "field": "n",
            "script": "_value<0?1:0"
          }
        }
      }
    }
  }
}


Could we get the sorting on filtered aggregations as a feature request? Or 
is this simply impossible to achieve (with decent performance) in the 
aggregations framework?

-- Nils

On a side note, the above queries are not tested on elasticsearch so could 
contain errors when copy/paste into an elasticsearch request.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/971d14e9-e9e7-45f3-bc54-5c5c2c3603fe%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Reply via email to