Terms aggregations do not support pagination.
The official reason seems to be that it is "tricky to implement"; see issue 
#4915 <https://github.com/elasticsearch/elasticsearch/issues/4915>, which has 
unfortunately been closed.

So getting paginated terms ordered by count does not seem possible at this 
point.
You could, however, order them alphabetically (by term), and apply filtering 
<http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/search-aggregations-bucket-terms-aggregation.html#_filtering_values> 
in a clever way to retrieve sequences of terms.
As you point out, a cardinality 
<http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/search-aggregations-metrics-cardinality-aggregation.html> 
query beforehand could inform your paging strategy.

A possible algorithm, assuming terms made up of the letters A-Z and a 
reasonably well distributed collection of terms:
- determine the cardinality for each first character (26 buckets)
- if the size of a bucket exceeds a certain limit, repeat with the second 
character for that bucket (26 sub-buckets)
- the prefix of the term (1 or more letters) then becomes your paging 
mechanism
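
The steps above could be sketched roughly as follows in Python. This is only 
an illustration under assumptions, not a tested recipe: plan_prefix_pages, 
terms_page_request, count_terms and max_depth are names I made up, and 
count_terms stands in for whatever cardinality query you run per prefix. The 
"include" pattern and the "_term" ordering are how I remember the terms 
aggregation options in 1.x, so check them against the version you run.

```python
import string


def plan_prefix_pages(count_terms, limit, alphabet=string.ascii_lowercase,
                      prefix="", max_depth=3):
    """Return prefixes in alphabetical order, one per page.

    count_terms(prefix) is a hypothetical callable returning the number of
    distinct terms starting with that prefix (e.g. backed by a cardinality
    aggregation). A prefix is split one character deeper while it holds more
    than `limit` terms and is shorter than `max_depth`.

    Caveat: a term exactly equal to a prefix that gets split is not covered
    by the child prefixes and would need separate handling.
    """
    pages = []
    for letter in alphabet:
        candidate = prefix + letter
        n = count_terms(candidate)
        if n == 0:
            continue  # nothing under this prefix, no page needed
        if n > limit and len(candidate) < max_depth:
            pages.extend(plan_prefix_pages(count_terms, limit, alphabet,
                                           candidate, max_depth))
        else:
            pages.append(candidate)
    return pages


def terms_page_request(field, prefix, size):
    """Build a request body fetching the terms of one prefix page.

    Assumes the prefix contains only plain letters or digits, so it can be
    used as a regular expression without escaping.
    """
    return {
        "size": 0,
        "aggs": {
            "page": {
                "terms": {
                    "field": field,
                    "include": prefix + ".*",   # only terms with this prefix
                    "order": {"_term": "asc"},  # alphabetical, not by count
                    "size": size,
                }
            }
        },
    }


# In-memory stand-in for the cardinality query, for illustration only.
sample_terms = ["apple", "apricot", "avocado", "banana", "cherry"]


def count_in_sample(prefix):
    return sum(1 for t in sample_terms if t.startswith(prefix))


pages = plan_prefix_pages(count_in_sample, limit=2)
```

With the sample terms, "a" holds three terms and is split one level deeper, 
while "b" and "c" each fit in a single page. Each resulting prefix can then 
be passed to terms_page_request to fetch that page.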

How this translates into performance, I have no idea.
It will certainly reduce the amount of data transferred from ES per request, 
but it might not perform as well as simply fetching every term and doing the 
paging in the server application layer.
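
For comparison, paging in the application layer could look something like the 
sketch below. It assumes you have already fetched all buckets from a terms 
aggregation as a list of dicts with "key" and "doc_count"; page_of_buckets is 
a made-up helper name. Note that this approach can still order by count.

```python
def page_of_buckets(buckets, page, per_page):
    """Return one page of buckets, ordered by count (descending).

    `page` is zero-based. Ties on doc_count are broken alphabetically by key
    so that the page boundaries stay stable between requests.
    """
    ordered = sorted(buckets, key=lambda b: (-b["doc_count"], b["key"]))
    start = page * per_page
    return ordered[start:start + per_page]


# Illustrative data only, not real results.
all_buckets = [
    {"key": "10.0.0.3", "doc_count": 5},
    {"key": "10.0.0.1", "doc_count": 12},
    {"key": "10.0.0.2", "doc_count": 5},
]
first_page = page_of_buckets(all_buckets, page=0, per_page=2)
second_page = page_of_buckets(all_buckets, page=1, per_page=2)
```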

Personally, I would love to see pagination support in Elasticsearch, even 
if there is a performance penalty.
It seems much better than risking flooding a naive client or server with 
too many terms at once.

On Thursday, September 11, 2014 2:48:30 PM UTC-4, jigish thakar wrote:
>
> Hey Guys,
> I am building some Logging and monitoring product for my employer and 
> using ES as backend.
> now finding unique value of each/any attribute is core part of business 
> logic I have in hand.
>
> lets say I want unique dst_ip, to achieve that,
> - I have used "index":"not_analyzed" for selected fields
> - Api used to get unique count 
>    http://127.0.0.1:9200/es-server/Events/_search -d 
> '{"aggs":{"dst_ip_count":{"cardinality":{"field":"dst_ip"}}},"size":0}'
> - Api used to fetch those values
>    http://127.0.0.1:9200/es-server/Events/_search -d 
> '{"fields":["dst_ip"],"facets":{"terms":{"terms":{"field":"dst_ip","size":1116,"order":"count"}}},"size":1116}'
>
>   here 1116 is received from first API. now here the count is very small 
> but in production environment this count goes greater then 2lakh. which 
> results in slow query response.
>
> do we have any other way to fetch such values with pagination inbuild like 
> we have in search query with size and from.
>
> Please suggest, thanks in advance.
>
>
>
