Hi Dror,

On Tue, Nov 25, 2014 at 2:29 PM, Dror Atariah <[email protected]> wrote:

> Hi Adrien,
>
> I have two comments/questions:
>
> 1) For me, the documentation is still somehow confusing, and the
> difference between the *cardinality* and *value_count* aggregations is
> not 100% clear.
>

I have to agree here... If you have suggestions to make it less confusing, ideas are highly welcome (even changing the names of the aggs might be an option if we do it in a major release).

> 2) When it comes to counting unique values: I believe that the only way
> that one can take, at the moment, is to use the *cardinality* aggregation.
> This, however, comes with the price of an approximated result (as discussed
> in the documentation and in the paper describing HyperLogLog++). I
> understand the need to take an approximating approach; but I think that the
> returned result should indicate a bound on the error. Otherwise, the
> returned count could be considered useless. In the documentation the figure
> 5% is mentioned --- is it independent of the cardinality? what happens to
> this bound when the precision threshold is >> 40,000?
>

This is true: only the cardinality aggregation can compute unique counts. The thing about the error is that there is no bound on it, but higher errors are less likely. The only thing we *might* be able to return would be a confidence interval, but it requires some work...

Regarding the 5% figure mentioned in the documentation, it was just meant as an example to show that, in spite of the approximate approach, results are very close to accurate. A precision_threshold above 40000 is basically the same as a precision_threshold of 40000.

--
Adrien Grand

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAL6Z4j6-s1dM%2BuYpLDTn_tFfpxevYZmu_3_zvaRiXKwuZi2vOw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
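
For readers of this thread, here is a minimal sketch of the distinction discussed in point 1, written in Python. It is an illustration, not taken from the documentation: the field name `user_id` and the aggregation names `total_values` and `unique_values` are hypothetical. The request body asks for both aggregations on the same field, and the small list underneath shows, with exact arithmetic, what each one is meant to count. Keep in mind that the real cardinality aggregation returns an approximation, as discussed in the message.

```python
import json

# Hypothetical field name, used only for illustration.
FIELD = "user_id"

# Search request body asking for both aggregations on the same field.
# "size": 0 skips the hits, so only the aggregation results come back.
request_body = {
    "size": 0,
    "aggs": {
        # value_count: how many values were extracted (duplicates included).
        "total_values": {"value_count": {"field": FIELD}},
        # cardinality: approximate number of distinct values.
        "unique_values": {
            "cardinality": {"field": FIELD, "precision_threshold": 40000}
        },
    },
}

# Local, exact stand-in for what each aggregation counts.
values = ["alice", "bob", "alice", "carol", "bob", "alice"]
total_values = len(values)        # what value_count counts
unique_values = len(set(values))  # what cardinality estimates

print(json.dumps(request_body, indent=2))
print(total_values, unique_values)
```

On this toy list the two numbers differ (6 values, 3 distinct), which is the difference between the two aggregations in a nutshell.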
