Kevin Van Lieshout created SOLR-17122:
-----------------------------------------
Summary: Solr Unique, HLL, numBuckets returns incorrect counts
Key: SOLR-17122
URL: https://issues.apache.org/jira/browse/SOLR-17122
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Affects Versions: 9.4
Reporter: Kevin Van Lieshout
We are running Solr 9.4. My colleague and I are using the unique function and
the hll function to count the distinct IP addresses in a text_general field, so
that we know how many IP addresses match a query. We see the same behavior when
testing on a pdouble field. When there are many unique values and many
documents, all three of the methods we use to obtain this value return
incorrect counts. The three methods are:
1. unique(field)
2. hll(field)
3. A terms facet with numBuckets set to True
{'query': 'field:value', 'filter': [], 'facet': {'first_facet':
'unique(field_a)'}}
or
{'query': 'field:value', 'filter': [], 'facet': {'first_facet':
'hll(field_a)'}}
or
{'query': 'field:value', 'filter': [], 'facet': {'first_facet': {'type':
'terms', 'field': 'field_a', 'limit': 0, 'numBuckets': True}}}
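For anyone trying to reproduce this, here is a minimal Python sketch of the
three request bodies, with the aggregation passed as a string such as
'unique(field_a)', which is the form the JSON Facet API accepts. The helper
names build_requests and post_query are hypothetical. The endpoint URL is an
assumption, as are the collection name and the field and query placeholders;
none of these reflect our actual setup.

```python
import json
import urllib.request

FIELD = "field_a"      # placeholder field name, as in the report
QUERY = "field:value"  # placeholder query, as in the report


def build_requests(field=FIELD, query=QUERY):
    """Return the three JSON request bodies described in the report."""
    base = {"query": query, "filter": []}
    return {
        # Aggregation functions are passed as plain strings.
        "unique": {**base, "facet": {"first_facet": f"unique({field})"}},
        "hll": {**base, "facet": {"first_facet": f"hll({field})"}},
        # Terms facet that returns no buckets, only the bucket count.
        "numBuckets": {
            **base,
            "facet": {
                "first_facet": {
                    "type": "terms",
                    "field": field,
                    "limit": 0,
                    "numBuckets": True,
                }
            },
        },
    }


def post_query(solr_url, body):
    """POST one body to a Solr /query endpoint and return the parsed JSON.

    solr_url is an assumption, for example
    http://localhost:8983/solr/<collection>/query
    """
    req = urllib.request.Request(
        solr_url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Print the bodies only; sending them requires a running Solr instance.
    for name, body in build_requests().items():
        print(name, json.dumps(body))
```

Calling post_query with each body against the same collection and comparing
the three returned counts should show whether they disagree with each other
as well as with the true number of distinct values.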
Is this the expected behavior and a limitation of Solr? Showing approximate
counts in our application is not acceptable for us, so we would like to know
whether we are using these features incorrectly or whether this is what to
expect at high cardinality with a large number of documents. Thank you very
much.