Kevin Van Lieshout created SOLR-17122:
-----------------------------------------

             Summary: Solr Unique, HLL, numBuckets returns incorrect counts
                 Key: SOLR-17122
                 URL: https://issues.apache.org/jira/browse/SOLR-17122
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
    Affects Versions: 9.4
            Reporter: Kevin Van Lieshout


We are running Solr 9.4. My colleague and I are using the unique and hll 
functions to count the distinct IP addresses in a text_general field, so that 
we know how many IPs match a given query. We have also tested this on a 
pdouble field. When there are many unique values and many documents, we see 
incorrect counts from all three of the methods we use to obtain this value:

 
 1. unique(field)
 2. hll(field)
 3. numBuckets: True on a terms facet

 
{'query': 'field:value', 'filter': [], 'facet': {'first_facet': 
'unique(field_a)'}}

or

{'query': 'field:value', 'filter': [], 'facet': {'first_facet': 
'hll(field_a)'}}

or 

{'query': 'field:value', 'filter': [], 'facet': {'first_facet': {'type': 
'terms', 'field': 'field_a', 'limit': 0, 'numBuckets': True}}}
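
For anyone trying to reproduce this, here is a minimal sketch of how the three 
request bodies above could be built and sent. The helper names 
(build_requests, post_query) and the solr_url argument are hypothetical and 
only for illustration; the sketch assumes the bodies are posted as JSON to a 
collection's query endpoint, which may differ from your setup.

```python
import json
import urllib.request


def build_requests(query, field):
    """Build the three JSON facet request bodies described in this report."""
    base = {"query": query, "filter": []}
    return {
        # 1. unique(field) aggregation
        "unique": {**base, "facet": {"first_facet": f"unique({field})"}},
        # 2. hll(field) aggregation
        "hll": {**base, "facet": {"first_facet": f"hll({field})"}},
        # 3. terms facet returning no buckets, only the bucket count
        "numBuckets": {
            **base,
            "facet": {
                "first_facet": {
                    "type": "terms",
                    "field": field,
                    "limit": 0,
                    "numBuckets": True,
                }
            },
        },
    }


def post_query(solr_url, body):
    """POST one request body as JSON and return the decoded response.

    solr_url is an assumption, e.g. the /query endpoint of a collection.
    """
    req = urllib.request.Request(
        solr_url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


bodies = build_requests("field:value", "field_a")
```

Sending each entry of bodies through post_query against the same collection 
and comparing the three returned counts with a known exact count would show 
how far each method deviates.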

 

Is this the expected behavior and a known limitation of Solr? Showing 
approximate counts in our application is not ideal for us, so we would like 
to know whether we are using these functions incorrectly or whether 
approximation is simply what to expect at high cardinality with a large 
number of documents. Thank you very much.

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
