Hoss Man created SOLR-10918:
-------------------------------

             Summary: StatsComponent cardinality discrepancies between regular 
vs pre-hashed values when using PointsField
                 Key: SOLR-10918
                 URL: https://issues.apache.org/jira/browse/SOLR-10918
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
            Reporter: Hoss Man


Discovered as part of SOLR-10807.

When using Points-based numeric fields, the HLL cardinality estimates computed 
from the raw values and from the pre-hashed values disagree slightly. This 
suggests a possible bug (or at the very least, room for optimization) when 
using Points fields.

Example from SOLR-10807 when swapping IntPointField in place of TrieIntField:

{code}

   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestDistributedStatsComponentCardinality -Dtests.method=test -Dtests.seed=63854996088ED7B7 -Dtests.slow=true -Dtests.locale=de-GR -Dtests.timezone=Etc/UCT -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
   [junit4] FAILURE 13.3s J2 | TestDistributedStatsComponentCardinality.test <<<
   [junit4]    > Throwable #1: java.lang.AssertionError: int_i: hashed vs prehashed, real=7260, p=q=id:[1186+TO+8445]&rows=0&stats=true&stats.field={!cardinality%3Dtrue+hllLog2m%3D7+hllRegwidth%3D8}int_i&stats.field={!cardinality%3Dtrue+hllLog2m%3D7+hllRegwidth%3D8+hllPreHashed%3Dtrue}int_i_prehashed_l&stats.field={!cardinality%3Dtrue+hllLog2m%3D7+hllRegwidth%3D8}long_l&stats.field={!cardinality%3Dtrue+hllLog2m%3D7+hllRegwidth%3D8+hllPreHashed%3Dtrue}long_l_prehashed_l&stats.field={!cardinality%3Dtrue+hllLog2m%3D7+hllRegwidth%3D8}string_s&stats.field={!cardinality%3Dtrue+hllLog2m%3D7+hllRegwidth%3D8+hllPreHashed%3Dtrue}string_s_prehashed_l expected:<6632> but was:<7929>
   [junit4]    >        at __randomizedtesting.SeedInfo.seed([63854996088ED7B7:EBD1764CA672BA4F]:0)
   [junit4]    >        at org.apache.solr.handler.component.TestDistributedStatsComponentCardinality.test(TestDistributedStatsComponentCardinality.java:149)
{code}
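
To put the numbers in the assertion message in context, here is a rough sketch (not part of the Solr test) that compares each reported estimate against the real count. It assumes the commonly cited approximation of about 1.04/sqrt(m) for the relative standard error of a HyperLogLog sketch with m = 2^hllLog2m registers. The class and method names are made up for illustration.

```java
public class HllErrorSketch {

    // Approximate relative standard error of an HLL sketch with 2^log2m registers.
    // The 1.04/sqrt(m) rule of thumb is an assumption here, not taken from the Solr code.
    static double standardError(int log2m) {
        return 1.04 / Math.sqrt((double) (1L << log2m));
    }

    // Signed relative error of an estimate against the true distinct count.
    static double relativeError(long estimate, long real) {
        return (estimate - real) / (double) real;
    }

    public static void main(String[] args) {
        int log2m = 7;        // hllLog2m=7 in the failing request
        long real = 7260;     // "real=7260" in the assertion message
        long expected = 6632; // "expected:<6632>"
        long was = 7929;      // "but was:<7929>"

        System.out.println("approx. standard error: " + standardError(log2m));
        System.out.println("relative error of 6632: " + relativeError(expected, real));
        System.out.println("relative error of 7929: " + relativeError(was, real));
    }
}
```

Under that assumption, both estimates land within roughly one standard error of the real count, on opposite sides of it. Each value is plausible on its own as an HLL estimate; what fails is the test's assertion that the two estimates are equal.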



