Hoss Man created SOLR-7631:
------------------------------

             Summary: Faceting on multivalued Trie fields with precisionStep != 0 can produce bogus value="0" in some situations
                 Key: SOLR-7631
                 URL: https://issues.apache.org/jira/browse/SOLR-7631
             Project: Solr
          Issue Type: Bug
            Reporter: Hoss Man


Working through SOLR-7605, I've confirmed that the underlying problem exists 
for regular {{facet.field}} situations, regardless of distrib mode, for Trie 
fields that have a non-zero precisionStep. There's still some other missing 
piece of the puzzle I haven't figured out, but it relates in some way to some 
of the randomized factors we use in our tests (Codec? PostingsFormat? ... no idea).

The problem, when it manifests, is that faceting on a TrieIntField using 
{{facet.mincount=0}} causes the facet results to include three instances of 
the facet value "0", each listed with a count of "0", even though no document 
in the index contains this value at all...

{noformat}
   [junit4]    >   <lst name="facet_fields">
   [junit4]    >     <lst name="foo_ti">
   [junit4]    >       <int name="20">32</int>
...
   [junit4]    >       <int name="50">21</int>
   [junit4]    >       <int name="0">0</int>
   [junit4]    >       <int name="0">0</int>
   [junit4]    >       <int name="0">0</int>
{noformat}

This is concerning for a few reasons:

* In the case of PivotFaceting, getting duplicate values back from a single 
shard like this triggers an assert in distributed queries and the request fails; 
even if asserts aren't enabled, the bogus "0" value can be propagated to 
clients if they ask for {{facet.pivot.mincount=0}}
* Client code that expects a single (value, count) pair per value may likewise 
be confused or broken by a response in which the same value is returned 
multiple times
* Without knowing the root cause, it seems very possible that other nonsense 
values may also be getting returned. If the error only happens with fields that 
use precisionStep, then it's likely related to the synthetic values indexed for 
faster range queries, and other synthetic values may be getting included with 
bogus counts
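To make the precisionStep theory in the last bullet more concrete, here is a minimal, self-contained Java model. It is not Lucene or Solr code: the class {{TriePrefixSketch}}, the {{Term}} record, and the {{facet()}} helper are all hypothetical. It assumes a simplified prefix-coded scheme in which each int is indexed once at full precision (shift 0) and once more for every precisionStep bits with the low bits dropped, and a naive {{facet.mincount=0}} listing walks every distinct term, decodes it, and reports how many docs contain that exact value. With precisionStep=8 (chosen here for illustration), each 32-bit value gets three lower-precision terms (shift 8, 16, 24); for small positive values such as 20 and 50 those terms are shared, and each one decodes to 0.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class TriePrefixSketch {
    // Hypothetical model of one indexed term: the shift plus the remaining high-order bits.
    // This is NOT Lucene's real byte encoding; it only mimics the prefix-coding arithmetic.
    record Term(int shift, int prefix) {}

    // Assumed index order: full-precision terms (shift 0) first, then by unsigned prefix.
    static final Comparator<Term> INDEX_ORDER =
        Comparator.comparingInt(Term::shift)
                  .thenComparing((a, b) -> Integer.compareUnsigned(a.prefix(), b.prefix()));

    // Terms a trie-style int field would index for one value:
    // one full-precision term plus one lower-precision term per precisionStep bits.
    static List<Term> termsFor(int value, int precisionStep) {
        int sortable = value ^ 0x80000000; // flip the sign bit so unsigned order matches numeric order
        List<Term> terms = new ArrayList<>();
        for (int shift = 0; shift < 32; shift += precisionStep) {
            terms.add(new Term(shift, sortable >>> shift));
        }
        return terms;
    }

    // Decode a term back to an int; the low `shift` bits come back as zeros.
    static int decode(Term t) {
        return (t.prefix() << t.shift()) ^ 0x80000000;
    }

    // Naive mincount=0 listing: walk every distinct term in the term dictionary,
    // label it by decoding, and count the docs containing that exact full-precision value.
    static List<String> facet(int[][] docs, int precisionStep, boolean skipLowerPrecision) {
        Set<Term> dictionary = new TreeSet<>(INDEX_ORDER);
        Map<Term, Integer> counts = new HashMap<>();
        for (int[] doc : docs) {
            Set<Integer> seen = new HashSet<>(); // count each value at most once per doc
            for (int value : doc) {
                dictionary.addAll(termsFor(value, precisionStep));
                if (seen.add(value)) {
                    counts.merge(new Term(0, value ^ 0x80000000), 1, Integer::sum);
                }
            }
        }
        List<String> result = new ArrayList<>();
        for (Term t : dictionary) {
            if (skipLowerPrecision && t.shift() != 0) {
                continue; // synthetic range-query term, not a real field value
            }
            result.add(decode(t) + "=" + counts.getOrDefault(t, 0));
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] docs = {{20, 50}, {20}}; // hypothetical multivalued documents
        System.out.println(facet(docs, 8, false)); // [20=2, 50=1, 0=0, 0=0, 0=0]
        System.out.println(facet(docs, 8, true));  // [20=2, 50=1]
    }
}
```

In this toy model the naive listing yields three "0" entries with a count of 0, which resembles the output above, while skipping terms with shift != 0 removes them. That is only consistent with the hypothesis in the last bullet; it does not confirm the real root cause, which still seems to depend on some unidentified randomized test factor.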

A patch with a simple test that demonstrates the bug fairly easily will be 
attached shortly.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)