[ 
https://issues.apache.org/jira/browse/SOLR-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-7631:
---------------------------
    Attachment: SOLR-7631_test.patch

Updated patch...

* tests some precisionStep=0 fields as well to demonstrate that they never 
exhibit the failure
* tests all possible facet.method values to demonstrate that the multivalued 
precisionStep=8 fields fail regardless of what method is requested
* fixes NUM_DOCS and the MergePolicy used, to reduce the number of variables
** NOTE: some observations indicated that an index with a low number of docs 
was less likely to fail -- suggesting that the bug is related to either the 
number of segments, segment size, or posting list size ... but with 
NUM_DOCS == 1000 there are still plenty of seeds that fail reliably.

With these changes, the only pattern i'm seeing is that all of the failures 
seem to involve the RandomCodec -- which reports itself in the "test params" 
output as...

bq. NOTE: test params are: codec=Asserting(Lucene50): { ... randomized posting 
formats here ...}, docValues:{ ...randomized docValues here ...}, sim=etc, 
locale=etc, timezone=etc

...but i haven't found any pattern in the PostingFormat reported for the field 
in question (foo_ti) -- and spot checks using -Dtests.codec=AssertingCodec and 
-Dtests.codec=Lucene50 codec directly haven't failed, leading me to believe it 
must either be some other aspect of how RandomCodec does its wrapping, or some 
nuance in the PostingFormat selected.

I'm currently beasting this test using every possible -Dtests.codec option to 
sanity check that it only ever fails with "random" ... once that's done, i 
guess i'll start doing the same thing with -Dtests.postingformat unless anyone 
spots the problem first. 
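
The codec sweep described above can be sketched roughly as follows. This is only an illustration of the idea, not the script actually being used: the test class name is a hypothetical placeholder (this message does not name the test in the patch), only the three codec values mentioned in this message are listed rather than every possible option, and {{-Dtestcase}} / {{-Dtests.seed}} are assumed to be the usual properties of the ant-based test runner. The sketch only prints the command lines; it does not execute them.

```java
import java.util.ArrayList;
import java.util.List;

public class CodecSweep {
    // Hypothetical placeholder: the real test class name is not given in this message.
    static final String TEST_CASE = "HypotheticalFacetTest";

    /** Builds one test invocation for the given codec; seed may be null to let the runner pick one. */
    static List<String> buildCommand(String codec, String seed) {
        List<String> cmd = new ArrayList<>();
        cmd.add("ant");
        cmd.add("test");
        cmd.add("-Dtestcase=" + TEST_CASE);
        cmd.add("-Dtests.codec=" + codec);
        if (seed != null) {
            // Pinning the seed keeps the other randomized factors constant between runs.
            cmd.add("-Dtests.seed=" + seed);
        }
        return cmd;
    }

    public static void main(String[] args) {
        // Only the codec values named in this message; a full sweep would list more.
        String[] codecs = {"random", "AssertingCodec", "Lucene50"};
        for (String codec : codecs) {
            System.out.println(String.join(" ", buildCommand(codec, null)));
        }
    }
}
```

Reusing one failing seed across all codec values would help separate "the codec matters" from "some other randomized factor matters".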


> Faceting on multivalued Trie fields with precisionStep != 0 can produce bogus 
> value="0" in some situations
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-7631
>                 URL: https://issues.apache.org/jira/browse/SOLR-7631
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Hoss Man
>         Attachments: SOLR-7631_test.patch, SOLR-7631_test.patch, log.tgz
>
>
> Working through SOLR-7605, I've confirmed that the underlying problem exists 
> for regular {{facet.field}} situations, regardless of distrib mode, for Trie 
> fields that have a non-zero precisionStep -- there's still some other missing 
> piece of the puzzle i haven't figured out, but it relates in some way to some 
> of the randomized factors we use in our tests (Codec? PostingFormat? ... no 
> idea).
> The problem, when it manifests, is that faceting on a TrieIntField, using 
> {{facet.mincount=0}}, causes the facet results to include three instances of 
> the facet value "0" listed with a count of "0" -- even though no document in 
> the index contains this value at all...
> {noformat}
>    [junit4]    >   <lst name="facet_fields">
>    [junit4]    >     <lst name="foo_ti">
>    [junit4]    >       <int name="20">32</int>
> ...
>    [junit4]    >       <int name="50">21</int>
>    [junit4]    >       <int name="0">0</int>
>    [junit4]    >       <int name="0">0</int>
>    [junit4]    >       <int name="0">0</int>
> {noformat}
> This is concerning for a few reasons:
> * In the case of PivotFaceting, getting duplicate values back from a single 
> shard like this triggers an assert in distributed queries and the request 
> fails -- even if asserts aren't enabled, the bogus "0" value can be 
> propagated to clients if they ask for facet.pivot.mincount=0
> * Client code expecting a single (value,count) pair for each value may 
> equally be confused/broken by this response where the same "value" is 
> returned multiple times
> * w/o knowing the root cause, it seems very possible that other nonsense 
> values may be getting returned -- ie: if the error only happens with fields 
> utilizing precisionStep, then it's likely related to the synthetic values 
> used for faster range queries, and other synthetic values may be getting 
> included with bogus counts
> A patch with a simple test that demonstrates the bug fairly easily will be 
> attached shortly.
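
For readers unfamiliar with the "synthetic values" mentioned in the description, here is a simplified sketch of the idea behind precisionStep. It is not Lucene's actual term encoding (the sign-bit handling and prefix coding are ignored), and the class and method names are made up for illustration. It assumes that a 32-bit value is indexed once per shift of 0, precisionStep, 2*precisionStep, ... below 32, and that precisionStep=0 is treated as "full-precision term only".

```java
import java.util.ArrayList;
import java.util.List;

public class TrieTermSketch {
    /** Shift amounts at which a 32-bit value would be indexed for a given precisionStep. */
    static List<Integer> shifts(int precisionStep) {
        // Assumption: a step of 0 (or one that is too large) means only the full-precision term.
        int step = (precisionStep <= 0 || precisionStep >= 32) ? 32 : precisionStep;
        List<Integer> out = new ArrayList<>();
        for (int shift = 0; shift < 32; shift += step) {
            out.add(shift);
        }
        return out;
    }

    /** The value with its low bits dropped at each shift (simplified: no sign-bit flip). */
    static List<Integer> lowPrecisionValues(int value, int precisionStep) {
        List<Integer> out = new ArrayList<>();
        for (int shift : shifts(precisionStep)) {
            out.add((value >>> shift) << shift);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(shifts(8));                 // [0, 8, 16, 24]
        System.out.println(lowPrecisionValues(50, 8)); // [50, 0, 0, 0]
        System.out.println(lowPrecisionValues(50, 0)); // [50]
    }
}
```

With precisionStep=8 each value gets one real term plus three lower-precision ones, and for small values like those in the output above the lower-precision forms all collapse to 0 in this simplified model. Whether that is actually where the three bogus "0" entries come from is not established here; the root cause is still unknown.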



