[
https://issues.apache.org/jira/browse/SOLR-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hoss Man updated SOLR-7631:
---------------------------
Attachment: SOLR-7631_test.patch
Updated patch...
* tests some precisionStep=0 fields as well to demonstrate that they never
exhibit the failure
* tests all possible facet.method values to demonstrate that the multivalued
precisionStep=8 fields fail regardless of what method is requested
* fix the NUM_DOCS and MergePolicy used to reduce the number of variables
** NOTE: some observation indicated that an index with a low number of docs
was less likely to fail -- suggesting that the bug is related to either the
number of segments, the segment size, or the posting list size .. but with
NUM_DOCS == 1000 there are still plenty of seeds that fail reliably.
With these changes, the only pattern i'm seeing is that all of the failures
seem to involve the RandomCodec -- which reports itself in the "test params"
output as...
bq. NOTE: test params are: codec=Asserting(Lucene50): { ... randomized posting
formats here ...}, docValues:{ ...randomized docValues here ...}, sim=etc,
locale=etc, timezone=etc
...but i haven't found any pattern in the PostingsFormat reported for the field
in question (foo_ti) -- and spot checks using -Dtests.codec=AssertingCodec and
-Dtests.codec=Lucene50 directly haven't failed, leading me to believe it must
either be some other aspect of how RandomCodec does its wrapping, or some
nuance in the PostingsFormat selected.
I'm currently beasting this test using every possible -Dtests.codec option to
sanity check that it only ever fails with "random" ... once that's done, i
guess i'll start doing the same thing with -Dtests.postingsformat unless anyone
spots the problem first.
> Faceting on multivalued Trie fields with precisionStep != 0 can produce bogus
> value="0" in some situations
> ----------------------------------------------------------------------------------------------------------
>
> Key: SOLR-7631
> URL: https://issues.apache.org/jira/browse/SOLR-7631
> Project: Solr
> Issue Type: Bug
> Reporter: Hoss Man
> Attachments: SOLR-7631_test.patch, SOLR-7631_test.patch, log.tgz
>
>
> Working through SOLR-7605, I've confirmed that the underlying problem exists
> for regular {{facet.field}} situations, regardless of distrib mode, for Trie
> fields that have a non-zero precisionStep -- there's still some other missing
> piece of the puzzle i haven't figured out, but it relates in some way to some
> of the randomized factors we use in our tests (Codec? PostingsFormat? ... no
> idea)
> The problem, when it manifests, is that faceting on a TrieIntField, using
> {{facet.mincount=0}}, causes the facet results to include three instances of
> the facet value "0" listed with a count of "0" -- even though no document in
> the index contains this value at all...
> {noformat}
> [junit4] > <lst name="facet_fields">
> [junit4] > <lst name="foo_ti">
> [junit4] > <int name="20">32</int>
> ...
> [junit4] > <int name="50">21</int>
> [junit4] > <int name="0">0</int>
> [junit4] > <int name="0">0</int>
> [junit4] > <int name="0">0</int>
> {noformat}
> This is concerning for a few reasons:
> * In the case of PivotFaceting, getting duplicate values back from a single
> shard like this triggers an assert in distributed queries and the request
> fails -- even if asserts aren't enabled, the bogus "0" value can be
> propagated to clients if they ask for facet.pivot.mincount=0
> * Client code expecting a single (value,count) pair for each value may
> equally be confused/broken by this response where the same "value" is
> returned multiple times
> * w/o knowing the root cause, it seems very possible that other nonsense
> values may be getting returned -- ie: if the error only happens with fields
> utilizing precisionStep, then it's likely related to the synthetic values
> used for faster range queries, and other synthetic values may be getting
> included with bogus counts
> A Patch with a simple test that can demonstrate the bug fairly easily will be
> attached shortly
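
For anyone wanting to check a response for the symptom described above, here is a minimal sketch (not taken from the attached patch) of how client or test code might flag facet values that are listed more than once for a field. The function name duplicate_facet_values and the SAMPLE string are hypothetical; SAMPLE is an abbreviated copy of the output quoted in the description, with the elided rows simply left out.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_facet_values(facet_fields_xml, field):
    """Return {value: occurrences} for facet values listed more than once.

    Hypothetical helper: it assumes the XML response layout shown in the
    issue description, i.e. <lst name="FIELD"> wrapping <int name="VALUE">.
    """
    root = ET.fromstring(facet_fields_xml)
    # Locate the <lst name="FIELD"> element for the requested field.
    for lst in root.iter("lst"):
        if lst.get("name") == field:
            seen = Counter(child.get("name") for child in lst)
            return {value: n for value, n in seen.items() if n > 1}
    # Field not present in the response at all.
    return {}


# Abbreviated from the failure output quoted in the description.
SAMPLE = """
<lst name="facet_fields">
  <lst name="foo_ti">
    <int name="20">32</int>
    <int name="50">21</int>
    <int name="0">0</int>
    <int name="0">0</int>
    <int name="0">0</int>
  </lst>
</lst>
"""

print(duplicate_facet_values(SAMPLE, "foo_ti"))  # prints {'0': 3}
```

A check like this only detects the duplicated "0" entries; it says nothing about whether other synthetic values are being returned with bogus counts.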