[
https://issues.apache.org/jira/browse/SOLR-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hoss Man updated SOLR-7631:
---------------------------
Description:
Working through SOLR-7605, I've confirmed that the underlying problem exists
for regular {{facet.field}} situations, regardless of distrib mode, for Trie
fields that have a non-zero precisionStep. *This has only been reproduced when
the RandomCodec was in use.*
The problem, when it manifests, is that faceting on a TrieIntField, using
{{facet.mincount=0}}, causes the facet results to include three instances of
the facet value "0" listed with a count of "0" -- even though no document in
the index contains this value at all...
{noformat}
[junit4] > <lst name="facet_fields">
[junit4] > <lst name="foo_ti">
[junit4] > <int name="20">32</int>
...
[junit4] > <int name="50">21</int>
[junit4] > <int name="0">0</int>
[junit4] > <int name="0">0</int>
[junit4] > <int name="0">0</int>
{noformat}
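As a client-side illustration (not part of the Solr test patch), duplicates
like the ones in the output above can be detected mechanically. This is a
minimal sketch in Python; the helper name duplicate_facet_values and the
abbreviated SAMPLE response, including its closing tags, are assumptions made
for the example.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict

# Abbreviated copy of the facet output shown above. The elided rows are
# omitted and the closing tags are added (assumption) so the snippet parses.
SAMPLE = """
<lst name="facet_fields">
  <lst name="foo_ti">
    <int name="20">32</int>
    <int name="50">21</int>
    <int name="0">0</int>
    <int name="0">0</int>
    <int name="0">0</int>
  </lst>
</lst>
"""


def duplicate_facet_values(facet_xml: str, field: str) -> Dict[str, int]:
    """Return {value: occurrences} for values listed more than once under `field`."""
    root = ET.fromstring(facet_xml.strip())
    names = []
    # Find the <lst> element for the requested field and collect its value names.
    for lst in root.iter("lst"):
        if lst.get("name") == field:
            names = [entry.get("name") for entry in lst.findall("int")]
            break
    counts = Counter(names)
    return {value: n for value, n in counts.items() if n > 1}


print(duplicate_facet_values(SAMPLE, "foo_ti"))  # {'0': 3}
```

A well-formed response should make this helper return an empty dict for every
faceted field.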
This is concerning for a few reasons:
* In the case of PivotFaceting, getting duplicate values back from a single
shard like this triggers an assert in distributed queries and the request fails
-- even if asserts aren't enabled, the bogus "0" value can be propagated to
clients if they ask for facet.pivot.mincount=0
* Client code expecting a single (value,count) pair for each value may equally
be confused/broken by this response where the same "value" is returned multiple
times
* Without knowing the root cause, it seems very possible that other nonsense
values may be getting returned -- i.e., if the error only happens with fields
utilizing precisionStep, then it's likely related to the synthetic values used
for faster range queries, and other synthetic values may be getting included
with bogus counts
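For readers unfamiliar with the "synthetic values" mentioned in the last
point: a Trie field with a non-zero precisionStep indexes, next to each real
value, extra lower-precision terms derived by dropping low-order bits. The
Python sketch below is a simplified model of that idea only. The function name
trie_terms is hypothetical, the real term encoding (sortable bits, shift stored
in the term bytes) is omitted, and treating a precisionStep of 0 as "no extra
terms" is an assumption about how Solr interprets that setting. It does not
establish the root cause of this bug.

```python
from typing import List, Tuple


def trie_terms(value: int, precision_step: int, bits: int = 32) -> List[Tuple[int, int]]:
    """Return (shift, prefix) pairs for one indexed value.

    Shift 0 is the full-precision value; larger shifts are the extra
    lower-precision ("synthetic") terms used to speed up range queries.
    """
    # View the value as an unsigned `bits`-wide integer (two's complement).
    unsigned = value & ((1 << bits) - 1)
    # Assumption: a step <= 0 or >= bits means only the full-precision term.
    if precision_step <= 0 or precision_step >= bits:
        return [(0, unsigned)]
    return [(shift, unsigned >> shift) for shift in range(0, bits, precision_step)]


# precision_step=8 is an illustrative choice, not taken from the issue.
print(trie_terms(70000, 8))  # [(0, 70000), (8, 273), (16, 1), (24, 0)]
```

In this model only the shift-0 entry corresponds to a value that a document
actually contains; the other entries are the kind of term that should never
show up as a facet value.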
A patch with a simple test that demonstrates the bug fairly easily will be
attached shortly.
was:
Working through SOLR-7605, I've confirmed that the underlying problem exists
for regular {{facet.field}} situations, regardless of distrib mode, for Trie
fields that have a non-zero precisionStep -- there's still some other missing
piece of the puzzle I haven't figured out, but it relates in some way to some
of the randomized factors we use in our tests (Codec? PostingsFormat? ... no idea)
The problem, when it manifests, is that faceting on a TrieIntField, using
{{facet.mincount=0}}, causes the facet results to include three instances of
the facet value "0" listed with a count of "0" -- even though no document in
the index contains this value at all...
{noformat}
[junit4] > <lst name="facet_fields">
[junit4] > <lst name="foo_ti">
[junit4] > <int name="20">32</int>
...
[junit4] > <int name="50">21</int>
[junit4] > <int name="0">0</int>
[junit4] > <int name="0">0</int>
[junit4] > <int name="0">0</int>
{noformat}
This is concerning for a few reasons:
* In the case of PivotFaceting, getting duplicate values back from a single
shard like this triggers an assert in distributed queries and the request fails
-- even if asserts aren't enabled, the bogus "0" value can be propagated to
clients if they ask for facet.pivot.mincount=0
* Client code expecting a single (value,count) pair for each value may equally
be confused/broken by this response where the same "value" is returned multiple
times
* Without knowing the root cause, it seems very possible that other nonsense
values may be getting returned -- i.e., if the error only happens with fields
utilizing precisionStep, then it's likely related to the synthetic values used
for faster range queries, and other synthetic values may be getting included
with bogus counts
A patch with a simple test that demonstrates the bug fairly easily will be
attached shortly.
Summary: RandomCodec can cause Faceting on multivalued Trie fields with
precisionStep != 0 can produce bogus value="0" in some test seeds (was:
Faceting on multivalued Trie fields with precisionStep != 0 can produce bogus
value="0" in some situations)
Re-reading my long comment from last night, I realized I kind of buried the
lead, which is: I was not able to reproduce this bug using any explicitly
specified -Dtests.codec other than "random".
> RandomCodec can cause Faceting on multivalued Trie fields with precisionStep
> != 0 can produce bogus value="0" in some test seeds
> --------------------------------------------------------------------------------------------------------------------------------
>
> Key: SOLR-7631
> URL: https://issues.apache.org/jira/browse/SOLR-7631
> Project: Solr
> Issue Type: Bug
> Reporter: Hoss Man
> Attachments: SOLR-7631_test.patch, SOLR-7631_test.patch, log.tgz