[
https://issues.apache.org/jira/browse/SOLR-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574939#comment-14574939
]
Hoss Man commented on SOLR-7631:
--------------------------------
(long update because jira was down last night, so i just kept a
stream-of-consciousness buffer which i'm now posting)
With the last patch, i ran this bash script...
{code}
# .. The current classpath supports the following names: [Asserting,
# CheapBastard, FastCompressingStoredFields,
# FastDecompressionCompressingStoredFields,
# HighCompressionCompressingStoredFields, DummyCompressingStoredFields,
# SimpleText, Lucene50]
codecs=(Asserting CheapBastard FastCompressingStoredFields
        FastDecompressionCompressingStoredFields
        HighCompressionCompressingStoredFields
        DummyCompressingStoredFields SimpleText Lucene50 random)
for c in "${codecs[@]}"
do
  echo "$c"
  # run the test 50 times per codec, collecting all output in one log per codec
  for i in {1..50}; do
    ant test -Dtestcase=TestTrieFacet -Dtests.verbose=true -Dtests.codec="$c"
  done | tee "$c.log.txt"
done
{code}
the only codec with failures was "random"...
{noformat}
$ grep -c "reproduce with" *.log.txt
Asserting.log.txt:0
CheapBastard.log.txt:0
DummyCompressingStoredFields.log.txt:0
FastCompressingStoredFields.log.txt:0
FastDecompressionCompressingStoredFields.log.txt:0
HighCompressionCompressingStoredFields.log.txt:0
Lucene50.log.txt:0
random.log.txt:9
SimpleText.log.txt:0
{noformat}
Those 9 failures...
{noformat}
$ egrep "reproduce with|test params" *.log.txt | grep -A 1 "reproduce with"
random.log.txt: [junit4] 2> NOTE: reproduce with: ant test
-Dtestcase=TestTrieFacet -Dtests.method=testMultiValuedTrieP8_enum
-Dtests.seed=FA4AA4357AB98B18 -Dtests.slow=true -Dtests.locale=ar_MA
-Dtests.timezone=America/Bogota -Dtests.asserts=true
-Dtests.file.encoding=US-ASCII
random.log.txt: [junit4] 2> NOTE: reproduce with: ant test
-Dtestcase=TestTrieFacet -Dtests.method=testMultiValuedTrieP8_fc
-Dtests.seed=FA4AA4357AB98B18 -Dtests.slow=true -Dtests.locale=ar_MA
-Dtests.timezone=America/Bogota -Dtests.asserts=true
-Dtests.file.encoding=US-ASCII
random.log.txt: [junit4] 2> NOTE: reproduce with: ant test
-Dtestcase=TestTrieFacet -Dtests.method=testMultiValuedTrieP8_fcs
-Dtests.seed=FA4AA4357AB98B18 -Dtests.slow=true -Dtests.locale=ar_MA
-Dtests.timezone=America/Bogota -Dtests.asserts=true
-Dtests.file.encoding=US-ASCII
random.log.txt: [junit4] 2> NOTE: test params are:
codec=Asserting(Lucene50): {foo_ti=PostingsFormat(name=MockRandom),
foo_i=Lucene50(blocksize=128), range_facet_l_dv=PostingsFormat(name=Asserting),
_version_=PostingsFormat(name=MockRandom),
multiDefault=Lucene50(blocksize=128),
intDefault=PostingsFormat(name=MockRandom), id=PostingsFormat(name=SimpleText),
range_facet_i_dv=PostingsFormat(name=MockRandom),
foo_ti1=PostingsFormat(name=Asserting), foo_i1=PostingsFormat(name=Asserting),
range_facet_l=PostingsFormat(name=MockRandom),
timestamp=PostingsFormat(name=MockRandom)},
docValues:{range_facet_l_dv=DocValuesFormat(name=Direct),
range_facet_i_dv=DocValuesFormat(name=Lucene50),
timestamp=DocValuesFormat(name=Lucene50)}, sim=DefaultSimilarity, locale=ar_MA,
timezone=America/Bogota
--
random.log.txt: [junit4] 2> NOTE: reproduce with: ant test
-Dtestcase=TestTrieFacet -Dtests.method=testMultiValuedTrieP8_fcs
-Dtests.seed=7CE0E739965D7ECD -Dtests.slow=true -Dtests.locale=mt_MT
-Dtests.timezone=Pacific/Bougainville -Dtests.asserts=true
-Dtests.file.encoding=ISO-8859-1
random.log.txt: [junit4] 2> NOTE: reproduce with: ant test
-Dtestcase=TestTrieFacet -Dtests.method=testMultiValuedTrieP8_enum
-Dtests.seed=7CE0E739965D7ECD -Dtests.slow=true -Dtests.locale=mt_MT
-Dtests.timezone=Pacific/Bougainville -Dtests.asserts=true
-Dtests.file.encoding=ISO-8859-1
random.log.txt: [junit4] 2> NOTE: reproduce with: ant test
-Dtestcase=TestTrieFacet -Dtests.method=testMultiValuedTrieP8_fc
-Dtests.seed=7CE0E739965D7ECD -Dtests.slow=true -Dtests.locale=mt_MT
-Dtests.timezone=Pacific/Bougainville -Dtests.asserts=true
-Dtests.file.encoding=ISO-8859-1
random.log.txt: [junit4] 2> NOTE: test params are:
codec=Asserting(Lucene50): {foo_ti=BlockTreeOrds(blocksize=128),
foo_i=PostingsFormat(name=LuceneFixedGap), range_facet_l_dv=FSTOrd50,
_version_=BlockTreeOrds(blocksize=128),
multiDefault=PostingsFormat(name=LuceneFixedGap),
intDefault=BlockTreeOrds(blocksize=128), id=FSTOrd50,
range_facet_i_dv=BlockTreeOrds(blocksize=128),
foo_ti1=PostingsFormat(name=Memory doPackFST= false), foo_i1=FSTOrd50,
range_facet_l=BlockTreeOrds(blocksize=128),
timestamp=BlockTreeOrds(blocksize=128)},
docValues:{range_facet_l_dv=DocValuesFormat(name=Memory),
range_facet_i_dv=DocValuesFormat(name=Asserting),
timestamp=DocValuesFormat(name=Asserting)},
sim=RandomSimilarityProvider(queryNorm=false,coord=yes): {}, locale=mt_MT,
timezone=Pacific/Bougainville
--
random.log.txt: [junit4] 2> NOTE: reproduce with: ant test
-Dtestcase=TestTrieFacet -Dtests.method=testMultiValuedTrieP8_fcs
-Dtests.seed=2A1E7082CBAD1C7C -Dtests.slow=true -Dtests.locale=sr_BA
-Dtests.timezone=Indian/Kerguelen -Dtests.asserts=true
-Dtests.file.encoding=US-ASCII
random.log.txt: [junit4] 2> NOTE: reproduce with: ant test
-Dtestcase=TestTrieFacet -Dtests.method=testMultiValuedTrieP8_fc
-Dtests.seed=2A1E7082CBAD1C7C -Dtests.slow=true -Dtests.locale=sr_BA
-Dtests.timezone=Indian/Kerguelen -Dtests.asserts=true
-Dtests.file.encoding=US-ASCII
random.log.txt: [junit4] 2> NOTE: reproduce with: ant test
-Dtestcase=TestTrieFacet -Dtests.method=testMultiValuedTrieP8_enum
-Dtests.seed=2A1E7082CBAD1C7C -Dtests.slow=true -Dtests.locale=sr_BA
-Dtests.timezone=Indian/Kerguelen -Dtests.asserts=true
-Dtests.file.encoding=US-ASCII
random.log.txt: [junit4] 2> NOTE: test params are:
codec=Asserting(Lucene50): {foo_ti=PostingsFormat(name=MockRandom),
foo_i=PostingsFormat(name=MockRandom),
range_facet_l_dv=PostingsFormat(name=Memory doPackFST= true),
_version_=PostingsFormat(name=MockRandom),
multiDefault=PostingsFormat(name=Asserting),
intDefault=PostingsFormat(name=MockRandom), id=PostingsFormat(name=Memory
doPackFST= true), range_facet_i_dv=PostingsFormat(name=Asserting),
foo_ti1=PostingsFormat(name=Memory doPackFST= true), foo_i1=FST50,
range_facet_l=PostingsFormat(name=Asserting),
timestamp=PostingsFormat(name=Asserting)},
docValues:{range_facet_l_dv=DocValuesFormat(name=Asserting),
range_facet_i_dv=DocValuesFormat(name=Memory),
timestamp=DocValuesFormat(name=Memory)},
sim=RandomSimilarityProvider(queryNorm=false,coord=no): {}, locale=sr_BA,
timezone=Indian/Kerguelen
{noformat}
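The distinct seeds behind those failures can be pulled out of the log mechanically rather than by eye. A minimal sketch, assuming each "reproduce with" note is a single line in the real log (it is only wrapped here by the mailer); the function name {{extract_seeds}} is made up for illustration:

```shell
# Hypothetical helper: print the distinct -Dtests.seed values found on stdin,
# one per line, sorted. Assumes seeds are upper-case hex, as in the logs above.
extract_seeds() {
  grep -o -e '-Dtests\.seed=[0-9A-F]*' | cut -d= -f2 | sort -u
}
```

Usage would be something like {{extract_seeds < random.log.txt}}, which should list each failing seed once no matter how many test methods failed under it.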
...time to start playing with -Dtests.postingsformat (using those 3 seeds)...
{code}
# ... The current classpath supports the following names: [MockRandom, RAMOnly,
# LuceneFixedGap, LuceneVarGapFixedInterval, LuceneVarGapDocFreqInterval,
# TestBloomFilteredLucenePostings, Asserting, BlockTreeOrds, BloomFilter, Direct,
# FSTOrd50, FST50, Memory, SimpleText, AutoPrefix, completion, Lucene50]
postings=(MockRandom RAMOnly LuceneFixedGap LuceneVarGapFixedInterval
          LuceneVarGapDocFreqInterval TestBloomFilteredLucenePostings Asserting
          BlockTreeOrds BloomFilter Direct FSTOrd50 FST50 Memory SimpleText
          AutoPrefix completion Lucene50)
seeds=(2A1E7082CBAD1C7C 7CE0E739965D7ECD FA4AA4357AB98B18)
for p in "${postings[@]}"
do
  for s in "${seeds[@]}"
  do
    echo "$p"
    echo "$s"
    # one log per (seed, postings format) pair: SEED.FORMAT.log.txt
    ant test -Dtestcase=TestTrieFacet \
      -Dtests.method=testMultiValuedTrieP8_fcs -Dtests.verbose=true \
      -Dtests.codec=random -Dtests.postingsformat="$p" -Dtests.seed="$s" \
      | tee "$s.$p.log.txt"
  done
done
{code}
...and for how many of the 3 seeds did each postings format fail? ...
{noformat}
$ grep -l "reproduce with" *.*.log.txt | cut -d . -f 2 | sort | uniq -c | sort -rn
3 RAMOnly
3 completion
3 BloomFilter
3 AutoPrefix
2 LuceneFixedGap
2 Direct
2 BlockTreeOrds
1 MockRandom
{noformat}
...and how many passed? ...
{noformat}
$ grep -L "reproduce with" *.*.log.txt | cut -d . -f 2 | sort | uniq -c | sort -rn
3 TestBloomFilteredLucenePostings
3 SimpleText
3 Memory
3 LuceneVarGapFixedInterval
3 LuceneVarGapDocFreqInterval
3 Lucene50
3 FSTOrd50
3 FST50
3 Asserting
2 MockRandom
1 LuceneFixedGap
1 Direct
1 BlockTreeOrds
{noformat}
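The two listings are easier to compare when merged into one line per postings format. A rough sketch, assuming the SEED.FORMAT.log.txt naming used by the loop above and treating any "reproduce with" note as a failure; the function name {{tally_formats}} is made up for illustration:

```shell
# Hypothetical helper: for log files named SEED.FORMAT.log.txt in directory $1
# (default: current directory), print "FORMAT failed=N passed=M" per format.
tally_formats() {
  dir=${1:-.}
  for f in "$dir"/*.*.log.txt; do
    [ -e "$f" ] || continue                     # glob matched nothing
    fmt=$(basename "$f" | cut -d . -f 2)        # second dot-separated field
    if grep -q "reproduce with" "$f"; then
      echo "$fmt fail"
    else
      echo "$fmt pass"
    fi
  done | awk '
    { seen[$1] = 1; count[$1, $2]++ }
    END {
      for (fmt in seen)
        printf "%s failed=%d passed=%d\n", fmt, count[fmt, "fail"], count[fmt, "pass"]
    }' | sort
}
```

This only says whether a run failed, not why, so unrelated failures are still counted together with the bug being chased.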
...i was hoping for a clearer pattern. looking closer at the data, a lot of
these failures are completely different from this bug -- and make me think there
are glitches in how the "-Dtests.postingsformat" option builds a Codec on the fly.
An example of what i mean -- all the RAMOnly failures look like this...
{noformat}
[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestTrieFacet
-Dtests.method=testMultiValuedTrieP8_fcs -Dtests.seed=2A1E7082CBAD1C7C
-Dtests.slow=true -Dtests.postingsformat=RAMOnly -Dtests.locale=nl_BE
-Dtests.timezone=Europe/Sofia -Dtests.asserts=true
-Dtests.file.encoding=US-ASCII
[junit4] ERROR 0.11s | TestTrieFacet.testMultiValuedTrieP8_fcs <<<
[junit4] > Throwable #1: java.lang.RuntimeException: Exception during
query
[junit4] > at
__randomizedtesting.SeedInfo.seed([2A1E7082CBAD1C7C:3E120D851303C7F0]:0)
[junit4] > at
org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:770)
[junit4] > at
org.apache.solr.search.TestTrieFacet.doTestNoZeros(TestTrieFacet.java:134)
[junit4] > at
org.apache.solr.search.TestTrieFacet.testMultiValuedTrieP8_fcs(TestTrieFacet.java:194)
[junit4] > at java.lang.Thread.run(Thread.java:745)
[junit4] > Caused by: java.lang.NullPointerException
[junit4] > at
org.apache.lucene.codecs.ramonly.RAMOnlyPostingsFormat$RAMTermsEnum.docFreq(RAMOnlyPostingsFormat.java:464)
[junit4] > at
org.apache.lucene.index.FilterLeafReader$FilterTermsEnum.docFreq(FilterLeafReader.java:210)
[junit4] > at
org.apache.lucene.index.FilteredTermsEnum.docFreq(FilteredTermsEnum.java:141)
[junit4] > at
org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1.collectTerms(MultiTermQueryConstantScoreWrapper.java:130)
[junit4] > at
org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1.rewrite(MultiTermQueryConstantScoreWrapper.java:152)
[junit4] > at
org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1.bulkScorer(MultiTermQueryConstantScoreWrapper.java:198)
[junit4] > at
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:560)
[junit4] > at
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:367)
{noformat}
Let's look for the exact XPATH errors we are expecting with these seeds...
{noformat}
$ grep -l "REQUEST FAILED: xpath=\*\[count(//lst\[\@name='facet_fields'\]/lst\[\@name='foo_ti'\]/int\[\@name='0'\])=0]" \
    *.*.log.txt | cut -d . -f 2 | sort | uniq -c | sort -rn
2 LuceneFixedGap
2 Direct
2 BlockTreeOrds
1 MockRandom
{noformat}
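Since some failures (like the RAMOnly NPE) are unrelated to this bug, it may help to sort each log into one of three buckets. A minimal sketch, assuming the "REQUEST FAILED: xpath=..." message appears on a single line in the real log; the function name {{classify_log}} and the bucket labels are made up for illustration:

```shell
# Hypothetical helper: classify one log file as
#   bogus-zero     - contains the xpath failure expected for this bug
#   other-failure  - has a "reproduce with" note but not that xpath failure
#   pass           - no "reproduce with" note at all
classify_log() {
  # matched as a fixed string (-F), so no regex escaping is needed
  xpath_msg="REQUEST FAILED: xpath=*[count(//lst[@name='facet_fields']/lst[@name='foo_ti']/int[@name='0'])=0]"
  if grep -q -F -e "$xpath_msg" "$1"; then
    echo "bogus-zero"
  elif grep -q "reproduce with" "$1"; then
    echo "other-failure"
  else
    echo "pass"
  fi
}
```

Looping over the logs, e.g. {{for f in *.*.log.txt; do echo "$f $(classify_log "$f")"; done}}, would keep the unrelated failures out of the counts for this bug.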
Still no obvious pattern -- these postings formats all resulted in the
'value="0"' failure at least once, but also passed at least once (see counts
above) ... which makes me question my hypothesis that this is related to the
postingsformat -- for MockRandom maybe, since it does some additional
randomization after it's selected on construction, but the others all use their
default constructors when specified this way.
(of course, maybe the fact that i specified -Dtests.postingsformat instead of
relying on -Dtests.codec=random to pick one is causing the random consumption
to change in a way that affects something _else_ ? ... my head hurts)
> Faceting on multivalued Trie fields with precisionStep != 0 can produce bogus
> value="0" in some situations
> ----------------------------------------------------------------------------------------------------------
>
> Key: SOLR-7631
> URL: https://issues.apache.org/jira/browse/SOLR-7631
> Project: Solr
> Issue Type: Bug
> Reporter: Hoss Man
> Attachments: SOLR-7631_test.patch, SOLR-7631_test.patch, log.tgz
>
>
> Working through SOLR-7605, I've confirmed that the underlying problem exists
> for regular {{facet.field}} situations, regardless of distrib mode, for Trie
> fields that have a non-zero precisionStep -- there's still some other missing
> piece of the puzzle i haven't figured out, but it relates in some way to some
> of the randomized factors we use in our tests (Codec? PostingsFormat? ... no idea)
> The problem, when it manifests, is that faceting on a TrieIntField, using
> {{facet.mincount=0}}, causes the facet results to include three instances of
> the facet value "0" listed with a count of "0" -- even though no document in
> the index contains this value at all...
> {noformat}
> [junit4] > <lst name="facet_fields">
> [junit4] > <lst name="foo_ti">
> [junit4] > <int name="20">32</int>
> ...
> [junit4] > <int name="50">21</int>
> [junit4] > <int name="0">0</int>
> [junit4] > <int name="0">0</int>
> [junit4] > <int name="0">0</int>
> {noformat}
> This is concerning for a few reasons:
> * In the case of PivotFaceting, getting duplicate values back from a single
> shard like this triggers an assert in distributed queries and the request
> fails -- even if asserts aren't enabled, the bogus "0" value can be
> propogated to clients if they ask for facet.pivot.mincount=0
> * Client code expecting a single (value,count) pair for each value may
> equally be confused/broken by this response where the same "value" is
> returned multiple times
> * w/o knowing the root cause, it seems very possible that other nonsense
> values may be getting returned -- ie: if the error only happens with fields
> utilizing precisionStep, then it's likely related to the synthetic values
> used for faster range queries, and other synthetic values may be getting
> included with bogus counts
> A Patch with a simple test that can demonstrate the bug fairly easily will be
> attached shortly