[ https://issues.apache.org/jira/browse/SOLR-12343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16518698#comment-16518698 ]
Steve Rowe commented on SOLR-12343:
-----------------------------------

Not sure if it relates to this bug -- please move/add if not -- but my Jenkins found a reproducing failure for {{TestCloudJSONFacetSKG.testBespoke()}}:

{noformat}
Checking out Revision 008bc74bebef96414f19118a267dbf982aba58b9 (refs/remotes/origin/master)
[...]
ant test -Dtestcase=TestCloudJSONFacetSKG -Dtests.method=testBespoke -Dtests.seed=5D223D88BF5BF89 -Dtests.slow=true -Dtests.locale=bg-BG -Dtests.timezone=America/Asuncion -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
   [junit4] FAILURE 0.11s J0 | TestCloudJSONFacetSKG.testBespoke <<<
   [junit4]    > Throwable #1: java.lang.AssertionError: Didn't check a single bucket???
   [junit4]    >     at __randomizedtesting.SeedInfo.seed([5D223D88BF5BF89:E09A7E14375787E]:0)
   [junit4]    >     at org.apache.solr.cloud.TestCloudJSONFacetSKG.testBespoke(TestCloudJSONFacetSKG.java:219)
   [junit4]    >     at java.lang.Thread.run(Thread.java:748)
[...]
   [junit4]   2> NOTE: test params are: codec=FastCompressingStoredFields(storedFieldsFormat=CompressingStoredFieldsFormat(compressionMode=FAST, chunkSize=4, maxDocsPerChunk=1, blockSize=332), termVectorsFormat=CompressingTermVectorsFormat(compressionMode=FAST, chunkSize=4, blockSize=332)), sim=Asserting(org.apache.lucene.search.similarities.AssertingSimilarity@4052d535), locale=el, timezone=Indian/Antananarivo
   [junit4]   2> NOTE: Linux 4.1.0-custom2-amd64 amd64/Oracle Corporation 1.8.0_151 (64-bit)/cpus=16,threads=1,free=213710424,total=526909440
{noformat}

> JSON Field Facet refinement can return incorrect counts/stats for sorted buckets
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-12343
>                 URL: https://issues.apache.org/jira/browse/SOLR-12343
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>            Reporter: Hoss Man
>            Priority: Major
>         Attachments: SOLR-12343.patch, SOLR-12343.patch, SOLR-12343.patch
>
>
> The way JSON Facet's simple refinement "re-sorts" buckets after refinement
> can cause _refined_ buckets to be "bumped out" of the topN based on the
> refined counts/stats depending on the sort - causing _unrefined_ buckets
> originally discounted in phase#2 to bubble up into the topN and be returned
> to clients *with inaccurate counts/stats*
>
> The simplest way to demonstrate this bug (in some data sets) is with a
> {{sort: 'count asc'}} facet:
> * Assume shard1 returns termX & termY in phase#1 because they have very low
> shard1 counts
> ** but they are *not* returned at all by shard2, because both terms have very
> high shard2 counts.
> * Assume termX has a slightly lower shard1 count than termY, such that:
> ** termX "makes the cut" for the limit=N topN buckets
> ** termY does not make the cut, and is the "N+1" known bucket at the end of
> phase#1
> * termX then gets included in the phase#2 refinement request against shard2
> ** termX now has a much higher _known_ total count than termY
> ** the coordinator now sorts termX "worse" than termY in the sorted list of
> buckets
> ** which causes termY to bubble up into the topN
> * termY is ultimately included in the final result _with incomplete
> count/stat/sub-facet data_ instead of termX
> ** this is all independent of the possibility that termY may actually have a
> significantly higher total count than termX across the entire collection
> ** the key problem is that all/most of the other terms returned to the
> client have counts/stats that are the cumulative totals from all shards, but
> termY only has the contributions from shard1
>
> Important Notes:
> * This scenario can happen regardless of the amount of overrequest used.
> Additional overrequest just increases the number of "extra" terms needed in
> the index with "better" sort values than termX & termY in shard2
> * {{sort: 'count asc'}} is not just an exceptional/pathological case:
> ** any function sort where additional data provided by shards during
> refinement can cause a bucket to "sort worse" can also cause this problem.
> ** Examples: {{sum(price_i) asc}}, {{min(price_i) desc}}, {{avg(price_i)
> asc|desc}}, etc...

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
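The phase#1/phase#2 interaction described in the issue can be made concrete with a rough Python simulation. This is a sketch under stated assumptions, not Solr's JSON Facet code. The shard names, term counts, and helpers (`phase1`, `refine`, `buggy_facet`, `refined_only_facet`) are all hypothetical. The sketch assumes each shard returns exactly `limit + overrequest` buckets sorted by `count asc`. The "refined only" variant is one possible mitigation, not necessarily what the attached patches do.

```python
from collections import Counter

# Hypothetical per-shard counts for one facet field (illustrative data only).
SHARDS = {
    "shard1": {"termX": 1, "termY": 2, "termZ": 50, "termW": 60},
    "shard2": {"termX": 100, "termY": 100, "termZ": 5, "termW": 6},
}


def sort_key(bucket):
    # Emulates sort: 'count asc', breaking ties by term.
    term, count = bucket
    return (count, term)


def phase1(limit, overrequest):
    """Each shard returns its limit+overrequest best buckets; the coordinator sums what it sees."""
    merged = Counter()
    contributors = {}
    for shard, counts in SHARDS.items():
        for term, count in sorted(counts.items(), key=sort_key)[: limit + overrequest]:
            merged[term] += count
            contributors.setdefault(term, set()).add(shard)
    return merged, contributors


def refine(merged, contributors, limit):
    """Phase #2: fetch missing shard counts for the current topN buckets only."""
    selected = [term for term, _ in sorted(merged.items(), key=sort_key)[:limit]]
    for term in selected:
        for shard, counts in SHARDS.items():
            if shard not in contributors[term]:
                merged[term] += counts.get(term, 0)
                contributors[term].add(shard)
    return selected


def buggy_facet(limit=1, overrequest=1):
    merged, contributors = phase1(limit, overrequest)
    refine(merged, contributors, limit)
    # Re-sorting *all* known buckets lets an unrefined bucket bubble up into the topN.
    return sorted(merged.items(), key=sort_key)[:limit]


def refined_only_facet(limit=1, overrequest=1):
    merged, contributors = phase1(limit, overrequest)
    selected = refine(merged, contributors, limit)
    # Hypothetical mitigation: only return buckets that were selected for refinement.
    return sorted(((term, merged[term]) for term in selected), key=sort_key)[:limit]


def true_counts():
    # Ground truth: sum every term's count across all shards.
    total = Counter()
    for counts in SHARDS.values():
        total.update(counts)
    return total


print("buggy:       ", buggy_facet())
print("refined only:", refined_only_facet())
print("true counts: ", dict(true_counts()))
```

Walking through the default run (`limit=1, overrequest=1`):

* shard1 reports termX (1) and termY (2), and shard2 reports termZ (5) and termW (6).
* Only termX is refined (1 + 100 = 101).
* The final re-sort therefore returns termY with a count of 2, even though its true total is 102.

Raising `overrequest` to 2 does not help with this data. In that case termY becomes the refined bucket, and termW is returned with a count of 6 instead of 66.

The refined-only variant keeps every returned count complete. It still chooses from an approximate candidate list, though: it returns termX at 101, while termZ at 55 is the true lowest count.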