[ 
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-2894:
---------------------------

    Attachment: SOLR-2894.patch


I've been focusing on more tests using facet.offset...

bq. I haven't looked into this closely, but I noticed the refinement code seems to only refine things starting at the "facetFieldOffset" of the current collection. Don't we need to refine all the values, starting from the beginning of the list?

There was in fact a bug with refinement when using facet.offset -- but I was looking in the wrong place.  The code I was referring to before is involved in deciding which values to drill down into when recursively refining the sub-pivots.  That logic was already (mostly) correct, because by that point we've already refined the _current_ level completely, so we can skip past the offset when doing the recursion (the only glitch was a boundary check causing an IOOBE, see details below).  Earlier on in the code, however, there was a mistake where only the limit (not the limit+offset) was being used to decide the threshold value for refinement.
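
To make that last mistake concrete, here's a minimal sketch of where the threshold index has to land once facet.offset is in play (the method name echoes the "indexOfCountThreshold" variable, but everything here is hypothetical, not the actual PivotFacetField code):

{code:java}
// Hypothetical sketch only.  With facet.offset=O and facet.limit=L, the count
// used as the refinement threshold should come from the last value that could
// still appear on the requested page, i.e. index O+L-1, not L-1 (which is
// effectively what the buggy code was using).
static int indexOfCountThreshold(int offset, int limit, int numCandidateValues) {
  // clamp so a short candidate list never yields an out-of-range index
  return Math.min(offset + limit, numCandidateValues) - 1;
}
{code}

With offset=5 and limit=10 over 100 candidate values that's index 14 rather than 9, so the values skipped by the offset no longer eat into the refinement window for the values that actually end up on the page.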

----

New improvements in this patch...

* TestCloudPivotFacet
** increase the odds of overrequest==0
** randomly include a facet.offset param to sanity check refinement in that case

* PivotFacetField
** fix refineNextLevelOfFacets not to ask for a sublist with a start offset bigger than the size of the collection (see the sketch after this list)
*** this was causing an IndexOutOfBoundsException pretty quickly when offset was mixed into the random test
** fix queuePivotRefinementRequests to respect offset when picking the "indexOfCountThreshold"
*** before it was only looking at the limit; with offset in the randomized test this was causing failures even when pivots only had one field in them!
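
For the first fix, the gist is just a bounds-safe offset+limit window.  Here's a minimal illustrative sketch (the class and method names are made up for this comment, not the actual refineNextLevelOfFacets code), assuming a non-negative limit:

{code:java}
import java.util.List;

class OffsetLimitSketch {
  // Illustration only: List.subList(from, to) throws IndexOutOfBoundsException
  // when from > size(), so the start of the offset+limit window has to be
  // clamped to the collection size before slicing.
  static <T> List<T> offsetLimitWindow(List<T> values, int offset, int limit) {
    int from = Math.min(offset, values.size());
    int to = Math.min(from + limit, values.size());
    return values.subList(from, to); // empty when offset >= values.size()
  }
}
{code}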

----

A few more things to consider in the future...

* PivotFacetFieldValueCollection.refinableSubList is only used to deal with offset+limit sublisting from PivotFacetField.refineNextLevelOfFacets -- but PivotFacetFieldValueCollection already knows the offset & limit, so maybe it should be a smarter special purpose method with 0 args: {{getNextLevelValuesToRefine()}} (a rough sketch follows after this list)

* trim earlier?
** the way refinement currently works in PivotFacetField: after we've refined our values, we mark that we no longer need refinement, and then on the next call we recursively refine the subpivots of each value -- and in both cases we do the offset+limit calculations and hang on to all of the values (both below offset and above limit) as we keep iterating down the pivots -- they don't get thrown away until the final trim() call just before building up the final result.
** I previously suggested folding the trim() logic into the NamedList response logic -- but now I'm wondering if the trim() logic should instead be folded into refinement?  So once we're sure a level is fully refined, we go ahead and trim that level before drilling down and refining its kids?
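
For the first idea above, a 0-arg method on PivotFacetFieldValueCollection might look roughly like this; the field names (facetFieldOffset, facetFieldLimit, explicitValues) are placeholders I'm assuming for illustration, not necessarily the class's real members:

{code:java}
// Rough sketch of the suggested 0-arg helper -- placeholder field names, not
// the real PivotFacetFieldValueCollection internals.
public List<PivotFacetValue> getNextLevelValuesToRefine() {
  int from = Math.min(facetFieldOffset, explicitValues.size());
  int to = (facetFieldLimit < 0)
      ? explicitValues.size() // a negative limit means "no limit"
      : Math.min(from + facetFieldLimit, explicitValues.size());
  return explicitValues.subList(from, to);
}
{code}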

----

Unfortunately, with this new patch, I did uncover a new random failure I can't easily explain (it doesn't seem related to the offset changes, since facet.offset isn't even used in these random params -- but it's possible I broke something while fixing that) ...

{noformat}
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestCloudPivotFacet 
-Dtests.method=testDistribSearch -Dtests.seed=775F7BCA685BBC22 
-Dtests.nightly=true -Dtests.slow=true -Dtests.locale=da_DK 
-Dtests.timezone=America/Montserrat -Dtests.file.encoding=UTF-8
   [junit4] FAILURE 65.9s | TestCloudPivotFacet.testDistribSearch <<<
   [junit4]    > Throwable #1: java.lang.AssertionError: 
{main(facet=true&facet.pivot=pivot_tl%2Cpivot_tl%2Cpivot_y_s&facet.pivot=bogus_not_in_any_doc_s%2Cpivot_l1%2Cpivot_td&facet.limit=13&facet.missing=true&facet.sort=count&facet.overrequest.count=2),extra(rows=0&q=*%3A*&fq=id%3A%5B*+TO+383%5D&_test_miss=true&_test_sort=count)}
 ==> bogus_not_in_any_doc_s,pivot_l1,pivot_td: 
{params(rows=0),defaults({main({main(rows=0&q=*%3A*&fq=id%3A%5B*+TO+383%5D&_test_miss=true&_test_sort=count),extra(fq=-bogus_not_in_any_doc_s%3A%5B*+TO+*%5D)}),extra(fq=%7B%21term+f%3Dpivot_l1%7D5098)})}
 expected:<7> but was:<9>
   [junit4]    >        at 
__randomizedtesting.SeedInfo.seed([775F7BCA685BBC22:F6B9F5D21F04DC1E]:0)
   [junit4]    >        at 
org.apache.solr.cloud.TestCloudPivotFacet.assertPivotCountsAreCorrect(TestCloudPivotFacet.java:239)
   [junit4]    >        at 
org.apache.solr.cloud.TestCloudPivotFacet.doTest(TestCloudPivotFacet.java:187)
   [junit4]    >        at 
org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:865)
   [junit4]    >        at java.lang.Thread.run(Thread.java:744)
   [junit4]    > Caused by: java.lang.AssertionError: 
bogus_not_in_any_doc_s,pivot_l1,pivot_td: 
{params(rows=0),defaults({main({main(rows=0&q=*%3A*&fq=id%3A%5B*+TO+383%5D&_test_miss=true&_test_sort=count),extra(fq=-bogus_not_in_any_doc_s%3A%5B*+TO+*%5D)}),extra(fq=%7B%21term+f%3Dpivot_l1%7D5098)})}
 expected:<7> but was:<9>
   [junit4]    >        at 
org.apache.solr.cloud.TestCloudPivotFacet.assertNumFound(TestCloudPivotFacet.java:507)
   [junit4]    >        at 
org.apache.solr.cloud.TestCloudPivotFacet.assertPivotCountsAreCorrect(TestCloudPivotFacet.java:257)
   [junit4]    >        at 
org.apache.solr.cloud.TestCloudPivotFacet.assertPivotCountsAreCorrect(TestCloudPivotFacet.java:268)
   [junit4]    >        at 
org.apache.solr.cloud.TestCloudPivotFacet.assertPivotCountsAreCorrect(TestCloudPivotFacet.java:229)
{noformat}

...I need to dig into this a bit more tomorrow.


> Implement distributed pivot faceting
> ------------------------------------
>
>                 Key: SOLR-2894
>                 URL: https://issues.apache.org/jira/browse/SOLR-2894
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Erik Hatcher
>            Assignee: Hoss Man
>             Fix For: 4.9, 5.0
>
>         Attachments: SOLR-2894-mincount-minification.patch, 
> SOLR-2894-reworked.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894_cloud_test.patch, dateToObject.patch, pivot_mincount_problem.sh
>
>
> Following up on SOLR-792, pivot faceting currently only supports 
> undistributed mode.  Distributed pivot faceting needs to be implemented.


