[ https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hoss Man updated SOLR-2894:
---------------------------
    Attachment: SOLR-2894.patch

After working through the fix to the refinement logic in PivotFacetField.queuePivotRefinementRequests, the previously failing seed for TestCloudPivotFacet started to pass, but some sort=index tests still weren't working, which led me to realize 3 things:

* some of my tests were absurd -- i've gotten used to using overrequest=0 as a way to force refinement, but with facet.sort=index combined with limit (and offset) and mincount it meant that it was impossible for the sort=index facet logic to ever find the results we're looking for. We *have* to allow some overrequest when mincount>1 or the initial shard requests won't find the values (that will ultimately have a cumulative count high enough to satisfy mincount) in order to even try refining them.
* offset wasn't being added to the limit in the per-shard requests, so w/o overrequest enabled you would never get the values you needed even in ideal situations
* the shard query logic in FacetComponent was ignoring overrequest when sort=index ... this seems broken to me, but from what i can tell, it comes straight from the existing facet.field logic as well.

I'll open a bug to track the existing broken overrequest logic in facet.field -- even though i hope that once we're done with this issue, it may be fixed via refactoring and shared code with pivots (i'm not 100% certain: the FacetComponent diff is the bulk of what i still need to review more closely on this issue)

There's still a failure in DistributedFacetPivotLargeTest (mismatch compared to control) when i tried using mincount=0 that i'm not certain if/how we can solve...

{code}
// :nocommit: broken honda?
rsp = query( params( "q", "*:*",
                     "rows", "0",
                     "facet","true",
                     "facet.sort","index",
                     "f.place_s.facet.limit", "20",
                     "f.place_s.facet.offset", "40",
                     FacetParams.FACET_PIVOT_MINCOUNT,"0",
                     "facet.pivot", "place_s,company_t") );
{code}

From what I can tell, the gist of the issue is that when dealing with sub-fields of the pivot, the coordination code doesn't know about some of the "0" values if no shard which has the value for the parent field even knows about the existence of the term.

The simplest example of this discrepancy (compared to single node pivots) is to consider an index with only 2 docs...

{noformat}
[{"id":1,"top_s":"foo","sub_s":"bar"},
 {"id":2,"top_s":"xxx","sub_s":"yyy"}]
{noformat}

If those two docs exist in a single node index, and you pivot on {{top_s,sub_s}} using mincount=0 you get a response like this...

{noformat}
$ curl -sS 'http://localhost:8881/solr/select?q=*:*&rows=0&facet=true&facet.pivot.mincount=0&facet.pivot=top_s,sub_s&omitHeader=true&wt=json&indent=true'
{
  "response":{"numFound":2,"start":0,"docs":[]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{},
    "facet_dates":{},
    "facet_ranges":{},
    "facet_intervals":{},
    "facet_pivot":{
      "top_s,sub_s":[{
          "field":"top_s",
          "value":"foo",
          "count":1,
          "pivot":[{
              "field":"sub_s",
              "value":"bar",
              "count":1},
            {
              "field":"sub_s",
              "value":"yyy",
              "count":0}]},
        {
          "field":"top_s",
          "value":"xxx",
          "count":1,
          "pivot":[{
              "field":"sub_s",
              "value":"yyy",
              "count":1},
            {
              "field":"sub_s",
              "value":"bar",
              "count":0}]}]}}}
{noformat}

If however you index each of those docs on a separate shard, the response comes back like this...

{noformat}
$ curl -sS 'http://localhost:8881/solr/select?q=*:*&rows=0&facet=true&facet.pivot.mincount=0&facet.pivot=top_s,sub_s&omitHeader=true&wt=json&indent=true&shards=localhost:8881/solr,localhost:8882/solr'
{
  "response":{"numFound":2,"start":0,"maxScore":1.0,"docs":[]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{},
    "facet_dates":{},
    "facet_ranges":{},
    "facet_intervals":{},
    "facet_pivot":{
      "top_s,sub_s":[{
          "field":"top_s",
          "value":"foo",
          "count":1,
          "pivot":[{
              "field":"sub_s",
              "value":"bar",
              "count":1}]},
        {
          "field":"top_s",
          "value":"xxx",
          "count":1,
          "pivot":[{
              "field":"sub_s",
              "value":"yyy",
              "count":1}]}]}}}
{noformat}

The only solution i can think of would be an extra (special to mincount=0) stage of logic, after each PivotFacetField is refined, that would:

* iterate over all the values of the current pivot
* build up a Set of all the known values for the child-pivots of those values
* iterate over all the values again, merging in a "0"-count child value for every value in the set

...ie: "At least one shard knows about value 'v_x' in field 'sub_field', so add a count of '0' for 'v_x' in every 'sub_field' collection nested under the 'top_field' in our 'top_field,sub_field' pivot"

I haven't thought this idea through enough to be confident it would work, or that it's worth doing ... i'm certainly not convinced that mincount=0 makes enough sense in a facet.pivot use case to think getting this test working should hold up getting this committed -- probably something that should just be committed as is, with an open Jira noting that it's a known bug.

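To make the three steps above concrete, here is a minimal standalone sketch of the idea. It is not code from the patch: the class name PivotZeroFill, the method fillZeroCounts, and the use of plain nested Maps in place of the real PivotFacetField / PivotFacetValue structures are all assumptions made for illustration, and it ignores limit, offset, sort and pivots nested more than two levels deep.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical illustration only; none of these names exist in Solr or in the patch. */
final class PivotZeroFill {

  /**
   * Takes refined pivot counts as top_field value -> (sub_field value -> count)
   * and returns a copy in which every parent carries a 0 count for each child
   * value that was seen under some other parent.
   */
  static Map<String, Map<String, Integer>> fillZeroCounts(
      Map<String, Map<String, Integer>> pivot) {
    // Pass 1: build the Set of every child value known under any parent value.
    Set<String> knownChildren = new TreeSet<>();
    for (Map<String, Integer> children : pivot.values()) {
      knownChildren.addAll(children.keySet());
    }

    // Pass 2: copy each parent's counts and merge in a 0 for any missing child.
    Map<String, Map<String, Integer>> filled = new LinkedHashMap<>();
    for (Map.Entry<String, Map<String, Integer>> parent : pivot.entrySet()) {
      Map<String, Integer> children = new LinkedHashMap<>(parent.getValue());
      for (String child : knownChildren) {
        children.putIfAbsent(child, 0);
      }
      filled.put(parent.getKey(), children);
    }
    return filled;
  }

  public static void main(String[] args) {
    // The merged two-shard result from the example above: each parent only
    // knows about the child value that lives on its own shard.
    Map<String, Map<String, Integer>> pivot = new LinkedHashMap<>();
    pivot.put("foo", new LinkedHashMap<>(Map.of("bar", 1)));
    pivot.put("xxx", new LinkedHashMap<>(Map.of("yyy", 1)));
    System.out.println(fillZeroCounts(pivot));
  }
}
```

Run against the two-doc example, this prints {foo={bar=1, yyy=0}, xxx={yyy=1, bar=0}}, i.e. the same set of sub_s values per parent as the single node response. Whether a real version could do this without breaking limit/offset handling is exactly the part that hasn't been thought through.
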
{panel:title=Summary Changes in this patch}
* PivotFacet
** add a new REFINE_PARAM constant for "fpt"
* PivotFacetProcessor
** javadocs
** use REFINE_PARAM constant
* PivotFacetField
** processDefiniteCandidateElement
*** javadocs
*** numberOfValuesContributedByShardWasLimitedByFacetFieldLimit can only be trusted when sort=count
** processPossibleCandidateElement
*** method only useful when sort=count
*** added assert & javadocs making this clear
** queuePivotRefinementRequests
*** call processDefiniteCandidateElement on all elements when using sort=index
* FacetComponent
** applyToShardRequests - removed this method
*** a bunch of it was dead code (if limit > 0, no need to check limit>=0)
*** most of what wasn't dead code was also being done by the callers (ie: redundant overrequest logic)
*** this was also where the original mincount=0 bug lived (mincount was being forced to 1 when called from pivot code)
** modifyRequestForIndividualPivotFacets & modifyRequestForFieldFacets
*** made sure they were directly doing the stuff they used to depend on applyToShardRequests for
*** fixed up limit+offset & overrequest logic
** use REFINE_PARAM constant
* DistributedFacetPivotLargeTest
** fixed tests to be less overzealous about overrequest=0
** added more mincount=0 testing (currently fails)
{panel}

> Implement distributed pivot faceting
> ------------------------------------
>
>                 Key: SOLR-2894
>                 URL: https://issues.apache.org/jira/browse/SOLR-2894
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Erik Hatcher
>            Assignee: Hoss Man
>             Fix For: 4.9, 5.0
>
>         Attachments: SOLR-2894-mincount-minification.patch,
> SOLR-2894-reworked.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894_cloud_test.patch,
> dateToObject.patch, pivot_mincount_problem.sh
>
>
> Following up on SOLR-792, pivot faceting currently only supports
> undistributed mode. Distributed pivot faceting needs to be implemented.

--
This message was sent by Atlassian JIRA
(v6.2#6252)