[ https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-2894:
---------------------------

    Attachment: SOLR-2894.patch


After working through the fix to the refinement logic in
PivotFacetField.queuePivotRefinementRequests, the previously failing seed for
TestCloudPivotFacet started to pass. However, some sort=index tests still weren't
working, which led me to realize 3 things:
* some of my tests were absurd -- I've gotten used to using overrequest=0 as a
way to force refinement, but combining facet.sort=index with limit (and
offset) and mincount meant the sort=index facet logic could never find the
results we're looking for.  We *have* to allow some overrequest when
mincount>1. Otherwise the initial shard requests won't find the values that will
ultimately have a high enough cumulative mincount, so those values never even
get considered for refinement.
* offset wasn't being added to the limit in the per-shard requests, so without
overrequest enabled you would never get the values you needed, even in ideal
situations
* the shard query logic in FacetComponent was ignoring overrequest when
sort=index ... this seems broken to me, but from what I can tell, it comes
straight from the existing facet.field logic as well.
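A minimal sketch of the per-shard limit arithmetic described above. It assumes overrequest is expressed as a multiplicative ratio plus a fixed extra count. The `ShardLimitSketch` class and the `overrequestRatio` / `overrequestCount` names are hypothetical, not the actual FacetComponent code. It tries to capture two points: offset has to be added to limit first, and overrequest is applied the same way regardless of facet.sort.

```java
final class ShardLimitSketch {
    // Hypothetical helper: the facet.limit to send to each shard for a requested
    // offset/limit, padded by an assumed "ratio + fixed count" overrequest scheme.
    static int shardLimit(int offset, int limit, double overrequestRatio, int overrequestCount) {
        if (limit < 0) {
            return -1; // negative limit means "no limit"; pass it through untouched
        }
        // The coordinator needs the top (offset + limit) values overall, so each shard
        // must return at least that many; leaving offset out was one of the bugs.
        long base = (long) offset + limit;
        // Overrequest is applied here no matter what facet.sort is.
        long padded = (long) (base * overrequestRatio) + overrequestCount;
        return (int) Math.min(Integer.MAX_VALUE, padded);
    }

    public static void main(String[] args) {
        System.out.println(shardLimit(40, 20, 1.0, 0));  // 60: no overrequest, but offset still counted
        System.out.println(shardLimit(40, 20, 1.5, 10)); // 100
    }
}
```

With offset=40 and limit=20, a shard asked for only 20 values can never supply the 41st through 60th terms the coordinator needs. Even with no overrequest at all, the per-shard limit has to be at least 60.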

I'll open a bug to track the existing broken overrequest logic in
facet.field. Once we're done with this issue, I hope refactoring and sharing
code with pivots will fix it, but I'm not 100% certain. The FacetComponent diff
is the bulk of what I still need to review more closely on this issue.

There's still a failure in DistributedFacetPivotLargeTest (mismatch compared to
control) when I tried using mincount=0, and I'm not certain if or how we can
solve it...

{code}
// :nocommit: broken honda?
rsp = query( params( "q", "*:*",
                     "rows", "0",
                     "facet","true",
                     "facet.sort","index",
                     "f.place_s.facet.limit", "20",
                     "f.place_s.facet.offset", "40",
                     FacetParams.FACET_PIVOT_MINCOUNT,"0",
                     "facet.pivot", "place_s,company_t") );
{code}

From what I can tell, the gist of the issue is this. When dealing with
sub-fields of the pivot, the coordination code doesn't know about some of the
"0" values if no shard that has the value for the parent field even knows
the term exists.

The simplest example of this discrepancy (compared to single node pivots) is
an index with only 2 docs...

{noformat}
[{"id":1,"top_s":"foo","sub_s":"bar"},
 {"id":2,"top_s":"xxx","sub_s":"yyy"}]
{noformat}

If those two docs exist in a single node index and you pivot on
{{top_s,sub_s}} using mincount=0, you get a response like this...

{noformat}
$ curl -sS 'http://localhost:8881/solr/select?q=*:*&rows=0&facet=true&facet.pivot.mincount=0&facet.pivot=top_s,sub_s&omitHeader=true&wt=json&indent=true'
{
  "response":{"numFound":2,"start":0,"docs":[]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{},
    "facet_dates":{},
    "facet_ranges":{},
    "facet_intervals":{},
    "facet_pivot":{
      "top_s,sub_s":[{
          "field":"top_s",
          "value":"foo",
          "count":1,
          "pivot":[{
              "field":"sub_s",
              "value":"bar",
              "count":1},
            {
              "field":"sub_s",
              "value":"yyy",
              "count":0}]},
        {
          "field":"top_s",
          "value":"xxx",
          "count":1,
          "pivot":[{
              "field":"sub_s",
              "value":"yyy",
              "count":1},
            {
              "field":"sub_s",
              "value":"bar",
              "count":0}]}]}}}
{noformat}

If, however, you index each of those docs on a separate shard, the response comes
back like this...

{noformat}
$ curl -sS 'http://localhost:8881/solr/select?q=*:*&rows=0&facet=true&facet.pivot.mincount=0&facet.pivot=top_s,sub_s&omitHeader=true&wt=json&indent=true&shards=localhost:8881/solr,localhost:8882/solr'
{
  "response":{"numFound":2,"start":0,"maxScore":1.0,"docs":[]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{},
    "facet_dates":{},
    "facet_ranges":{},
    "facet_intervals":{},
    "facet_pivot":{
      "top_s,sub_s":[{
          "field":"top_s",
          "value":"foo",
          "count":1,
          "pivot":[{
              "field":"sub_s",
              "value":"bar",
              "count":1}]},
        {
          "field":"top_s",
          "value":"xxx",
          "count":1,
          "pivot":[{
              "field":"sub_s",
              "value":"yyy",
              "count":1}]}]}}}
{noformat}
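Here is a toy simulation of why the two responses differ. It uses plain Java collections, not Solr's classes, and `PivotMergeDemo`, `pivot`, and `merge` are hypothetical names. Each node computes mincount=0 pivot counts using only the sub_s terms it knows about. The merge then sums per-value counts, which is roughly what coordination does without any extra mincount=0 handling.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

final class PivotMergeDemo {
    // A doc reduced to its two pivot field values: top_s and sub_s.
    record Doc(String top, String sub) {}

    // Pivot counts for one node: top value -> (sub value -> count). With mincount=0,
    // every sub term known to *this node* appears under every top value.
    static Map<String, Map<String, Integer>> pivot(List<Doc> docs) {
        TreeSet<String> knownSubs = new TreeSet<>();
        for (Doc d : docs) knownSubs.add(d.sub());
        Map<String, Map<String, Integer>> out = new TreeMap<>();
        for (Doc d : docs) {
            Map<String, Integer> subs = out.computeIfAbsent(d.top(), k -> new TreeMap<>());
            for (String s : knownSubs) subs.putIfAbsent(s, 0); // zero-count entries
            subs.merge(d.sub(), 1, Integer::sum);               // count this doc
        }
        return out;
    }

    // Naive coordinator merge: sum the counts each shard reported, nothing more.
    static Map<String, Map<String, Integer>> merge(List<Map<String, Map<String, Integer>>> shards) {
        Map<String, Map<String, Integer>> out = new TreeMap<>();
        for (Map<String, Map<String, Integer>> shard : shards) {
            shard.forEach((top, subs) -> {
                Map<String, Integer> target = out.computeIfAbsent(top, k -> new TreeMap<>());
                subs.forEach((sub, count) -> target.merge(sub, count, Integer::sum));
            });
        }
        return out;
    }

    public static void main(String[] args) {
        Doc d1 = new Doc("foo", "bar");
        Doc d2 = new Doc("xxx", "yyy");
        // single node: {foo={bar=1, yyy=0}, xxx={bar=0, yyy=1}}
        System.out.println("single node: " + pivot(List.of(d1, d2)));
        // two shards:  {foo={bar=1}, xxx={yyy=1}}
        System.out.println("two shards:  " + merge(List.of(pivot(List.of(d1)), pivot(List.of(d2)))));
    }
}
```

Each shard only ever reports the sub_s term from its own doc, so the merge has no "0" entry to sum for the other term.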

The only solution I can think of would be an extra stage of logic (special to
mincount=0) after each PivotFacetField is refined, that would:
* iterate over all the values of the current pivot
* build up a Set of all the known values for the child pivots of those values
* iterate over all the values again, merging in a "0"-count child value for
every value in the set

...ie: "At least one shard knows about value 'v_x' in field 'sub_field', so add 
a count of '0' for 'v_x' in every 'sub_field' collection nested under the 
'top_field' in our 'top_field,sub_field' pivot"
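Here is a rough sketch of that extra stage for a single pivot level, modeled with plain maps (parent value -> child value -> cumulative count). `ZeroCountFill` and `fillZeroCounts` are hypothetical names. A real version would need to recurse into deeper pivot levels. It would also still only know about terms that at least one shard reported, not every term in the index the way a single node does.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class ZeroCountFill {
    // Hypothetical post-refinement step for mincount=0, applied to one pivot level:
    // "merged" maps each parent value to its child values and their cumulative counts.
    static void fillZeroCounts(Map<String, Map<String, Integer>> merged) {
        // 1. Collect every child value that at least one shard reported under any parent.
        Set<String> allChildValues = new TreeSet<>();
        for (Map<String, Integer> children : merged.values()) {
            allChildValues.addAll(children.keySet());
        }
        // 2. Give every parent a "0"-count entry for each child value it is missing.
        for (Map<String, Integer> children : merged.values()) {
            for (String value : allChildValues) {
                children.putIfAbsent(value, 0);
            }
        }
    }

    public static void main(String[] args) {
        // The two-shard response from the example: no zero-count children at all.
        Map<String, Map<String, Integer>> merged = new TreeMap<>();
        merged.put("foo", new TreeMap<>(Map.of("bar", 1)));
        merged.put("xxx", new TreeMap<>(Map.of("yyy", 1)));
        fillZeroCounts(merged);
        System.out.println(merged); // {foo={bar=1, yyy=0}, xxx={bar=0, yyy=1}}
    }
}
```

On the two-doc example, this reproduces the single node response. Whether it still holds once limit, offset, and sort interact is exactly the open question.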

I haven't thought this idea through enough to be confident it would work, or
that it's worth doing. I'm certainly not convinced that mincount=0 makes enough
sense in a facet.pivot use case for this test to hold up the commit. This
should probably just be committed as is, with an open Jira tracking it as a
known bug.






{panel:title=Summary of Changes in this patch}

* PivotFacet
** add a new REFINE_PARAM constant for "fpt"

* PivotFacetProcessor
** javadocs
** use REFINE_PARAM constant

* PivotFacetField
** processDefiniteCandidateElement
*** javadocs
*** numberOfValuesContributedByShardWasLimitedByFacetFieldLimit can only be trusted when sort=count
** processPossibleCandidateElement
*** method only useful when sort=count
*** added assert & javadocs making this clear
** queuePivotRefinementRequests
*** call processDefiniteCandidateElement on all elements when using sort=index

* FacetComponent
** applyToShardRequests - removed this method
*** a bunch of it was dead code (if limit > 0, no need to check limit>=0)
*** most of what wasn't dead code was also being done by the callers (ie: redundant overrequest logic)
*** this was also where the original mincount=0 bug lived (mincount was being forced to 1 when called from pivot code)
** modifyRequestForIndividualPivotFacets & modifyRequestForFieldFacets
*** made sure they directly do the stuff they used to depend on applyToShardRequests for
*** fixed up limit+offset & overrequest logic
** use REFINE_PARAM constant


* DistributedFacetPivotLargeTest
** fixed tests to be less overzealous about overrequest=0
** added more mincount=0 testing (currently fails)


{panel}
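To illustrate why numberOfValuesContributedByShardWasLimitedByFacetFieldLimit (and processPossibleCandidateElement) only help with sort=count, here is a hypothetical helper, not the PivotFacetField API. It asks one question: given what a shard returned, what is the highest count any term it did not return could have? It assumes the shard request used mincount=1 and either a positive limit or a negative one meaning "no limit".

```java
import java.util.List;
import java.util.OptionalLong;

final class UnseenCountBound {
    enum Sort { COUNT, INDEX }

    // Hypothetical helper: bound the count of any term a shard did NOT return, given
    // the counts it did return (in response order). Assumes mincount=1 on the shard
    // request and shardLimit > 0 (or < 0 for "no limit").
    static OptionalLong maxUnseenCount(Sort sort, List<Long> returnedCounts, int shardLimit) {
        boolean limited = shardLimit > 0 && returnedCounts.size() >= shardLimit;
        if (!limited) {
            // The shard returned every term it has with count >= 1, so unseen terms have 0.
            return OptionalLong.of(0L);
        }
        if (sort == Sort.COUNT) {
            // Descending count order: no unseen term can exceed the last count returned.
            return OptionalLong.of(returnedCounts.get(returnedCounts.size() - 1));
        }
        // sort=index: the shard stopped at the limit-th term in index order, so an unseen
        // term further along could have any count; "was limited" gives no bound.
        return OptionalLong.empty();
    }

    public static void main(String[] args) {
        List<Long> counts = List.of(9L, 5L, 3L);
        System.out.println(maxUnseenCount(Sort.COUNT, counts, 3)); // OptionalLong[3]
        System.out.println(maxUnseenCount(Sort.INDEX, counts, 3)); // OptionalLong.empty
    }
}
```

With sort=index, the only safe move is the one the patch takes: treat every element as a definite candidate rather than trying to prune possible ones.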


> Implement distributed pivot faceting
> ------------------------------------
>
>                 Key: SOLR-2894
>                 URL: https://issues.apache.org/jira/browse/SOLR-2894
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Erik Hatcher
>            Assignee: Hoss Man
>             Fix For: 4.9, 5.0
>
>         Attachments: SOLR-2894-mincount-minification.patch, 
> SOLR-2894-reworked.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894_cloud_test.patch, 
> dateToObject.patch, pivot_mincount_problem.sh
>
>
> Following up on SOLR-792, pivot faceting currently only supports 
> undistributed mode.  Distributed pivot faceting needs to be implemented.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
