[ 
https://issues.apache.org/jira/browse/SOLR-6329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087984#comment-14087984
 ] 

Hoss Man commented on SOLR-6329:
--------------------------------

Notes from SOLR-2894 about the root of the issue...

{panel}

>From what I can tell, the gist of the issue is that when dealing with 
>sub-fields of the pivot, the coordination code doesn't know about some of the 
>"0" values if no shard which has the value for the parent field even knows 
>about the existence of the term.

The simplest example of this discrepancy (compared to single node pivots) is to
consider an index with only 2 docs...

{noformat}
[{"id":1,"top_s":"foo","sub_s":"bar"}
 {"id":2,"top_s":"xxx","sub_s":"yyy"}]
{noformat}

If those two docs exist in a single node index, and you pivot on 
{{top_s,sub_s}} using mincount=0 you get a response like this...

{noformat}
$ curl -sS 'http://localhost:8881/solr/select?q=*:*&rows=0&facet=true&facet.pivot.mincount=0&facet.pivot=top_s,sub_s&omitHeader=true&wt=json&indent=true'
{
  "response":{"numFound":2,"start":0,"docs":[]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{},
    "facet_dates":{},
    "facet_ranges":{},
    "facet_intervals":{},
    "facet_pivot":{
      "top_s,sub_s":[{
          "field":"top_s",
          "value":"foo",
          "count":1,
          "pivot":[{
              "field":"sub_s",
              "value":"bar",
              "count":1},
            {
              "field":"sub_s",
              "value":"yyy",
              "count":0}]},
        {
          "field":"top_s",
          "value":"xxx",
          "count":1,
          "pivot":[{
              "field":"sub_s",
              "value":"yyy",
              "count":1},
            {
              "field":"sub_s",
              "value":"bar",
              "count":0}]}]}}}
{noformat}

If, however, you index each of those docs on a separate shard, the response
comes back like this...

{noformat}
$ curl -sS 'http://localhost:8881/solr/select?q=*:*&rows=0&facet=true&facet.pivot.mincount=0&facet.pivot=top_s,sub_s&omitHeader=true&wt=json&indent=true&shards=localhost:8881/solr,localhost:8882/solr'
{
  "response":{"numFound":2,"start":0,"maxScore":1.0,"docs":[]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{},
    "facet_dates":{},
    "facet_ranges":{},
    "facet_intervals":{},
    "facet_pivot":{
      "top_s,sub_s":[{
          "field":"top_s",
          "value":"foo",
          "count":1,
          "pivot":[{
              "field":"sub_s",
              "value":"bar",
              "count":1}]},
        {
          "field":"top_s",
          "value":"xxx",
          "count":1,
          "pivot":[{
              "field":"sub_s",
              "value":"yyy",
              "count":1}]}]}}}
{noformat}
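To make the mechanism concrete, here is a toy Java model of a coordinator that merges two-level pivot results by summing per-shard counts. It is not Solr's actual coordination code; the class and method names are hypothetical. Each shard can only emit sub_s terms that exist in its own index, so even with mincount=0 no shard ever reports a 0-count "yyy" under "foo", and the merged result matches the distributed response above rather than the single node one.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical toy model (not Solr code): a pivot result is
 * top value -> (sub value -> count). Shards only know their local terms,
 * so they cannot report 0 counts for terms that live on other shards.
 */
public class NaivePivotMerge {

    // Sum counts for each (top, sub) pair across all shard responses.
    static Map<String, Map<String, Long>> merge(List<Map<String, Map<String, Long>>> shards) {
        Map<String, Map<String, Long>> merged = new TreeMap<>();
        for (Map<String, Map<String, Long>> shard : shards) {
            for (Map.Entry<String, Map<String, Long>> top : shard.entrySet()) {
                Map<String, Long> subs = merged.computeIfAbsent(top.getKey(), k -> new TreeMap<>());
                for (Map.Entry<String, Long> sub : top.getValue().entrySet()) {
                    subs.merge(sub.getKey(), sub.getValue(), Long::sum);
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Shard 1 holds only doc 1 (foo/bar); shard 2 holds only doc 2 (xxx/yyy).
        Map<String, Map<String, Long>> shard1 = Map.of("foo", Map.of("bar", 1L));
        Map<String, Map<String, Long>> shard2 = Map.of("xxx", Map.of("yyy", 1L));
        System.out.println(merge(List.of(shard1, shard2)));
    }
}
```

Running main prints {foo={bar=1}, xxx={yyy=1}}: the 0-count entries seen in the single node response never appear.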

The only solution I can think of would be an extra stage of logic (special to
mincount=0), run after each PivotFacetField is refined, that would:
* iterate over all the values of the current pivot
* build up a Set of all the known values for the child pivots of those values
* iterate over all the values again, merging in a "0"-count child value for every value in the Set

...i.e.: "At least one shard knows about value 'v_x' in field 'sub_field', so add
a count of '0' for 'v_x' in every 'sub_field' collection nested under the
'top_field' in our 'top_field,sub_field' pivot"
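A minimal sketch of that zero-fill pass, again as a hypothetical toy model over plain nested maps rather than PivotFacetField: one pass collects every child value known under any parent, and a second pass adds a 0-count entry for each child value a parent lacks. It only handles a two-level pivot and ignores facet sort order, facet.limit, and refinement; whether the idea interacts correctly with those is exactly the open question in these notes.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical sketch of the proposed mincount=0 zero-fill step, applied to an
 * already-merged two-level pivot (top value -> sub value -> count).
 */
public class ZeroFillPivot {

    static Map<String, Map<String, Long>> fillZeros(Map<String, Map<String, Long>> merged) {
        // Pass 1: collect every child value known under any parent value.
        Set<String> allChildValues = new TreeSet<>();
        for (Map<String, Long> children : merged.values()) {
            allChildValues.addAll(children.keySet());
        }
        // Pass 2: give each parent a 0-count entry for every child value it lacks.
        Map<String, Map<String, Long>> result = new TreeMap<>();
        for (Map.Entry<String, Map<String, Long>> e : merged.entrySet()) {
            Map<String, Long> children = new TreeMap<>(e.getValue());
            for (String v : allChildValues) {
                children.putIfAbsent(v, 0L);
            }
            result.put(e.getKey(), children);
        }
        return result;
    }

    public static void main(String[] args) {
        // The merged (distributed) result from the two-shard example.
        Map<String, Map<String, Long>> merged = new TreeMap<>();
        merged.put("foo", Map.of("bar", 1L));
        merged.put("xxx", Map.of("yyy", 1L));
        System.out.println(fillZeros(merged));
    }
}
```

This prints {foo={bar=1, yyy=0}, xxx={bar=0, yyy=1}}, which has the same entries as the single node response, though ordered by term rather than by count.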

I haven't thought this idea through enough to be confident it would work, or
that it's worth doing. I'm certainly not convinced that mincount=0 makes
enough sense in a facet.pivot use case to justify holding up this commit until
the test passes. This should probably just be committed as is, with an open
Jira tracking it as a known bug.
{panel}

SOLR-2894 includes a commented-out test case in DistributedFacetPivotLargeTest
(annotated with "SOLR-6329") that exercises mincount=0 in distributed pivot
faceting.

> facet.pivot.mincount=0 doesn't work well in distributed pivot faceting
> ----------------------------------------------------------------------
>
>                 Key: SOLR-6329
>                 URL: https://issues.apache.org/jira/browse/SOLR-6329
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Hoss Man
>            Priority: Minor
>
> Using facet.pivot.mincount=0 in conjunction with the distributed pivot 
> faceting support being added in SOLR-2894 doesn't work as folks would expect 
> if they are used to using facet.pivot.mincount=0 in a single node setup.
> Filing this issue to track this as a known defect, because it may not have a 
> viable solution.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
