[ https://issues.apache.org/jira/browse/SOLR-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085353#comment-14085353 ]

Hoss Man commented on SOLR-6319:
--------------------------------


Consider the following sample data...

{code:title=1.csv}
foo_t
a b c d e f g h
a
a
a
a
a
a
a
a
a
b
b
b
b
b
b
b
b
b
g
g
g
g
{code}

{code:title=2.csv}
foo_t
a b c d e f g h
b
f
f
f
f
f
f
f
f
f
g
g
g
g
g
g
g
g
g
g
g
h
h
h
h
h
h
h
h
h
h
h
h
{code}
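
To make the per-file term counts easier to see, here is a rough Python sketch that tallies how many rows of each file contain each term. It assumes foo_t is tokenized on whitespace; the helper name term_doc_counts and the ROWS_1 / ROWS_2 lists are hypothetical, rebuilt by hand from the two files above.

```python
from collections import Counter

def term_doc_counts(rows):
    """Count, for each term, how many rows (documents) contain it."""
    counts = Counter()
    for row in rows:
        # Assumption: the field is split on whitespace; a term counts once per row.
        counts.update(set(row.split()))
    return dict(sorted(counts.items()))

# Rows of 1.csv and 2.csv (header line excluded), rebuilt by hand.
ROWS_1 = ["a b c d e f g h"] + ["a"] * 9 + ["b"] * 9 + ["g"] * 4
ROWS_2 = ["a b c d e f g h"] + ["b"] + ["f"] * 9 + ["g"] * 11 + ["h"] * 12

if __name__ == "__main__":
    print("1.csv:", term_doc_counts(ROWS_1))
    print("2.csv:", term_doc_counts(ROWS_2))
```

If each file ends up on its own shard, "g" has a count of only 5 in the first file and 12 in the second, so it does not reach 13 in either file on its own.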


If you index this data in a single-node Solr setup, the following queries 
produce the results you expect...

{noformat}
$ curl "http://localhost:8983/solr/update?rowidOffset=100&rowid=id&commit=true" -H 'Content-type:application/csv; charset=utf-8' --data-binary @1.csv
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">522</int></lst>
</response>
$ curl "http://localhost:8983/solr/update?rowidOffset=200&rowid=id&commit=true" -H 'Content-type:application/csv; charset=utf-8' --data-binary @2.csv
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">435</int></lst>
</response>
$ curl -sS 'http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=foo_t&facet.sort=index&omitHeader=true&wt=json&indent=true'
{
  "response":{"numFound":57,"start":0,"docs":[]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "foo_t":[
        "a",11,
        "b",12,
        "c",2,
        "d",2,
        "e",2,
        "f",11,
        "g",17,
        "h",14]},
    "facet_dates":{},
    "facet_ranges":{},
    "facet_intervals":{}}}
$ curl -sS 'http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=foo_t&facet.limit=1&facet.mincount=13&facet.sort=index&omitHeader=true&wt=json&indent=true'
{
  "response":{"numFound":57,"start":0,"docs":[]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "foo_t":[
        "g",17]},
    "facet_dates":{},
    "facet_ranges":{},
    "facet_intervals":{}}}
{noformat}


But in a simple 2 node distributed setup...

{noformat}
$ curl "http://localhost:8881/solr/update?rowidOffset=100&rowid=id&commit=true" -H 'Content-type:application/csv; charset=utf-8' --data-binary @1.csv
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">483</int></lst>
</response>
$ curl "http://localhost:8882/solr/update?rowidOffset=200&rowid=id&commit=true" -H 'Content-type:application/csv; charset=utf-8' --data-binary @2.csv
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">456</int></lst>
</response>
$ curl -sS 'http://localhost:8881/solr/select?q=*:*&rows=0&facet=true&facet.field=foo_t&facet.sort=index&omitHeader=true&wt=json&indent=true&shards=localhost:8881/solr,localhost:8882/solr'
{
  "response":{"numFound":57,"start":0,"maxScore":1.0,"docs":[]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "foo_t":[
        "a",11,
        "b",12,
        "c",2,
        "d",2,
        "e",2,
        "f",11,
        "g",17,
        "h",14]},
    "facet_dates":{},
    "facet_ranges":{},
    "facet_intervals":{}}}
$ curl -sS 'http://localhost:8881/solr/select?q=*:*&rows=0&facet=true&facet.field=foo_t&facet.limit=1&facet.mincount=13&facet.sort=index&omitHeader=true&wt=json&indent=true&shards=localhost:8881/solr,localhost:8882/solr'
{
  "response":{"numFound":57,"start":0,"maxScore":1.0,"docs":[]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "foo_t":[]},
    "facet_dates":{},
    "facet_ranges":{},
    "facet_intervals":{}}}
{noformat}


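Bottom line: we should be overrequesting when facet.sort=index is combined with 
facet.mincount > 1. With facet.limit=1 and no overrequesting, "g" evidently 
never makes it into the merged candidates, so the result comes back empty.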
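
As a rough illustration of why, here is a simplified Python model of the merge (not Solr's actual code). The per-shard counts are worked out by hand from 1.csv and 2.csv; the function names, the overrequest parameter, and the assumption that each shard returns its first N terms in index order with a local count of at least 1 are all hypothetical simplifications.

```python
# Per-shard term counts, worked out by hand from 1.csv (shard 1) and 2.csv (shard 2).
SHARD1 = {"a": 10, "b": 10, "c": 1, "d": 1, "e": 1, "f": 1, "g": 5, "h": 1}
SHARD2 = {"a": 1, "b": 2, "c": 1, "d": 1, "e": 1, "f": 10, "g": 12, "h": 13}

def shard_terms(counts, limit):
    """Assumed shard behavior: first `limit` terms in index order with local count >= 1."""
    return sorted(t for t, c in counts.items() if c >= 1)[:limit]

def distributed_facet(shards, limit, mincount, overrequest=0):
    """Toy model of a distributed facet.sort=index request."""
    # Phase 1: each shard contributes its first (limit + overrequest) terms.
    candidates = set()
    for counts in shards:
        candidates.update(shard_terms(counts, limit + overrequest))
    # Phase 2 (simplified refinement): sum exact counts for the candidates only.
    merged = {t: sum(s.get(t, 0) for s in shards) for t in candidates}
    # Apply mincount to the merged counts, keep index order, cut at limit.
    kept = sorted((t, c) for t, c in merged.items() if c >= mincount)
    return kept[:limit]

if __name__ == "__main__":
    shards = [SHARD1, SHARD2]
    print(distributed_facet(shards, limit=1, mincount=13))
    print(distributed_facet(shards, limit=1, mincount=13, overrequest=10))
```

In this toy model the first call only ever sees "a" (merged count 11), so nothing survives the mincount, while the second call sees every term and returns "g". How much overrequesting is enough depends on the data; here anything that lets a shard return terms through "g" does the job.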



> if mincount > 1, facet.field needs to overrequest even if facet.sort=index
> --------------------------------------------------------------------------
>
>                 Key: SOLR-6319
>                 URL: https://issues.apache.org/jira/browse/SOLR-6319
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Hoss Man
>            Assignee: Hoss Man
>
> Discovered this while working on SOLR-2894.  The logic for distributed 
> faceting skips overrequesting (beyond the user-specified facet.limit) if 
> facet.sort is index order, but the rationale for doing this falls apart 
> if the user has specified a facet.mincount > 1.



