[
https://issues.apache.org/jira/browse/SOLR-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085353#comment-14085353
]
Hoss Man commented on SOLR-6319:
--------------------------------
Consider the following sample data...
{code:title=1.csv}
foo_t
a b c d e f g h
a
a
a
a
a
a
a
a
a
b
b
b
b
b
b
b
b
b
g
g
g
g
{code}
{code:title=2.csv}
foo_t
a b c d e f g h
b
f
f
f
f
f
f
f
f
f
g
g
g
g
g
g
g
g
g
g
g
h
h
h
h
h
h
h
h
h
h
h
h
{code}
If you index this data in a single-node Solr setup, the following queries
produce the results you would expect...
{noformat}
$ curl "http://localhost:8983/solr/update?rowidOffset=100&rowid=id&commit=true" -H 'Content-type:application/csv; charset=utf-8' --data-binary @1.csv
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">522</int></lst>
</response>
$ curl "http://localhost:8983/solr/update?rowidOffset=200&rowid=id&commit=true" -H 'Content-type:application/csv; charset=utf-8' --data-binary @2.csv
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">435</int></lst>
</response>
$ curl -sS 'http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=foo_t&facet.sort=index&omitHeader=true&wt=json&indent=true'
{
"response":{"numFound":57,"start":0,"docs":[]
},
"facet_counts":{
"facet_queries":{},
"facet_fields":{
"foo_t":[
"a",11,
"b",12,
"c",2,
"d",2,
"e",2,
"f",11,
"g",17,
"h",14]},
"facet_dates":{},
"facet_ranges":{},
"facet_intervals":{}}}
$ curl -sS 'http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=foo_t&facet.limit=1&facet.mincount=13&facet.sort=index&omitHeader=true&wt=json&indent=true'
{
"response":{"numFound":57,"start":0,"docs":[]
},
"facet_counts":{
"facet_queries":{},
"facet_fields":{
"foo_t":[
"g",17]},
"facet_dates":{},
"facet_ranges":{},
"facet_intervals":{}}}
{noformat}
But in a simple two-node distributed setup...
{noformat}
$ curl "http://localhost:8881/solr/update?rowidOffset=100&rowid=id&commit=true" -H 'Content-type:application/csv; charset=utf-8' --data-binary @1.csv
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">483</int></lst>
</response>
$ curl "http://localhost:8882/solr/update?rowidOffset=200&rowid=id&commit=true" -H 'Content-type:application/csv; charset=utf-8' --data-binary @2.csv
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">456</int></lst>
</response>
$ curl -sS 'http://localhost:8881/solr/select?q=*:*&rows=0&facet=true&facet.field=foo_t&facet.sort=index&omitHeader=true&wt=json&indent=true&shards=localhost:8881/solr,localhost:8882/solr'
{
"response":{"numFound":57,"start":0,"maxScore":1.0,"docs":[]
},
"facet_counts":{
"facet_queries":{},
"facet_fields":{
"foo_t":[
"a",11,
"b",12,
"c",2,
"d",2,
"e",2,
"f",11,
"g",17,
"h",14]},
"facet_dates":{},
"facet_ranges":{},
"facet_intervals":{}}}
$ curl -sS 'http://localhost:8881/solr/select?q=*:*&rows=0&facet=true&facet.field=foo_t&facet.limit=1&facet.mincount=13&facet.sort=index&omitHeader=true&wt=json&indent=true&shards=localhost:8881/solr,localhost:8882/solr'
{
"response":{"numFound":57,"start":0,"maxScore":1.0,"docs":[]
},
"facet_counts":{
"facet_queries":{},
"facet_fields":{
"foo_t":[]},
"facet_dates":{},
"facet_ranges":{},
"facet_intervals":{}}}
{noformat}
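The empty result can be reproduced outside Solr with a small simulation. This is only a sketch, not Solr's actual merge code: it assumes each shard returns its first `shard_limit` terms in index order and that `facet.mincount` and `facet.limit` are then applied to the merged totals. Refinement requests are not modeled, and the names `shard_counts`, `distributed_facet` and `shard_limit` are made up for illustration.

```python
from collections import Counter

# Terms per document, transcribed from 1.csv and 2.csv above.
SHARD_1 = [list("abcdefgh")] + [["a"]] * 9 + [["b"]] * 9 + [["g"]] * 4
SHARD_2 = [list("abcdefgh")] + [["b"]] + [["f"]] * 9 + [["g"]] * 11 + [["h"]] * 12


def shard_counts(docs):
    """Count how many documents on one shard contain each term."""
    counts = Counter()
    for terms in docs:
        counts.update(set(terms))
    return counts


def distributed_facet(shards, limit, mincount, shard_limit):
    """Hypothetical merge: every shard contributes only its first
    `shard_limit` terms (index order); mincount and limit are applied
    to the merged totals afterwards."""
    merged = Counter()
    for docs in shards:
        counts = shard_counts(docs)
        for term in sorted(counts)[:shard_limit]:
            merged[term] += counts[term]
    kept = [(t, c) for t, c in sorted(merged.items()) if c >= mincount]
    return kept[:limit]


shards = [SHARD_1, SHARD_2]

# No over-request: both shards return only "a" (10 + 1 = 11 < 13).
print(distributed_facet(shards, limit=1, mincount=13, shard_limit=1))

# Asking each shard for all 8 terms recovers the single-node answer.
print(distributed_facet(shards, limit=1, mincount=13, shard_limit=8))
```

Under these assumptions the first call returns an empty list, matching the distributed response above, and the second returns `g` with a count of 17, matching the single-node response.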
Bottom line: we should be overrequesting when facet.sort=index is combined with
facet.mincount > 1.
> if mincount > 1, facet.field needs to overrequest even if facet.sort=index
> --------------------------------------------------------------------------
>
> Key: SOLR-6319
> URL: https://issues.apache.org/jira/browse/SOLR-6319
> Project: Solr
> Issue Type: Bug
> Reporter: Hoss Man
> Assignee: Hoss Man
>
> Discovered this while working on SOLR-2894. The logic for distributed
> faceting skips overrequesting (beyond the user-specified facet.limit) if
> facet.sort is index order, but the rationale for doing this falls apart
> if the user has specified a facet.mincount > 1.
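To illustrate where that rationale breaks down, the sketch below sweeps `facet.mincount` over the per-shard counts from the sample data and reports the smallest per-shard limit that reproduces the single-node terms. It relies on the same simplifying assumptions as any toy model: shards return their first N terms in index order, mincount is applied only after merging, and refinement is ignored. The helper names are hypothetical, and the numbers describe this data set only; they say nothing about how Solr should size the overrequest.

```python
from collections import Counter

# Per-shard term counts derived from the 1.csv / 2.csv sample data.
SHARD_1 = Counter(a=10, b=10, c=1, d=1, e=1, f=1, g=5, h=1)
SHARD_2 = Counter(a=1, b=2, c=1, d=1, e=1, f=10, g=12, h=13)
SHARDS = [SHARD_1, SHARD_2]


def single_node_terms(shards, limit, mincount):
    """Terms a single index holding all documents would return."""
    total = sum(shards, Counter())
    return [t for t in sorted(total) if total[t] >= mincount][:limit]


def distributed_terms(shards, limit, mincount, shard_limit):
    """Terms after merging each shard's first `shard_limit` terms."""
    merged = Counter()
    for counts in shards:
        for term in sorted(counts)[:shard_limit]:
            merged[term] += counts[term]
    return [t for t in sorted(merged) if merged[t] >= mincount][:limit]


def smallest_sufficient_shard_limit(shards, limit, mincount):
    """Smallest per-shard limit (>= limit) that matches the single-node terms."""
    expected = single_node_terms(shards, limit, mincount)
    max_terms = max(len(counts) for counts in shards)
    for n in range(limit, max_terms + 1):
        if distributed_terms(shards, limit, mincount, n) == expected:
            return n
    return max_terms


for mincount in (0, 1, 12, 13):
    needed = smallest_sufficient_shard_limit(SHARDS, limit=1, mincount=mincount)
    print(f"mincount={mincount}: per-shard limit needed = {needed}")
```

With `facet.limit=1`, a mincount of 0 or 1 needs no extra terms from the shards, while a mincount of 12 needs two terms per shard and a mincount of 13 needs seven, because the first qualifying term sits progressively deeper in index order.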