[ 
https://issues.apache.org/jira/browse/SOLR-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007508#comment-13007508
 ] 

Yonik Seeley commented on SOLR-2403:
------------------------------------

bq. Dividing by shard count is fairly risky. 

Actually, it seems like it should help, at least when mincount is relatively 
high.

Let's take your example of facet.mincount=10, facet.limit=2, facet.sort=index
{code}
Shard 1: A(1) B(1) C(1) D(1) E(1) F(9) G(1) H(1)
Shard 2: A(1) B(1) C(1) D(1) E(1) F(1) G(1) H(10)
{code}

mincount / nShards = 10 / 2 = 5, so the shard requests sent will be along the 
lines of
facet.mincount=5, facet.limit=5, facet.sort=index  (with some over-requesting 
on the limit)
and we will get back F(9) from shard 1 and H(10) from shard 2.

The second phase (facet refinement, to true up the counts) will ask each shard 
for the counts of constraints in the merged list that it didn't return the 
first time.
So shard 1 will be asked about H, and shard 2 will be asked about F.

The final response will be F(10), H(11).
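
To make the two phases concrete, here is a rough Python sketch of the example above. It is only a toy model, not Solr's actual code: the function names, the shard_limit parameter, and rounding mincount / nShards up with ceil are all assumptions made for illustration.

```python
import math

def shard_facets(counts, mincount, limit):
    """One shard's answer: the first `limit` terms in index order with count >= mincount."""
    hits = [(term, n) for term, n in sorted(counts.items()) if n >= mincount]
    return dict(hits[:limit])

def distributed_facets(shards, mincount, limit, shard_limit):
    # Assumption: per-shard threshold is mincount / nShards, rounded up, at least 1.
    shard_mincount = max(1, math.ceil(mincount / len(shards)))

    # Phase 1: each shard returns its own candidates.
    first_phase = [shard_facets(s, shard_mincount, shard_limit) for s in shards]
    candidates = set().union(*first_phase)

    # Phase 2 (refinement): ask each shard about candidates it did not return.
    totals = {}
    for term in candidates:
        total = 0
        for counts, returned in zip(shards, first_phase):
            total += returned[term] if term in returned else counts.get(term, 0)
        totals[term] = total

    # Apply the real mincount and limit to the trued-up counts.
    merged = [(t, n) for t, n in sorted(totals.items()) if n >= mincount]
    return merged[:limit]

shard1 = dict(zip("ABCDEFGH", [1, 1, 1, 1, 1, 9, 1, 1]))
shard2 = dict(zip("ABCDEFGH", [1, 1, 1, 1, 1, 1, 1, 10]))

print(distributed_facets([shard1, shard2], mincount=10, limit=2, shard_limit=5))
# [('F', 10), ('H', 11)]
```

The rounding-up is the safe direction: if a term's total is at least mincount, then at least one shard must hold at least ceil(mincount / nShards) of it, so that shard will nominate it in phase 1 (subject to the per-shard limit).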

bq. Over-requesting helps, but only linear to the fraction of the full 
result-set from each shard that is requested.

Yes, I think you're correct that over-requesting is less useful for sort=index 
than sort=count.
Luckily, we can fix the mincount=1 problem and get exact answers for that case, 
which is the most important case.  I think mincount > 1 is relatively rare.
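
As a sanity check on the mincount=1 claim, here is a small brute-force sketch in Python (again a toy model with made-up helper names, not Solr code). The reasoning it tests: a term whose total is >= 1 has a count >= 1 on at least one shard, and on that shard it falls within the first facet.limit terms whenever it falls within the first facet.limit terms overall. So passing mincount=1 and the plain limit to each shard should be enough to find the right set of terms for sort=index; the counts would still need the refinement phase.

```python
import random

TERMS = "ABCDEFGH"

def first_terms(counts, mincount, limit):
    """First `limit` terms in index order with count >= mincount."""
    return [t for t in sorted(counts) if counts[t] >= mincount][:limit]

def merged_first_terms(shards, limit):
    """Merge per-shard answers for mincount=1, with no over-requesting."""
    candidates = set()
    for counts in shards:
        candidates.update(first_terms(counts, 1, limit))
    return sorted(candidates)[:limit]

def check(trials=500, seed=0):
    """Compare the merged answer with a single combined index on random data."""
    rng = random.Random(seed)
    for _ in range(trials):
        n_shards = rng.randint(1, 4)
        shards = [{t: rng.randint(0, 2) for t in TERMS} for _ in range(n_shards)]
        totals = {t: sum(s[t] for s in shards) for t in TERMS}
        limit = rng.randint(1, 5)
        if merged_first_terms(shards, limit) != first_terms(totals, 1, limit):
            return False
    return True

print(check())
```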




> Problem with facet.sort=lex, shards, and facet.mincount
> -------------------------------------------------------
>
>                 Key: SOLR-2403
>                 URL: https://issues.apache.org/jira/browse/SOLR-2403
>             Project: Solr
>          Issue Type: Bug
>          Components: search
>    Affects Versions: 4.0
>         Environment: RHEL5, Ubuntu 10.04
>            Reporter: Peter Cline
>
> I tested this on a recent trunk snapshot (2/25) but haven't verified it with 
> 3.1 or 1.4.1.  I can do so and update if necessary.
> Solr is not returning the proper number of facet values when sorting 
> alphabetically, using distributed search, and using a facet.mincount that 
> excludes some of the values in the first facet.limit values.
> Easiest explained by example.  Sorting alphabetically, the first 20 values 
> for my "subject_facet" field have few documents.  19 facet values have only 1 
> document associated, and 1 has 2 documents.  There are plenty after that 
> which have more than 2.
> {code}
> http://localhost:8082/solr/select?q=*:*&facet=true&facet.field=subject_facet&facet.limit=20&facet.sort=lex&facet.mincount=2
> {code}
> comes back with the expected 20 facet values with >= 2 documents associated.
> If I add a shards parameter that points back to itself, the result is 
> different.
> {code}
> http://localhost:8082/solr/select?q=*:*&facet=true&facet.field=subject_facet&facet.limit=20&facet.sort=lex&facet.mincount=2&shards=localhost:8082/solr
> {code}
> comes back with only 1 facet value: the single value in the first 20 that had 
> more than 1 document.  
> It appears to me that mincount is ignored when doing the original query to 
> the shards, then applied afterwards.
> Let me know if you need any more info.  
> Thanks,
> Peter

