[jira] [Commented] (SOLR-6154) SolrCloud: facet range option f.<field>.facet.mincount=1 omits buckets on response

Hoss Man (JIRA) Mon, 08 Sep 2014 10:12:12 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-6154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125765#comment-14125765
 ]


Hoss Man commented on SOLR-6154:
--------------------------------

Erick, sorry for the late reply.

 I haven't looked in depth at your patch for this issue or SOLR-6187, but in 
response to your question on the mailing list...

bq. The problem here is that it assumes that the first list in has all the 
counts that ever will be reported from any shard.

You are almost certainly correct: it's very probable that the logic for 
distributed range faceting isn't taking into account the possibility of 
mincount suppressing buckets from one or more shards.
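To make the failure mode concrete, here is a simplified Java model of a merge that trusts the first shard's bucket list. The class `FirstListMerge`, its method, and the two-shard split of counts are all hypothetical (this is not Solr's `FacetComponent` code). Each shard has already applied mincount=1, so a bucket the first shard happened not to report can never make it into the merged result. If shard responses were processed in arrival order, the output would also change with response order, which would be consistent with the varying bucket lists in the report below.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FirstListMerge {

    // Hypothetical flawed merge: the bucket list of the FIRST shard response is
    // treated as complete; later shards can only add to buckets already in it.
    static Map<String, Integer> mergeUsingFirstList(List<Map<String, Integer>> shardCounts) {
        Map<String, Integer> merged = new LinkedHashMap<>(shardCounts.get(0));
        for (Map<String, Integer> shard : shardCounts.subList(1, shardCounts.size())) {
            for (Map.Entry<String, Integer> e : shard.entrySet()) {
                // computeIfPresent ignores keys the first shard never reported,
                // so those buckets silently disappear from the merged result.
                merged.computeIfPresent(e.getKey(), (bucket, count) -> count + e.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Made-up two-shard split; each shard already applied mincount=1,
        // so zero-count buckets are missing from its response.
        Map<String, Integer> shardA = new LinkedHashMap<>();
        shardA.put("50.0", 1);
        shardA.put("150.0", 2);
        Map<String, Integer> shardB = new LinkedHashMap<>();
        shardB.put("150.0", 1);
        shardB.put("250.0", 1);
        shardB.put("750.0", 1);

        // The result depends on which shard's list comes first.
        System.out.println(mergeUsingFirstList(List.of(shardA, shardB))); // {50.0=1, 150.0=3}
        System.out.println(mergeUsingFirstList(List.of(shardB, shardA))); // {150.0=3, 250.0=1, 750.0=1}
    }
}
```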

The general strategy for dealing with this in field faceting & pivot faceting 
(which I suspect is what you're already doing in your patch) is to have the 
coordinator node modify the mincount params when it sends the shard requests to 
force mincount=0, to ensure it gets a response for every bucket from every 
shard, then filter the combined counts against the original mincount.
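A minimal sketch of that single-pass strategy, assuming each shard response has already been reduced to a `Map<Double, Integer>` from bucket start to count. `ForcedZeroMerge` and `mergeAndFilter` are hypothetical names, not Solr code. Because every shard was sent mincount=0, summing per bucket never loses a bucket, and the user's original mincount is applied once, to the combined totals.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ForcedZeroMerge {

    // Shards were sent mincount=0, so each response covers every bucket.
    // Sum counts per bucket across all shards, then apply the user's
    // original mincount once, to the combined totals.
    static Map<Double, Integer> mergeAndFilter(List<Map<Double, Integer>> shardCounts,
                                               int originalMincount) {
        Map<Double, Integer> merged = new TreeMap<>(); // keeps buckets in numeric order
        for (Map<Double, Integer> shard : shardCounts) {
            // merge() adds a bucket the first time it is seen instead of dropping it
            shard.forEach((bucket, count) -> merged.merge(bucket, count, Integer::sum));
        }
        merged.values().removeIf(count -> count < originalMincount);
        return merged;
    }

    public static void main(String[] args) {
        // Made-up per-shard counts for buckets 0.0 .. 250.0 (gap 50.0).
        Map<Double, Integer> shardA = Map.of(0.0, 0, 50.0, 1, 100.0, 0, 150.0, 2, 200.0, 0, 250.0, 0);
        Map<Double, Integer> shardB = Map.of(0.0, 0, 50.0, 0, 100.0, 0, 150.0, 1, 200.0, 0, 250.0, 1);

        System.out.println(mergeAndFilter(List.of(shardA, shardB), 1)); // {50.0=1, 150.0=3, 250.0=1}
        System.out.println(mergeAndFilter(List.of(shardB, shardA), 1)); // same result in either order
    }
}
```

Filtering only after the merge is what makes the result independent of the order in which shard responses are combined.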

{panel:title="not recommended idea"}
I say "modify" because one of the strategies taken with field/pivot faceting 
when using "facet.sort=index" is this...

{noformat}
// we're sorting by index order.
// if minCount==0, we should always be able to get accurate results w/o
// over-requesting or refining
// if minCount==1, we should be able to get accurate results w/o
// over-requesting, but we'll need to refine
// if minCount==n (>1), we can set the initialMincount to
// minCount/nShards, rounded up.
// ...
{noformat}

There is no sorting or "top-n" aspect to facet.range, so the idea of 
"over-requesting" doesn't apply -- but the minCount/nShards idea still applies. 
If the user requests a minCount of "10" and there are 3 shards, then you could 
set f.foo.facet.mincount=4 for the shard requests -- because unless at least one 
shard responds with a count of at least "4", you'll never be able to 
satisfy the original mincount=10 ... HOWEVER: using this strategy requires 
"refinement" requests, which we currently avoid in range faceting.
{panel}
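For the not-recommended variant in the panel above, the per-shard threshold is just an integer ceiling. The hypothetical `ShardMincount` class below computes ceil(minCount/nShards) and the pigeonhole bound behind it, using the panel's numbers (minCount=10, 3 shards): if every shard reported at most 3, the combined count could be at most 9. As the panel notes, any shard that skipped a bucket would still need a refinement request to get exact totals, which this sketch does not attempt.

```java
class ShardMincount {

    // Per-shard threshold for the minCount/nShards idea: ceil(minCount / nShards).
    // Integer-only ceiling; assumes minCount >= 0 and nShards > 0.
    static int initialMincount(int minCount, int nShards) {
        if (minCount < 0 || nShards <= 0) {
            throw new IllegalArgumentException("need minCount >= 0 and nShards > 0");
        }
        return (minCount + nShards - 1) / nShards;
    }

    // Largest combined count possible if EVERY shard is below the threshold,
    // i.e. reports at most (threshold - 1). If this is < minCount, a bucket no
    // shard reported at the threshold can never reach minCount.
    static int maxTotalIfAllBelow(int threshold, int nShards) {
        return nShards * Math.max(threshold - 1, 0);
    }

    public static void main(String[] args) {
        int t = initialMincount(10, 3);
        System.out.println(t);                        // 4
        System.out.println(maxTotalIfAllBelow(t, 3)); // 9, which is < 10
    }
}
```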

I would not advise going with the refinement approach described above (hence 
the panel box labeling it not-recommended) because I think the single-pass 
approach range faceting uses right now is probably better for most common cases 
-- we just need to force mincount=0 on the shard requests -- but I wanted to 
post it for completeness in case I'm missing something and you think it's a 
really good idea.
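The request side of the recommended approach can be sketched as a parameter rewrite. `ShardParams` and its methods are hypothetical and operate on a plain `Map<String, String>` rather than Solr's own params classes. The sketch assumes that a per-field `f.<field>.facet.mincount` takes precedence over a global `facet.mincount`, and that range facets default to 0 when neither is set. The coordinator would send the rewritten params to the shards and keep the original value for the final filter.

```java
import java.util.HashMap;
import java.util.Map;

class ShardParams {

    // Hypothetical helper on a plain Map, not Solr's params API.
    // Copies the user's params and pins this field's mincount to 0 for the
    // shard requests (assumes the per-field form overrides the global one).
    static Map<String, String> forceZeroMincount(Map<String, String> userParams, String field) {
        Map<String, String> shardParams = new HashMap<>(userParams);
        shardParams.put("f." + field + ".facet.mincount", "0");
        return shardParams;
    }

    // The coordinator keeps the user's effective mincount for the final filter:
    // per-field value first, then global facet.mincount, else an assumed default of 0.
    static int originalMincount(Map<String, String> userParams, String field) {
        String value = userParams.getOrDefault("f." + field + ".facet.mincount",
                                               userParams.get("facet.mincount"));
        return value == null ? 0 : Integer.parseInt(value.trim());
    }

    public static void main(String[] args) {
        Map<String, String> userParams = Map.of(
                "facet", "true",
                "facet.range", "price",
                "f.price.facet.mincount", "1");

        Map<String, String> shardParams = forceZeroMincount(userParams, "price");
        System.out.println(shardParams.get("f.price.facet.mincount")); // 0
        System.out.println(originalMincount(userParams, "price"));     // 1
    }
}
```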


> SolrCloud: facet range option f.<field>.facet.mincount=1 omits buckets on 
> response
> ----------------------------------------------------------------------------------
>
>                 Key: SOLR-6154
>                 URL: https://issues.apache.org/jira/browse/SOLR-6154
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 4.5.1, 4.8.1
>         Environment: Solr 4.5.1 under Linux  - explicit id routing
>      Indexed 400,000+ Documents
>      explicit routing 
>      custom schema.xml
>  
> Solr 4.8.1 under Windows+Cygwin
>      Indexed 6 Documents
>      implicit id routing
>      out of the box schema
>            Reporter: Ronald Matamoros
>            Assignee: Erick Erickson
>         Attachments: HowToReplicate.pdf, data.xml
>
>
> Attached
> - PDF with instructions on how to replicate.
> - data.xml to replicate index
> The f.<field>.facet.mincount option on a distributed search gives an 
> inconsistent list of buckets on a range facet.
>  
> Some buckets are omitted when using the option 
> "f.<field>.facet.mincount=1".
> The Solr logs do not indicate any error or warning during execution.
> The debug=true option and increasing the log levels for the FacetComponent do 
> not provide any hints about the behaviour.
> Replicated the issue on both Solr 4.5.1 & 4.8.1.
> Example:
> Removing the f.<field>.facet.mincount=1 option gives the expected list of 
> buckets for the 6 documents matched.
>         <lst name="facet_ranges">
>          <lst name="price">
>            <lst name="counts">
>              <int name="0.0">0</int>
>              <int name="50.0">1</int>
>              <int name="100.0">0</int>
>              <int name="150.0">3</int>
>              <int name="200.0">0</int>
>              <int name="250.0">1</int>
>              <int name="300.0">0</int>
>              <int name="350.0">0</int>
>              <int name="400.0">0</int>
>              <int name="450.0">0</int>
>              <int name="500.0">0</int>
>              <int name="550.0">0</int>
>              <int name="600.0">0</int>
>              <int name="650.0">0</int>
>              <int name="700.0">0</int>
>              <int name="750.0">1</int>
>              <int name="800.0">0</int>
>              <int name="850.0">0</int>
>              <int name="900.0">0</int>
>              <int name="950.0">0</int>
>            </lst>
>            <float name="gap">50.0</float>
>            <float name="start">0.0</float>
>            <float name="end">1000.0</float>
>            <int name="before">0</int>
>            <int name="after">0</int>
>            <int name="between">2</int>
>          </lst>
>        </lst>
> Using the f.<field>.facet.mincount=1 option removes the 0 count buckets but 
> also omits the bucket <int name="250.0">
>        <lst name="facet_ranges">
>           <lst name="price">
>             <lst name="counts">
>                 <int name="50.0">1</int>
>                 <int name="150.0">3</int>
>                 <int name="750.0">1</int>
>              </lst>
>              <float name="gap">50.0</float>
>              <float name="start">0.0</float>
>              <float name="end">1000.0</float>
>              <int name="before">0</int>
>              <int name="after">0</int>
>              <int name="between">4</int>
>           </lst>
>         </lst>
> Resubmitting the query renders a different bucket list 
> (you may need to resubmit a couple of times)
>        <lst name="facet_ranges">
>           <lst name="price">
>             <lst name="counts">
>                 <int name="150.0">3</int>
>                 <int name="250.0">1</int>
>              </lst>
>              <float name="gap">50.0</float>
>              <float name="start">0.0</float>
>              <float name="end">1000.0</float>
>              <int name="before">0</int>
>              <int name="after">0</int>
>              <int name="between">2</int>
>           </lst>
>         </lst>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
