[
https://issues.apache.org/jira/browse/SOLR-6154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125765#comment-14125765
]
Hoss Man commented on SOLR-6154:
--------------------------------
Erick, sorry for the late reply.
I haven't looked in depth at your patch for this issue or SOLR-6187, but in
response to your question on the mailing list...
bq. The problem here is that it assumes that the first list in has all the
counts that ever will be reported from any shard.
You are almost certainly correct; it's very probable that the logic for
distributed range faceting isn't taking into account the possibility of
mincount suppressing buckets from one or more shards.
the general strategy for dealing with this in field faceting & pivot faceting
(which i suspect is what you are already doing in your patch) is to have the
coordinator node modify the mincount params when it sends the shard requests to
force mincount=0, to ensure it gets a response for every bucket from every
shard, then filter the merged response by applying the (original) mincount to
the combined counts.
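
A minimal sketch of that coordinator-side merge, for illustration only. It is not the actual FacetComponent code: the class and method names are hypothetical, and it assumes each shard's range counts arrive as a bucket-label-to-count map that contains every bucket because the shard request was sent with mincount=0.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Solr.
final class RangeFacetMergeSketch {

    /**
     * Sums per-bucket counts across all shard responses, then drops buckets
     * whose combined count is below the user's original mincount.
     * Assumes every shard reported every bucket (shard-level mincount=0).
     */
    static Map<String, Integer> mergeAndFilter(
            List<Map<String, Integer>> shardCounts, int originalMinCount) {
        // LinkedHashMap keeps buckets in the order the first shard reported them.
        Map<String, Integer> combined = new LinkedHashMap<>();
        for (Map<String, Integer> shard : shardCounts) {
            for (Map.Entry<String, Integer> e : shard.entrySet()) {
                combined.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        // Apply the original mincount only after all shards have been summed.
        combined.values().removeIf(count -> count < originalMinCount);
        return combined;
    }
}
```

Because the filter runs after summation, a bucket that is empty on one shard but populated on another survives, and the result does not depend on which shard responds first.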
{panel:title="not recommended idea"}
I say "modify" because one of the strategies taken with field/pivot faceting
when using "facet.sort=index" is this...
{noformat}
// we're sorting by index order.
// if minCount==0, we should always be able to get accurate results w/o
// over-requesting or refining
// if minCount==1, we should be able to get accurate results w/o
// over-requesting, but we'll need to refine
// if minCount==n (>1), we can set the initialMincount to
// minCount/nShards, rounded up.
// ...
{noformat}
there is no sorting or "top-n" aspect to facet.range, so the idea of
"over-requesting" doesn't apply -- but the minCount/nShards idea still applies.
if the user requests a minCount of "10" and there are 3 shards, then you could
set f.foo.facet.mincount=4 for the shard requests -- because unless at least one
shard responds back with a count of "4" or higher, you'll never be able to
satisfy the original mincount=10 ... HOWEVER: using this strategy requires
"refinement" requests, which we currently avoid in range faceting.
{panel}
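
The minCount/nShards arithmetic from the panel can be sketched as a ceiling division. This is only an illustration of the bound; the class and method names are hypothetical and not taken from Solr.

```java
// Hypothetical helper, not part of Solr.
final class ShardMinCountSketch {

    /**
     * Returns minCount / nShards rounded up. If every shard reports a count
     * strictly below this value, the combined count is strictly below
     * minCount, so such a bucket can never satisfy the original mincount.
     */
    static int initialShardMinCount(int minCount, int nShards) {
        if (nShards <= 0) {
            throw new IllegalArgumentException("nShards must be positive");
        }
        if (minCount <= 0) {
            return 0; // nothing is suppressed when the user asks for mincount=0
        }
        return (minCount + nShards - 1) / nShards; // integer ceiling division
    }
}
```

With minCount=10 and 3 shards this returns 4: three shards each reporting at most 3 can sum to at most 9. A bucket suppressed on one shard may still pass on another, which is why this strategy needs the refinement requests mentioned in the panel.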
i would not advise going with the refinement approach described above (hence
the panel box labeling it not-recommended) because i think the single pass
approach of range faceting right now is probably better for most common cases
-- we just need to force mincount=0 on the shard requests -- but i wanted to
post it for completeness in case i'm missing something and you think it's a
really good idea.
> SolrCloud: facet range option f.<field>.facet.mincount=1 omits buckets on
> response
> ----------------------------------------------------------------------------------
>
> Key: SOLR-6154
> URL: https://issues.apache.org/jira/browse/SOLR-6154
> Project: Solr
> Issue Type: Bug
> Affects Versions: 4.5.1, 4.8.1
> Environment: Solr 4.5.1 under Linux - explicit id routing
> Indexed 400,000+ Documents
> explicit routing
> custom schema.xml
>
> Solr 4.8.1 under Windows+Cygwin
> Indexed 6 Documents
> implicit id routing
> out of the box schema
> Reporter: Ronald Matamoros
> Assignee: Erick Erickson
> Attachments: HowToReplicate.pdf, data.xml
>
>
> Attached
> - PDF with instructions on how to replicate.
> - data.xml to replicate index
> The f.<field>.facet.mincount option on a distributed search gives
> inconsistent list of buckets on a range facet.
>
> Experiencing that some buckets are ignored when using the option
> "f.<field>.facet.mincount=1".
> The Solr logs do not indicate any error or warning during execution.
> The debug=true option and increasing the log levels to the FacetComponent do
> not provide any hints to the behaviour.
> Replicated the issue on both Solr 4.5.1 & 4.8.1.
> Example,
> Removing the f.<field>.facet.mincount=1 option gives the expected list of
> buckets for the 6 documents matched.
> <lst name="facet_ranges">
> <lst name="price">
> <lst name="counts">
> <int name="0.0">0</int>
> <int name="50.0">1</int>
> <int name="100.0">0</int>
> <int name="150.0">3</int>
> <int name="200.0">0</int>
> <int name="250.0">1</int>
> <int name="300.0">0</int>
> <int name="350.0">0</int>
> <int name="400.0">0</int>
> <int name="450.0">0</int>
> <int name="500.0">0</int>
> <int name="550.0">0</int>
> <int name="600.0">0</int>
> <int name="650.0">0</int>
> <int name="700.0">0</int>
> <int name="750.0">1</int>
> <int name="800.0">0</int>
> <int name="850.0">0</int>
> <int name="900.0">0</int>
> <int name="950.0">0</int>
> </lst>
> <float name="gap">50.0</float>
> <float name="start">0.0</float>
> <float name="end">1000.0</float>
> <int name="before">0</int>
> <int name="after">0</int>
> <int name="between">2</int>
> </lst>
> </lst>
> Using the f.<field>.facet.mincount=1 option removes the 0 count buckets but
> will also omit bucket <int name="250.0">
> <lst name="facet_ranges">
> <lst name="price">
> <lst name="counts">
> <int name="50.0">1</int>
> <int name="150.0">3</int>
> <int name="750.0">1</int>
> </lst>
> <float name="gap">50.0</float>
> <float name="start">0.0</float>
> <float name="end">1000.0</float>
> <int name="before">0</int>
> <int name="after">0</int>
> <int name="between">4</int>
> </lst>
> </lst>
> Resubmitting the query renders a different bucket list
> (May need to resubmit a couple times)
> <lst name="facet_ranges">
> <lst name="price">
> <lst name="counts">
> <int name="150.0">3</int>
> <int name="250.0">1</int>
> </lst>
> <float name="gap">50.0</float>
> <float name="start">0.0</float>
> <float name="end">1000.0</float>
> <int name="before">0</int>
> <int name="after">0</int>
> <int name="between">2</int>
> </lst>
> </lst>