[jira] [Comment Edited] (SOLR-2894) Implement distributed pivot faceting

Brett Lucey (JIRA) Thu, 24 Apr 2014 09:23:24 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979898#comment-13979898
 ]


Brett Lucey edited comment on SOLR-2894 at 4/24/14 4:20 PM:
------------------------------------------------------------

Andrew actually raised that question to me yesterday as well, and I spent a 
little bit of time looking into it.  For the initial request to a shard, we 
only lower the mincount to 0 if the facet limit is set to something other than 
-1.  If the facet limit is -1, we lower the mincount to 1.  In your case, the 
limit would be 10 for the top level pivot, so we know we will (at most) get 
back 15 terms from each shard.  Because we are only faceting on a limited 
number of terms, having a mincount of 0 here provides us the benefit of 
potentially avoiding refinement.  In refinement requests, we still need to 
know when a shard has responded to us with its count for a term, so the 
mincount is -1 in that case because we are interested in the term even if the 
count is zero.  It allows us to mark the shard as having responded and 
continue on.  It's possible that we might be able to change this, but at the 
point of refinement, it's a rather targeted request, so I don't expect there 
to be a significant benefit to doing so.  In your case, with the facet limit 
being -1 on f2-f5, no refinement would be performed anyway.
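
To make the mincount rules above easier to follow, here is a minimal sketch 
in Java.  It is not taken from the patch: the class name, method names and 
the UNLIMITED constant are hypothetical, and the code only encodes the three 
cases described above (limited initial request, unlimited initial request, 
refinement request).

```java
/** Hypothetical illustration only; these names do not come from the SOLR-2894 patch. */
final class PivotMincountSketch {

    /** The facet.limit value that means "no limit". */
    static final int UNLIMITED = -1;

    /** Mincount sent with the initial request to a shard. */
    static int initialMincount(int facetLimit) {
        // Limited facet: lower the mincount to 0, so that zero-count terms
        // are reported too and a later refinement round may be avoided.
        // Unlimited facet (-1): lower the mincount to 1 instead.
        return facetLimit != UNLIMITED ? 0 : 1;
    }

    /** Mincount sent with a refinement request. */
    static int refinementMincount() {
        // -1 as stated in the comment: the term is wanted even when its
        // count is zero, so the shard can be marked as having responded.
        return -1;
    }

    public static void main(String[] args) {
        // Top level pivot limited to 10, deeper levels unlimited.
        System.out.println("limit=10 initial mincount: " + initialMincount(10));
        System.out.println("limit=-1 initial mincount: " + initialMincount(UNLIMITED));
        System.out.println("refinement mincount: " + refinementMincount());
    }
}
```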

When we designed this implementation, the most important factor for us was 
speed, and we were willing to get it at a cost of memory.  By making these 
changes, we reduced queries which previously took around 70 seconds for us down 
to around 600 milliseconds.  I suspect that the biggest factor in the poor 
memory utilization is the wide open nature of using a facet.limit of -1, 
especially on a pivot so deep.  Keep in mind that for each level of depth you 
add to a pivot, memory and time required will grow exponentially.
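
As a rough illustration of that growth, the sketch below counts the 
worst-case number of pivot buckets for a given depth.  It rests on 
assumptions that are not in the comment above: every field has the same 
number of distinct values, every combination of values actually occurs, and 
no facet limit trims the result.  The class and method names are 
hypothetical, and real data will usually come in well under this bound.

```java
/** Hypothetical back-of-the-envelope estimate; not part of Solr. */
final class PivotGrowthSketch {

    /**
     * Worst-case bucket count for a pivot of the given depth:
     * valuesPerField^1 + valuesPerField^2 + ... + valuesPerField^depth.
     */
    static long worstCaseBuckets(long valuesPerField, int depth) {
        long total = 0;
        long atLevel = 1;
        for (int level = 1; level <= depth; level++) {
            // Each bucket at the previous level fans out into
            // valuesPerField child buckets at this level.
            atLevel = Math.multiplyExact(atLevel, valuesPerField);
            total = Math.addExact(total, atLevel);
        }
        return total;
    }

    public static void main(String[] args) {
        // Assumed figure for illustration: 10 distinct values per field.
        for (int depth = 1; depth <= 5; depth++) {
            System.out.println("depth " + depth + ": "
                    + worstCaseBuckets(10, depth) + " buckets");
        }
    }
}
```

Under these assumptions, each extra pivot level multiplies the bucket count 
by roughly the number of distinct values in the added field.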

Don't forget that if you are querying a node and all of the shards are located 
within the same Java VM, you are incurring the memory cost of both shards plus 
the node responding to the user query all within the same heap.

I took a quick look at the code today while waiting for some other processes to 
finish, and I don't see any obvious low-hanging fruit that would free up even a 
small amount of memory.  


> Implement distributed pivot faceting
> ------------------------------------
>
>                 Key: SOLR-2894
>                 URL: https://issues.apache.org/jira/browse/SOLR-2894
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Erik Hatcher
>             Fix For: 4.9, 5.0
>
>         Attachments: SOLR-2894-reworked.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, dateToObject.patch
>
>
> Following up on SOLR-792, pivot faceting currently only supports 
> undistributed mode.  Distributed pivot faceting needs to be implemented.
