[jira] [Comment Edited] (SOLR-7580) Number of ScoreDoc instances equals rows parameter, not actual number of matches

Yonik Seeley (JIRA) Mon, 25 Apr 2016 08:07:09 -0700

    [ https://issues.apache.org/jira/browse/SOLR-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15256450#comment-15256450 ]


Yonik Seeley edited comment on SOLR-7580 at 4/25/16 3:06 PM:
-------------------------------------------------------------

bq.  we do setRows(Integer.MAX_VALUE);. According to the VisualVM samples, this 
results in a huge amount of ScoreDoc instances, making the query unreasonably 
slow.

I think this is due to the Lucene search code prepopulating the priority queue.
See org.apache.lucene.util.PriorityQueue.
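
As a rough illustration of why prepopulation would be costly here, the sketch below fills a fixed-size array with sentinel entries up front, so the number of objects created depends on the requested size and not on how many documents match. The `Hit` class and the `prepopulate` and `countAllocated` helpers are hypothetical stand-ins written for this example; they are not Lucene's `ScoreDoc` or `PriorityQueue`.

```java
public class PrepopulationSketch {

    /** Hypothetical stand-in for a scored hit; not Lucene's ScoreDoc. */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /**
     * Allocates one sentinel per slot before any document is collected,
     * which is the behaviour the comment above attributes to the queue.
     */
    static Hit[] prepopulate(int rows) {
        Hit[] slots = new Hit[rows];
        for (int i = 0; i < rows; i++) {
            slots[i] = new Hit(Integer.MAX_VALUE, Float.NEGATIVE_INFINITY);
        }
        return slots;
    }

    /** Counts the slots that already hold an object. */
    static int countAllocated(Hit[] slots) {
        int count = 0;
        for (Hit slot : slots) {
            if (slot != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int rows = 1_000_000; // requested size, far above the real hit count
        int matches = 10;     // assumed number of documents that actually match

        Hit[] slots = prepopulate(rows);
        System.out.println("objects allocated up front: " + countAllocated(slots));
        System.out.println("documents that actually match: " + matches);
    }
}
```

In this sketch the allocation count tracks `rows` alone, which matches the symptom in the issue title.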

bq. Isn't this an anti-pattern? Can't you use cursorMark?

I think it's something we should try to support efficiently.
One way is to avoid the Lucene sorting code when the number of hits is 
expected to be large, since it's not optimized for that case.


was (Author: [email protected]):
bq.  we do setRows(Integer.MAX_VALUE);. According to the VisualVM samples, this 
results in a huge amount of ScoreDoc instances, making the query unreasonably 
slow.

I think this is due to the lucene search code prepopulating the priority queue.

bq. Isn't this an anti-pattern? Can't you use cursorMark?

I think it's something we should try to support efficiently.
One way is to not use the lucene sorting code when the number of hits is 
expected to be large... it's not optimized for that.

> Number of ScoreDoc instances equals rows parameter, not actual number of 
> matches
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-7580
>                 URL: https://issues.apache.org/jira/browse/SOLR-7580
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 5.1
>            Reporter: Markus Jelsma
>             Fix For: 5.5, master
>
>
> We have several batch jobs that use StreamingResponseCallback to collect all 
> records matching a specific query. For each record, we execute a new query 
> and need all results without paging through them. Because we do not know the 
> amount of matches to expect, we do setRows(Integer.MAX_VALUE);. According to 
> the VisualVM samples, this results in a huge amount of ScoreDoc instances, 
> making the query unreasonably slow.
> The current work-around we use is to execute the same query with setRows(0), 
> get numResults, and then reissue the query with setRows(numResults). This is 
> fast, almost as fast as one would expect.
> This is, however, a very dirty work-around. I am unsure whether this is a 
> Solr or Lucene issue, SolrIndexSearcher is a beast to debug ;)
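
The two-pass work-around described in the issue can be sketched roughly as follows. To keep the example self-contained, `Searcher`, `Result`, and `inMemory` are hypothetical stand-ins and not SolrJ types; with SolrJ, the first pass would correspond to `setRows(0)` and reading the hit count from the response, and the second to `setRows` with that count. Note that the count could change between the two requests if the index is updated in the meantime.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TwoPassFetchSketch {

    /** Hypothetical minimal result: total hit count plus the returned rows. */
    static final class Result {
        final long numFound;
        final List<String> docs;

        Result(long numFound, List<String> docs) {
            this.numFound = numFound;
            this.docs = docs;
        }
    }

    /** Hypothetical stand-in for a search client; not a SolrJ interface. */
    interface Searcher {
        Result search(String query, int rows);
    }

    /** Fetches every match by first asking for the count, then for exactly that many rows. */
    static List<String> fetchAll(Searcher searcher, String query) {
        // Pass 1: rows=0 returns only the hit count.
        long numFound = searcher.search(query, 0).numFound;
        if (numFound == 0) {
            return new ArrayList<>();
        }
        // Pass 2: request exactly as many rows as matched.
        int rows = (int) Math.min(numFound, Integer.MAX_VALUE);
        return searcher.search(query, rows).docs;
    }

    /** Toy in-memory searcher (substring match) so the sketch can run without a server. */
    static Searcher inMemory(List<String> index) {
        return (query, rows) -> {
            List<String> hits = new ArrayList<>();
            for (String doc : index) {
                if (doc.contains(query)) {
                    hits.add(doc);
                }
            }
            int end = Math.min(rows, hits.size());
            return new Result(hits.size(), new ArrayList<>(hits.subList(0, end)));
        };
    }

    public static void main(String[] args) {
        Searcher searcher = inMemory(Arrays.asList("apple pie", "apple tart", "pear cake"));
        System.out.println(fetchAll(searcher, "apple"));
    }
}
```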



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
