[jira] [Commented] (SOLR-6810) Faster searching limited but high rows across many shards all with many hits

Per Steffensen (JIRA) Mon, 01 Aug 2016 07:37:35 -0700

    [ https://issues.apache.org/jira/browse/SOLR-6810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15402138#comment-15402138 ]


Per Steffensen commented on SOLR-6810:
--------------------------------------

I have not looked closely at SOLR-8220, but from the little I have read, I 
guess you would have to enable doc-values on the id field for SOLR-8220 to 
help with the SOLR-6810 issue.

I guess most systems do not have doc-values enabled on the id field. I also 
guess that for most systems it is feasible to reindex everything, so that the 
ids become doc-valued. But not for all systems, e.g. not for ours. We have 
1000 billion documents in one of our systems, and you do not just reindex all 
of that. It takes weeks and is a very complex operation (we have tried it a 
few times). I do not know if such an argument counts here, but anyway...

> Faster searching limited but high rows across many shards all with many hits
> ----------------------------------------------------------------------------
>
>                 Key: SOLR-6810
>                 URL: https://issues.apache.org/jira/browse/SOLR-6810
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Per Steffensen
>            Assignee: Shalin Shekhar Mangar
>              Labels: distributed_search, performance
>         Attachments: SOLR-6810-hack-eoe.patch, SOLR-6810-trunk.patch, 
> SOLR-6810-trunk.patch, SOLR-6810-trunk.patch, branch_5x_rev1642874.patch, 
> branch_5x_rev1642874.patch, branch_5x_rev1645549.patch
>
>
> Searching "limited but high rows across many shards all with many hits" is 
> slow
> E.g.
> * Query from outside client: q=something&rows=1000
> * Resulting in sub-requests to each shard, something like this
> ** 1) q=something&rows=1000&fl=id,score
> ** 2) Request the full documents with ids in the global-top-1000 found among 
> the top-1000 from each shard
> What does the subject mean
> * "limited but high rows" means 1000 in the example above
> * "many shards" means 200-1000 in our case
> * "all with many hits" means that each of the shards has a significant 
> number of hits on the query
> The problem grows with all three factors above.
> Doing such a query on our system takes between 5 minutes and 1 hour, 
> depending on a lot of things. It ought to be much faster, so let's make it so.
> Profiling shows that the problem is that it takes a lot of time to access the 
> store to get ids for (up to) 1000 docs (value of the rows parameter) per shard. 
> With 1000 shards, that is up to 1 million ids that have to be fetched. There 
> is really no good reason to ever read information from the store for more 
> than the overall top-1000 documents that have to be returned to the client.
> For further detail see mail-thread "Slow searching limited but high rows 
> across many shards all with high hits" started 13/11-2014 on 
> [email protected]
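
To make the arithmetic in the description concrete, here is a small 
stand-alone sketch in Python. It is not Solr code and not what the attached 
patches do; the function names (shard_top, global_top, store_reads) and the 
random scores are made up for illustration. It only counts how many ids would 
be read from the store if every shard reads ids for its own top rows, compared 
to merging on score first and reading ids for the global top rows only.

```python
import heapq
import random


def shard_top(hits, rows):
    """Top `rows` hits of one shard as (score, local_doc) pairs, best first."""
    return heapq.nlargest(rows, hits)


def global_top(per_shard, rows):
    """Merge per-shard tops into the global top `rows`.

    Entries are (score, shard, local_doc); no stored id is needed to merge.
    """
    tagged = (
        (score, shard, doc)
        for shard, top in enumerate(per_shard)
        for score, doc in top
    )
    return heapq.nlargest(rows, tagged)


def store_reads(num_shards, hits_per_shard, rows, seed=0):
    """Return (eager, deferred) counts of ids read from the store.

    eager:    every shard reads the id of each of its top `rows` hits.
    deferred: ids are read only for the hits in the global top `rows`.
    """
    rng = random.Random(seed)  # hypothetical scores, for illustration only
    per_shard = [
        shard_top([(rng.random(), doc) for doc in range(hits_per_shard)], rows)
        for _ in range(num_shards)
    ]
    winners = global_top(per_shard, rows)
    eager = sum(len(top) for top in per_shard)
    deferred = len(winners)
    return eager, deferred


if __name__ == "__main__":
    # 20 shards, 500 hits each, rows=100
    print(store_reads(num_shards=20, hits_per_shard=500, rows=100))  # (2000, 100)
```

Scaled to the numbers in the description (1000 shards, rows=1000, every shard 
with at least 1000 hits), the same counting gives up to 1,000,000 id reads in 
the first case and 1000 in the second.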



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
