[jira] [Commented] (SOLR-6494) Query filters applied in a wrong order

Alexander S. (JIRA) Wed, 10 Sep 2014 03:38:19 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128322#comment-14128322
 ]


Alexander S. commented on SOLR-6494:
------------------------------------

Unfortunately that doesn't solve the problem completely. These queries take ≈7 
seconds instead of 15:
{code}
{!cache=false}type:"Award::Nomination"
{!cache=false cost=10}created_at_d:[* TO 2014-09-08T23:59:59Z]
{code}
This is still not good, since I have only 11 974 docs with 
type:"Award::Nomination" and 139 716 883 with created_at_d:[* TO 
2014-09-08T23:59:59Z]. If the cost parameter tells Solr to apply the cheapest 
filters first, why does the query still take so long? It seems that even though 
Solr doesn't run the filters in parallel, they still don't know about each 
other and each one goes through all docs. My point is that it would be much 
faster if Solr ran the filters one by one, with each filter working not on the 
entire data set but on the results returned by the previous filter.
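
A toy sketch of the difference described above. This is not Solr's 
implementation: documents are plain dicts, filters are Python predicates, and 
the function names and the data set are made up for illustration. It only 
counts how many times a predicate is evaluated under each strategy.

```python
# Toy model only (assumption): this does not reflect how Solr or Lucene
# actually evaluate filter queries. It compares two strategies by counting
# predicate evaluations.

def filter_independently(docs, predicates):
    """Evaluate every predicate over the whole collection, then intersect."""
    evaluations = 0
    result = None
    for pred in predicates:
        matches = set()
        for doc_id, doc in enumerate(docs):
            evaluations += 1
            if pred(doc):
                matches.add(doc_id)
        result = matches if result is None else result & matches
    if result is None:  # no predicates: everything matches
        result = set(range(len(docs)))
    return result, evaluations


def filter_sequentially(docs, predicates):
    """Evaluate each predicate only on the survivors of the previous one."""
    evaluations = 0
    survivors = list(range(len(docs)))
    for pred in predicates:
        kept = []
        for doc_id in survivors:
            evaluations += 1
            if pred(docs[doc_id]):
                kept.append(doc_id)
        survivors = kept
    return set(survivors), evaluations


# Hypothetical data: 1 in 1000 docs is a nomination, most docs match the date.
docs = [
    {"type": "Award::Nomination" if i % 1000 == 0 else "Other", "created_at": i}
    for i in range(100_000)
]


def is_nomination(doc):
    return doc["type"] == "Award::Nomination"


def created_before(doc):
    return doc["created_at"] <= 90_000


independent_ids, independent_cost = filter_independently(
    docs, [is_nomination, created_before]
)
sequential_ids, sequential_cost = filter_sequentially(
    docs, [is_nomination, created_before]
)
print(independent_cost, sequential_cost)
```

Both strategies return the same documents; in this toy model the sequential 
one evaluates the expensive date predicate only on the few documents that 
survived the selective type filter.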

I also tried cost >= 100 to apply the filter as a post filter, but nothing 
changes: the query still takes the same 7 seconds. The filter cache doesn't 
help here.
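
For reference, a minimal sketch of how the two filter queries above could be 
passed as repeated fq request parameters. The host, port and core name in the 
URL are hypothetical; the fq strings and the other parameters are the ones 
from this issue.

```python
from urllib.parse import urlencode, parse_qs

# Each fq is a separate request parameter; the local params in {!...}
# carry the cache and cost settings for that filter.
params = [
    ("q", "*:*"),
    ("fq", '{!cache=false}type:"Award::Nomination"'),
    ("fq", "{!cache=false cost=10}created_at_d:[* TO 2014-09-08T23:59:59Z]"),
    ("sort", "score desc"),
    ("start", 0),
    ("rows", 20),
]

query_string = urlencode(params)

# Hypothetical host and core name (assumption, not from the issue).
url = "http://localhost:8983/solr/collection1/select?" + query_string
print(url)
```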

So this:
>> By design, fq clauses like this are calculated for the entire document set 
>> and the results cached, there is no "ordering" for that part.
doesn't sound right to me. Sometimes we don't need to reuse filters (and 
sometimes we can't, e.g. the cost option requires cache=false).

In the provided use case the way Solr applies filters does more harm than 
good. I'd even say more than 600 times more harm. A query that wouldn't take 
more than a second in MySQL takes 15 seconds in a search engine that runs on 
fast SSD RAID 10, has a few shards and replicas, uses more than 160G of memory 
in total and has ≈40 CPU cores.

Thus this sounds like a missing feature (at least). Please share your thoughts 
on this.

> Query filters applied in a wrong order
> --------------------------------------
>
>                 Key: SOLR-6494
>                 URL: https://issues.apache.org/jira/browse/SOLR-6494
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 4.8.1
>            Reporter: Alexander S.
>
> This query:
> {code}
> {
>   fq: ["type:Award::Nomination"],
>   sort: "score desc",
>   start: 0,
>   rows: 20,
>   q: "*:*"
> }
> {code}
> takes just a few milliseconds, but this one:
> {code}
> {
>   fq: [
>     "type:Award::Nomination",
>     "created_at_d:[* TO 2014-09-08T23:59:59Z]"
>   ],
>   sort: "score desc",
>   start: 0,
>   rows: 20,
>   q: "*:*"
> }
> {code}
> takes almost 15 seconds.
> I have just ≈12k documents with type "Award::Nomination", but around half 
> a billion with the created_at_d field set. It seems Solr applies the 
> created_at_d filter first, going through all documents where this field is 
> set, which is not very smart.
> I think if it can't do anything better than applying filters in alphabetical 
> order, it should apply them in the order they were received.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
