[ https://issues.apache.org/jira/browse/SOLR-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128322#comment-14128322 ]
Alexander S. commented on SOLR-6494:
------------------------------------

Unfortunately that doesn't solve the problem completely; these queries take ≈7 seconds instead of 15:
{code}
{!cache=false}type:"Award::Nomination"
{!cache=false cost=10}created_at_d:[* TO 2014-09-08T23:59:59Z]
{code}
That is still not good, since I have only 11 974 docs with type:"Award::Nomination" and 139 716 883 with created_at_d:[* TO 2014-09-08T23:59:59Z]. If the cost parameter tells Solr to apply the cheapest filters first, why does the query still take so long? It seems that even though it doesn't run them in parallel, the filters still don't know about each other and each one goes through all docs. My point is that it would be much faster if Solr could run the filters one by one, with each subsequent filter working not on the entire data set but on the results returned by the previous filter.

I also tried cost >= 100 to apply a filter as a post filter, but nothing changes: the same 7 seconds. The filter cache doesn't help here.

So this:
>> By design, fq clauses like this are calculated for the entire document set
>> and the results cached, there is no "ordering" for that part.

doesn't sound right to me. Sometimes we don't need to reuse filters (and sometimes we can't, e.g. the cost option requires cache=false). In the provided use case, the way Solr applies filters is more harmful than useful. I'd even say more than 600 times harmful. A query that wouldn't take more than a second in MySQL takes 15 seconds in a search engine that uses a fast SSD RAID 10, has a few shards and replicas, uses more than 160G of memory in total, and has ≈40 CPU cores. Thus this sounds like a missing feature (at least).

Please share your thoughts on this.

> Query filters applied in a wrong order
> --------------------------------------
>
>                 Key: SOLR-6494
>                 URL: https://issues.apache.org/jira/browse/SOLR-6494
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 4.8.1
>            Reporter: Alexander S.
>
> This query:
> {code}
> {
>   fq: ["type:Award::Nomination"],
>   sort: "score desc",
>   start: 0,
>   rows: 20,
>   q: "*:*"
> }
> {code}
> takes just a few milliseconds, but this one:
> {code}
> {
>   fq: [
>     "type:Award::Nomination",
>     "created_at_d:[* TO 2014-09-08T23:59:59Z]"
>   ],
>   sort: "score desc",
>   start: 0,
>   rows: 20,
>   q: "*:*"
> }
> {code}
> takes almost 15 seconds.
> I have just ≈12k of documents with type "Award::Nomination", but around half
> a billion with created_at_d field set. And it seems Solr applies the
> created_at_d filter first going through all documents where this field is
> set, which is not very smart.
> I think if it can't do anything better than applying filters in the alphabet
> order it should apply them in the order they were received.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
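
To make the argument in the comment concrete, here is a toy Python sketch of the two strategies it contrasts: evaluating every filter over the whole document set and intersecting the results, versus running the filters one by one so that each sees only the survivors of the previous one. This is not Solr code and does not claim to show how Solr evaluates fq clauses. The in-memory "index", the document counts, and every function name below are assumptions made up for illustration.

```python
from datetime import datetime, timezone

# Toy in-memory "index": plain Python dicts, NOT Solr internals.
CUTOFF = datetime(2014, 9, 8, 23, 59, 59, tzinfo=timezone.utc)


def make_docs(n, every):
    """Build n hypothetical docs; every `every`-th one is an Award::Nomination."""
    return [
        {
            "id": i,
            "type": "Award::Nomination" if i % every == 0 else "Other",
            "created_at_d": datetime(2014, 1, 1, tzinfo=timezone.utc),
        }
        for i in range(n)
    ]


def is_nomination(doc):
    return doc["type"] == "Award::Nomination"


def created_before_cutoff(doc):
    return doc["created_at_d"] <= CUTOFF


def independent_filters(docs, filters):
    """Evaluate each filter over the whole doc set, then intersect the id sets."""
    checks = 0
    result = None
    for f in filters:
        matched = set()
        for d in docs:
            checks += 1
            if f(d):
                matched.add(d["id"])
        result = matched if result is None else result & matched
    return result, checks


def chained_filters(docs, filters):
    """Evaluate filters in order; each one sees only the previous survivors."""
    checks = 0
    survivors = docs
    for f in filters:
        kept = []
        for d in survivors:
            checks += 1
            if f(d):
                kept.append(d)
        survivors = kept
    return {d["id"] for d in survivors}, checks


docs = make_docs(100_000, every=10_000)  # 10 nominations among 100 000 docs
filters = [is_nomination, created_before_cutoff]  # most selective filter first

ids_a, checks_a = independent_filters(docs, filters)
ids_b, checks_b = chained_filters(docs, filters)
print(ids_a == ids_b, checks_a, checks_b)  # True 200000 100010
```

In this toy model both strategies return the same ten documents, but the chained version performs 100 010 per-document checks instead of 200 000, because the date filter only examines the ten documents that passed the type filter. A real search engine works on inverted indexes and document sets rather than per-document predicate calls, so the counts are only a rough analogy for the cost difference described in the comment.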