[jira] [Commented] (SOLR-12635) HashQParserPlugin should be run as a post filter cost is not explicitly defined

Joel Bernstein (JIRA) Wed, 08 Aug 2018 03:59:08 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-12635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16573018#comment-16573018
 ]


Joel Bernstein commented on SOLR-12635:
---------------------------------------

The postFilter will have to be applied to every document that matches the main 
query and filter queries, and the postFilter is never cached. For large exports 
the postFilter would perform poorly every time.
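
To make the trade-off concrete, here is a rough sketch that only counts how 
many hash checks each strategy would do. It is a simplified model and does not 
call Solr; the class and method names are hypothetical, and the request count 
in the second case is a made-up number. The 17m shard size comes from the 
issue description.

```java
// Simplified model only: it counts hash checks, it does not call Solr.
final class HashFilterCostSketch {

    // Post filter: one check per document matching q and the other fqs,
    // repeated on every request because the result is never cached.
    static long postFilterChecks(long matchingDocs, long requests) {
        return matchingDocs * requests;
    }

    // Cached filter: one pass over the whole shard on the first request
    // (or during warming); later requests reuse the cached result.
    static long cachedFilterChecks(long shardDocs) {
        return shardDocs;
    }

    public static void main(String[] args) {
        long shardDocs = 17_000_000L; // shard size from the issue description

        // Small result set (hits=0 in the quoted log): the post filter does no work.
        System.out.println(postFilterChecks(0, 1) + " vs " + cachedFilterChecks(shardDocs));

        // Large export matching the whole shard, run 10 times (made-up numbers).
        System.out.println(postFilterChecks(shardDocs, 10) + " vs " + cachedFilterChecks(shardDocs));
    }
}
```

In this model the post filter wins when few documents match and loses once 
large exports are repeated, which is the case I am concerned about.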

For small result sets the best approach is simply not to use the parallel 
function, so that hash partitioning is not needed.

When not run as a postFilter, the HashQParserPlugin filter is automatically 
cached in the filter cache and can be auto-warmed, so queries don't take a 
performance hit when a new searcher is opened. The firstSearcher listener can be 
used to perform the initial warming, as [~varunthacker] mentions above.
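
Warming would need one hash filter per worker. As a rough sketch, the filter 
strings could be enumerated as below. The class and method names are 
hypothetical, numbering workers from 0 is an assumption, and the parameter 
syntax is copied from the log lines in the issue description. A warming 
request would presumably also need the same partitionKeys parameter as the 
real queries.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds the fq strings only, it does not talk to Solr.
final class HashFilterParams {

    // Builds the fq value for one worker, e.g. {!hash workers=6 worker=3}.
    // A non-null cost is appended as cost=N inside the local params.
    static String hashFilter(int workers, int worker, Integer cost) {
        if (workers <= 0 || worker < 0 || worker >= workers) {
            throw new IllegalArgumentException("worker must be in [0, workers)");
        }
        StringBuilder sb = new StringBuilder("{!hash workers=")
                .append(workers)
                .append(" worker=")
                .append(worker);
        if (cost != null) {
            sb.append(" cost=").append(cost);
        }
        return sb.append('}').toString();
    }

    // One cacheable filter (no cost set) per worker, assuming workers are numbered from 0.
    static List<String> warmingFilters(int workers) {
        List<String> filters = new ArrayList<>(workers);
        for (int w = 0; w < workers; w++) {
            filters.add(hashFilter(workers, w, null));
        }
        return filters;
    }

    public static void main(String[] args) {
        warmingFilters(6).forEach(System.out::println);
    }
}
```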

 

> HashQParserPlugin should be run as a post filter cost is not explicitly 
> defined
> -------------------------------------------------------------------------------
>
>                 Key: SOLR-12635
>                 URL: https://issues.apache.org/jira/browse/SOLR-12635
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Varun Thacker
>            Assignee: Varun Thacker
>            Priority: Major
>         Attachments: SOLR-12635.patch
>
>
> I was doing some performance benchmarking for a user on slow streaming queries.
> The weird thing was that the same streaming expression was fast when we fired 
> it again.
> We were able to isolate the slowness to the hash query parser.
> Here are the first and second times we fired the query. To simplify things, 
> this is for one shard and for the same worker:
> {code:java}
> path=/export 
> params={q=*:*&distrib=false&indent=off&fl=fields&fq=user:1&fq={!hash 
> workers=6 worker=3}&partitionKeys=partitionKey&sort=partitionKey 
> asc&wt=javabin&version=2.2} hits=0 status=0 QTime=6821
> path=/export 
> params={q=*:*&distrib=false&indent=off&fl=fields&fq=user:1&fq={!hash 
> workers=6 worker=3}&partitionKeys=partitionKey&sort=partitionKey 
> asc&wt=javabin&version=2.2} hits=0 status=0 QTime=0{code}
> Even with hits=0 the first query took 6.8 seconds. The shard has 17m 
> documents.
> The second query utilizes the queryResultCache and hence it's lightning fast 
> the second time around.
> When we execute the same query and add a cost, i.e. {{&fq={!hash workers=6 
> worker=3 cost=101}}}, the query gets executed as a post filter and even 
> uncached is super fast.
> I created this Jira so that we can always set cost > 100 from the parallel 
> stream.
> However, I am happy to change the default behaviour for HashQParserPlugin and 
> make it always run as a post filter unless explicitly specified otherwise. 
> CollapsingQParserPlugin does this currently to make sure it's run as a post 
> filter by default:
> {code:java}
> public int getCost() {
>   return Math.max(super.getCost(), 100);
> }{code}
> Thoughts anyone? 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

