[jira] [Commented] (SOLR-12635) HashQParserPlugin should be run as a post filter cost is not explicitly defined

Varun Thacker (JIRA) Tue, 07 Aug 2018 19:43:15 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-12635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572588#comment-16572588
 ]


Varun Thacker commented on SOLR-12635:
--------------------------------------

Here are some thoughts about using the HashQParser after speaking to Joel 
offline.

We almost always want to use the HashQParser to fetch a lot of data in parallel.

That sentence can be interpreted in two ways:

Thought 1 - If we parallelize fetching the data, each 1/N stream won't be 
big, so a post filter approach makes sense.

Thought 2 - We are using parallel because the data is big, so each 1/N stream 
will also be big. The HashQParser is very cache friendly, i.e. once a query has 
been executed, repeating it can leverage the filterCache/queryResultCache and 
be served very fast. We pay the cost the first time the query gets executed, 
and after that the query is super fast. We could even avoid paying the cost for 
the first query by adding these 6 queries to the newSearcher event in the 
solrconfig.xml file:
{code:xml}
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">*:*</str><str name="fq">{!hash workers=6 worker=0}</str><str name="partitionKeys">myPartitionKey</str></lst>
    <lst><str name="q">*:*</str><str name="fq">{!hash workers=6 worker=1}</str><str name="partitionKeys">myPartitionKey</str></lst>
    <lst><str name="q">*:*</str><str name="fq">{!hash workers=6 worker=2}</str><str name="partitionKeys">myPartitionKey</str></lst>
    <lst><str name="q">*:*</str><str name="fq">{!hash workers=6 worker=3}</str><str name="partitionKeys">myPartitionKey</str></lst>
    <lst><str name="q">*:*</str><str name="fq">{!hash workers=6 worker=4}</str><str name="partitionKeys">myPartitionKey</str></lst>
    <lst><str name="q">*:*</str><str name="fq">{!hash workers=6 worker=5}</str><str name="partitionKeys">myPartitionKey</str></lst>
  </arr>
</listener>{code}
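
If the worker count changes, these entries could be generated rather than 
edited by hand. This is only a minimal sketch, assuming the same q, fq and 
partitionKeys layout as the listener above; the class and method names are 
hypothetical and not part of Solr.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Solr class: prints one warming entry per worker.
final class HashWarmingQueries {

    /**
     * Builds one <lst> entry per worker in the same shape as the hand-written
     * listener entries. The partition key is inserted verbatim, so it is
     * assumed to contain no characters that need XML escaping.
     */
    static List<String> warmingEntries(int workers, String partitionKey) {
        if (workers < 1) {
            throw new IllegalArgumentException("workers must be >= 1");
        }
        List<String> entries = new ArrayList<>();
        for (int worker = 0; worker < workers; worker++) {
            entries.add("<lst><str name=\"q\">*:*</str>"
                + "<str name=\"fq\">{!hash workers=" + workers
                + " worker=" + worker + "}</str>"
                + "<str name=\"partitionKeys\">" + partitionKey + "</str></lst>");
        }
        return entries;
    }

    public static void main(String[] args) {
        // Paste the printed lines inside <arr name="queries"> ... </arr>.
        for (String entry : warmingEntries(6, "myPartitionKey")) {
            System.out.println("    " + entry);
        }
    }
}
```

The workers value used for warming has to match the one used by the real 
parallel queries, otherwise the warmed cache entries belong to different 
filters and won't be reused.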
 

I'm going to ponder on this a little more, but I'm tempted to go with the 
second school of thought. This would involve no changes to the code, just 
adding this listener to the solrconfig.xml file.
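
For comparison, the first school of thought relies on the post filter rule as 
I understand it from the Solr documentation: a filter runs as a post filter 
when it implements PostFilter, is not cached, and has cost >= 100. The sketch 
below only models that rule and the cost floor; the class and method names are 
hypothetical and this is not the actual Solr code.

```java
// Hypothetical model of the post filter decision, not a Solr class.
final class PostFilterCostSketch {

    /** Assumed rule: PostFilter implementation + cache=false + cost >= 100. */
    static boolean runsAsPostFilter(boolean cache, int cost, boolean implementsPostFilter) {
        return implementsPostFilter && !cache && cost >= 100;
    }

    /** Cost floor in the style of Math.max(super.getCost(), 100). */
    static int flooredCost(int requestedCost) {
        return Math.max(requestedCost, 100);
    }

    public static void main(String[] args) {
        // An explicit cost below 100 is raised to the floor.
        System.out.println(flooredCost(50));
        // An explicit cost above 100 is kept as requested.
        System.out.println(flooredCost(101));
        // A cached filter is never run as a post filter, whatever its cost.
        System.out.println(runsAsPostFilter(true, 101, true));
    }
}
```

With a floor like this, a caller who wants the cached behaviour of the second 
school of thought would need some other way to opt out, which is part of the 
trade-off between the two approaches.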

> HashQParserPlugin should be run as a post filter cost is not explicitly 
> defined
> -------------------------------------------------------------------------------
>
>                 Key: SOLR-12635
>                 URL: https://issues.apache.org/jira/browse/SOLR-12635
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Varun Thacker
>            Assignee: Varun Thacker
>            Priority: Major
>         Attachments: SOLR-12635.patch
>
>
> I was doing some performance benchmarking for a user on slow streaming 
> queries.
> The weird thing was that the same streaming expression was fast when we 
> fired it again.
> We were able to isolate the slowness to the hash query parser.
> Here are the log lines for the first and second time we fired the query. To 
> simplify things, this is for one shard and for the same worker:
> {code:java}
> path=/export params={q=*:*&distrib=false&indent=off&fl=fields&fq=user:1&fq={!hash workers=6 worker=3}&partitionKeys=partitionKey&sort=partitionKey asc&wt=javabin&version=2.2} hits=0 status=0 QTime=6821
> path=/export params={q=*:*&distrib=false&indent=off&fl=fields&fq=user:1&fq={!hash workers=6 worker=3}&partitionKeys=partitionKey&sort=partitionKey asc&wt=javabin&version=2.2} hits=0 status=0 QTime=0{code}
> Even with hits=0 the first query took 6.8 seconds. The shard has 17m 
> documents.
> The second query utilizes the queryResultCache, hence it's lightning fast 
> the second time around.
> When we execute the same query and add a cost, i.e. {{&fq={!hash workers=6 
> worker=3 cost=101}}}, the query gets executed as a post filter and is super 
> fast even uncached.
> I created this Jira so that we can always set cost > 100 from the parallel 
> stream.
> However, I am happy to change the default behaviour of HashQParserPlugin and 
> make it always run as a post filter unless explicitly specified otherwise. 
> CollapsingQParserPlugin currently does this to make sure it's run as a post 
> filter by default:
> {code:java}
> public int getCost() {
>   return Math.max(super.getCost(), 100);
> }{code}
> Thoughts anyone? 
>  



