[ 
https://issues.apache.org/jira/browse/SOLR-10059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890736#comment-15890736
 ] 

Hoss Man commented on SOLR-10059:
---------------------------------

Some historical context here is that when "distributed search" was first added, 
before there was any native "cloud support", the way to trigger a distributed 
search was to specify a list of shard URLs (as a request param) for the 
coordinator node to query & aggregate the responses from.  A common 
configuration pattern was for people to put the shards (URLs) in their handler 
defaults in solrconfig.xml -- but also have a "shards.qt" param that pointed at 
a different handler name (i.e. some other handler registration w/o the shards 
list) ... alternatively, some people deployed one solrconfig.xml file to the 
nodes that had data on them (and included things like defaults/appends fqs), 
and had a completely different solrconfig.xml for their coordinator nodes that 
only knew about the shards param and the list of nodes to aggregate from.

you're definitely correct -- as things evolved into solr cloud, things like 
appends fqs wound up being computed multiple times because they come from both 
the coordinator node's init params and the individual shard's (identical) init 
params.

I think the general approach #2 you suggested makes the most sense ... the 
bit of code (in RequestHandlerBase i believe?) where the 
defaults/invariants/appends are wrapped around/under the request params should 
be skipped in (some) solr cloud shard requests -- but i think checking IS_SHARD 
is really only 1 piece of the puzzle? for completeness we should probably also 
check that the SolrCore says we are in solrcloud mode (to ensure the user isn't 
rolling their own distributed search via pre-solrcloud shard requests like i 
described above)
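
as a rough sketch of that two-part check (plain Java, *not* actual Solr code: 
the class, the method names, and the map-based stand-in for request params are 
all made up for illustration, and i haven't checked where this would really 
hook into RequestHandlerBase):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; none of these names exist in Solr. */
final class InitParamsSketch {

    /**
     * Skip the init-param wrapping only when BOTH conditions hold: the request
     * is a shard sub-request AND the core is running in cloud mode. A shard
     * request in a hand-rolled (pre-cloud) setup still gets wrapped.
     */
    static boolean shouldApplyInitParams(boolean isShardRequest, boolean cloudMode) {
        return !(isShardRequest && cloudMode);
    }

    /** Returns a copy of the request params, with "appends" added if allowed. */
    static Map<String, List<String>> withAppends(Map<String, List<String>> request,
                                                 Map<String, List<String>> appends,
                                                 boolean isShardRequest,
                                                 boolean cloudMode) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        request.forEach((k, v) -> result.put(k, new ArrayList<>(v)));
        if (shouldApplyInitParams(isShardRequest, cloudMode)) {
            appends.forEach((k, v) ->
                result.computeIfAbsent(k, x -> new ArrayList<>()).addAll(v));
        }
        return result;
    }
}
```

with something like this, a cloud-mode shard that receives a request already 
carrying the coordinator's appended fq would leave it alone, while a shard in 
an old-style manual "shards" setup would still add its own.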

the only other thing to worry about i guess is what should happen when 
multi-collection requests are issued? -- such as when a collection alias points 
to multiple collections.  Shouldn't the "appends" FQ params from collection1 be 
applied anytime a query includes collection1, and the appends FQ params from 
collection2 be applied any time a query includes collection2; even if those are 
both a single query that originated via a request to "both_collections" (which 
is an alias for "collection1,collection2") ?

I suppose the coordinating node could include the "source collection (alias)" 
of the request as a param that the individual shards could compare with 
themselves to decide when they need to wrap the params?
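
roughly, the shard-side comparison might look like the following (again plain 
Java and purely hypothetical: there is no such param or method in Solr today, 
and this sketch deliberately ignores the open question of what the coordinator 
itself should do with its own appends before fanning out):

```java
/** Hypothetical illustration only; none of these names exist in Solr. */
final class SourceCollectionSketch {

    /**
     * Decide whether a shard should wrap its own init params around the request.
     *
     * @param sourceCollection collection (or alias) name the coordinator says the
     *                         request was addressed to; null if no such param was sent
     * @param ownCollection    the collection this shard's core belongs to
     */
    static boolean shardShouldWrap(String sourceCollection, String ownCollection) {
        // Same collection: assume the coordinator already applied identical params.
        // Different name (e.g. an alias) or no param at all: apply our own.
        return !ownCollection.equals(sourceCollection);
    }
}
```

so a request sent to "both_collections" would have shards of collection1 and 
collection2 each apply their own appends, while a request sent directly to 
"collection1" would not be wrapped a second time on collection1's shards.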

(just thinking out loud -- there's probably a better solution)






> In SolrCloud, every fq added via <lst name="appends"> is computed twice.
> ------------------------------------------------------------------------
>
>                 Key: SOLR-10059
>                 URL: https://issues.apache.org/jira/browse/SOLR-10059
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>    Affects Versions: 6.4.0
>            Reporter: Marc Morissette
>              Labels: performance
>
> While researching another issue, I noticed that parameters appended to a 
> query via SearchHandler's <lst name="appends"> are added to the query twice 
> in SolrCloud: once on the aggregator and again on the shard.
> The FacetComponent corrects this automatically by removing duplicates. Filter 
> queries added in this fashion are however computed twice and that hinders 
> performance on filter queries that aren't simple bitsets such as those 
> produced by the CollapsingQueryParser.
> To reproduce the issue, simply test this handler on a large enough 
> collection, then replace "appends" with "defaults". You'll notice significant 
> performance improvements.
> {code}
> <requestHandler name="/myHandler" class="solr.SearchHandler">
>     <lst name="appends">
>         <str name="fq">{!collapse field=routingKey hint=top_fc}</str>
>     </lst>
> </requestHandler>
> {code}



