[ 
https://issues.apache.org/jira/browse/SOLR-13125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743706#comment-16743706
 ] 

mosh commented on SOLR-13125:
-----------------------------

Added a patch which introduces a new query component.
The new query component only runs in cloud mode.
It does the following, in this particular order, and falls back in case any of 
the conditions are not met:

  _Stage Start:_
 # Checks whether an alias is queried.
 # Checks whether the alias is a TRA.
 # Checks whether sort was specified and is sorted by router.field(field TRA 
uses to route docs)

  _Finish Execute Query:_
 # gets all collections in TRA, sorted by date(descending or ascending, depends 
on the sort specified in query).
 # Count responses by collection, and add to whitelist.
 # If limit was not reached, drop requests from shards not in the whitelist.

The patch also contains a new test suite TRAQueryComponentTest,
which tests the behavior of this new SearchComponent.

> Optimize Queries when sorting by router.field
> ---------------------------------------------
>
>                 Key: SOLR-13125
>                 URL: https://issues.apache.org/jira/browse/SOLR-13125
>             Project: Solr
>          Issue Type: Sub-task
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: mosh
>            Priority: Minor
>         Attachments: SOLR-13125-no-commit.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> We are currently testing TRA using Solr 7.7, having >300 shards in the alias, 
> with much growth in the coming months.
> The "hot" data(in our case, more recent) will be stored on stronger 
> nodes(SSD, more RAM, etc).
> A proposal of optimizing queries sorted by router.field(the field which TRA 
> uses to route the data to the correct collection) has emerged.
> Perhaps, in queries which are sorted by router.field, Solr could be smart 
> enough to wait for the more recent collections, and in case the limit was 
> reached cancel other queries(or just not block and wait for the results)?
> For example:
> When querying a TRA which with a filter on a different field than 
> router.field, but sorting by router.field desc, limit=100.
> Since this is a TRA, solr will issue queries for all the collections in the 
> alias.
> But to optimize this particular type of query, Solr could wait for the most 
> recent collection in the TRA, see whether the result set matches or exceeds 
> the limit. If so, the query could be returned to the user without waiting for 
> the rest of the shards. If not, the issuing node will block until the second 
> query returns, and so forth, until the limit of the request is reached.
> This might also be useful for deep paging, querying each collection and only 
> skipping to the next once there are no more results in the specified 
> collection.
> Thoughts or inputs are always welcome.
> This is just my two cents, and I'm always happy to brainstorm.
> Thanks in advance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to