[jira] [Resolved] (SOLR-12684) Document speed gotchas and partitionKeys usage for ParallelStream

Varun Thacker (JIRA) Fri, 24 Aug 2018 01:41:44 -0700


     [ 
https://issues.apache.org/jira/browse/SOLR-12684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Varun Thacker resolved SOLR-12684.
----------------------------------
       Resolution: Fixed
    Fix Version/s: 7.5
                   master (8.0)

Thanks Amrit!

> Document speed gotchas and partitionKeys usage for ParallelStream
> -----------------------------------------------------------------
>
>                 Key: SOLR-12684
>                 URL: https://issues.apache.org/jira/browse/SOLR-12684
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Varun Thacker
>            Assignee: Varun Thacker
>            Priority: Major
>             Fix For: master (8.0), 7.5
>
>         Attachments: SOLR-12684.patch, SOLR-12684.patch, SOLR-12684.patch, 
> SOLR-12684.patch
>
>
> The aim of this Jira is to beef up the ref guide around parallel stream.
> There are two things I want to address:
>  
> First, usage of partitionKeys:
> This line in the ref guide suggests that the parallel stream partitionKeys 
> should always be the same as the underlying sort criteria:
> {code:java}
> The parallel function maintains the sort order of the tuples returned by the 
> worker nodes, so the sort criteria of the parallel function must match up 
> with the sort order of the tuples returned by the workers.
> {code}
> But as discussed on SOLR-12635, Joel provided an example:
> {code:java}
> The hash partitioner just needs to send documents to the same worker node. 
> You could do that with just one partitioning key
> For example if you sort on year, month and day. You could partition on year 
> only and still be fine as long as there was enough different years to spread 
> the records around the worker nodes.{code}
> So we should make this clearer in the ref guide.
> Let's also document that specifying more than 4 partitionKeys will throw an 
> error after SOLR-12683.
>  
> At this point the user will understand how to use partitionKeys: they are 
> related to the sort criteria but need not include all the sort fields.
>  
> We should now mention a trick where the user can warm up the hash queries, 
> as they are always run on the whole document set (irrespective of the filter 
> criteria).
> Also, users should only use parallel when the number of docs matching the 
> filter criteria is very large.
> {code:java}
> <listener event="newSearcher" class="solr.QuerySenderListener">
> <arr name="queries">
> <lst><str name="q">*:*</str><str name="fq">{!hash workers=6 worker=0}</str>
> <str name="partitionKeys">myPartitionKey</str></lst>
> <lst><str name="q">*:*</str><str name="fq">{!hash workers=6 worker=1}</str>
> <str name="partitionKeys">myPartitionKey</str></lst>
> <lst><str name="q">*:*</str><str name="fq">{!hash workers=6 worker=2}</str>
> <str name="partitionKeys">myPartitionKey</str></lst>
> <lst><str name="q">*:*</str><str name="fq">{!hash workers=6 worker=3}</str>
> <str name="partitionKeys">myPartitionKey</str></lst>
> <lst><str name="q">*:*</str><str name="fq">{!hash workers=6 worker=4}</str>
> <str name="partitionKeys">myPartitionKey</str></lst>
> <lst><str name="q">*:*</str><str name="fq">{!hash workers=6 worker=5}</str>
> <str name="partitionKeys">myPartitionKey</str></lst>
> </arr>
> </listener>{code}
>   
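To illustrate the partitionKeys point from the issue, here is a minimal Python sketch of partitioning on year only while sorting on year, month and day. It is a simulation, not Solr code: Python's built-in hash() stands in for Solr's hash partitioner, and the record layout, the worker count of 3 and the helper names (sort_key, partition) are assumptions made for the illustration.

```python
import heapq
from collections import defaultdict

SORT_FIELDS = ("year", "month", "day")


def sort_key(record):
    """Full sort criteria: year, then month, then day."""
    return tuple(record[f] for f in SORT_FIELDS)


def partition(records, partition_keys, workers):
    """Route each record to a worker by hashing only the partition keys.

    Python's built-in hash() is a stand-in for Solr's hash function.
    """
    buckets = defaultdict(list)
    for record in records:
        worker = hash(tuple(record[k] for k in partition_keys)) % workers
        buckets[worker].append(record)
    return buckets


# Two records per date, so some records share the complete sort key.
records = [
    {"id": f"{y}-{m}-{d}-{n}", "year": y, "month": m, "day": d}
    for y in range(2010, 2019)
    for m in (12, 6, 1)
    for d in (15, 1)
    for n in range(2)
]

# Partition on year only, although the sort uses year, month and day.
buckets = partition(records, partition_keys=("year",), workers=3)

# Each worker sorts its own share on the full sort criteria.
worker_streams = [sorted(b, key=sort_key) for b in buckets.values()]

# Merging the sorted worker streams yields one globally sorted stream.
merged = list(heapq.merge(*worker_streams, key=sort_key))

# Record which workers hold each complete sort key.
owners = defaultdict(set)
for worker, bucket in buckets.items():
    for record in bucket:
        owners[sort_key(record)].add(worker)

print(len(merged), "tuples merged from", len(buckets), "workers")
```

Because equal (year, month, day) implies equal year, every group of tuples sharing the full sort key ends up on a single worker, which is all the partitioner has to guarantee. How evenly the work spreads depends on how many distinct years there are.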

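The warming listener quoted in the issue repeats one query per worker, so it has to be rewritten whenever the worker count changes. As a rough sketch, a small Python helper could generate it. The function name warming_listener is hypothetical, and using `*:*` (Solr's match-all query) for `q` is an assumption about the intent of the listener in the issue.

```python
from xml.sax.saxutils import escape


def warming_listener(workers, partition_keys):
    """Build a newSearcher listener with one warming query per worker."""
    lines = [
        '<listener event="newSearcher" class="solr.QuerySenderListener">',
        '  <arr name="queries">',
    ]
    for worker in range(workers):
        # Hash filter for this worker, in the form used by the listener
        # quoted in the issue.
        hash_fq = "{!hash workers=%d worker=%d}" % (workers, worker)
        lines.append(
            '    <lst><str name="q">*:*</str>'
            '<str name="fq">%s</str>'
            '<str name="partitionKeys">%s</str></lst>'
            % (escape(hash_fq), escape(partition_keys))
        )
    lines.append("  </arr>")
    lines.append("</listener>")
    return "\n".join(lines)


listener_xml = warming_listener(workers=6, partition_keys="myPartitionKey")
print(listener_xml)
```

The workers value and the partitionKeys field would need to match what the parallel expression actually uses, otherwise the warmed hash queries may not be the ones that run later.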


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
