[ https://issues.apache.org/jira/browse/SOLR-12684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
     Varun Thacker resolved SOLR-12684.
----------------------------------
       Resolution: Fixed
    Fix Version/s: 7.5
                   master (8.0)

Thanks Amrit!

> Document speed gotchas and partitionKeys usage for ParallelStream
> -----------------------------------------------------------------
>
>                 Key: SOLR-12684
>                 URL: https://issues.apache.org/jira/browse/SOLR-12684
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Varun Thacker
>            Assignee: Varun Thacker
>            Priority: Major
>             Fix For: master (8.0), 7.5
>
>         Attachments: SOLR-12684.patch, SOLR-12684.patch, SOLR-12684.patch, SOLR-12684.patch
>
>
> The aim of this Jira is to beef up the ref guide around parallel stream.
> There are two things I want to address:
>
> Firstly, usage of partitionKeys:
> This line in the ref guide indicates that parallel stream keys should always be the same as the underlying sort criteria:
> {code:java}
> The parallel function maintains the sort order of the tuples returned by the worker nodes, so the sort criteria of the parallel function must match up with the sort order of the tuples returned by the workers.
> {code}
> But as discussed on SOLR-12635, Joel provided an example:
> {code:java}
> The hash partitioner just needs to send documents to the same worker node. You could do that with just one partitioning key
> For example if you sort on year, month and day. You could partition on year only and still be fine as long as there was enough different years to spread the records around the worker nodes.{code}
> So we should make this clearer in the ref guide.
> Let's also document that specifying more than 4 partitionKeys will throw an error after SOLR-12683.
>
> At this point the user will understand how to use partitionKeys. It is related to the sort criteria but should not include all of the sort fields.
>
> Secondly, we should mention a trick where the user can warm up the hash queries, as they are always run on the whole document set (irrespective of the filter criteria).
> Also, users should only use parallel when the number of docs still matching after the filter criteria are applied is very large.
> {code:java}
> <listener event="newSearcher" class="solr.QuerySenderListener">
> <arr name="queries">
> <lst><str name="q">*:*</str><str name="fq">{!hash workers=6 worker=0}</str><str name="partitionKeys">myPartitionKey</str></lst>
> <lst><str name="q">*:*</str><str name="fq">{!hash workers=6 worker=1}</str><str name="partitionKeys">myPartitionKey</str></lst>
> <lst><str name="q">*:*</str><str name="fq">{!hash workers=6 worker=2}</str><str name="partitionKeys">myPartitionKey</str></lst>
> <lst><str name="q">*:*</str><str name="fq">{!hash workers=6 worker=3}</str><str name="partitionKeys">myPartitionKey</str></lst>
> <lst><str name="q">*:*</str><str name="fq">{!hash workers=6 worker=4}</str><str name="partitionKeys">myPartitionKey</str></lst>
> <lst><str name="q">*:*</str><str name="fq">{!hash workers=6 worker=5}</str><str name="partitionKeys">myPartitionKey</str></lst>
> </arr>
> </listener>{code}
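
Joel's point about partitioning on year while sorting on year, month and day can be illustrated with a small simulation. This is only a rough sketch and not Solr code: Python's built-in `hash` stands in for Solr's hash partitioner, and the field names, the worker count of 3 and the helper names `partition` and `sort_key` are assumptions made up for the illustration.

```python
import heapq
from collections import defaultdict

# Hypothetical sort criteria, matching the year/month/day example above.
SORT_FIELDS = ("year", "month", "day")


def sort_key(doc):
    """Full sort key used by every worker and by the final merge."""
    return tuple(doc[f] for f in SORT_FIELDS)


def partition(docs, workers, key_fields):
    """Assign each doc to a worker by hashing ONLY the partition key fields.

    Python's hash() is a stand-in for Solr's hash function (an assumption);
    the only property that matters is that equal keys map to the same worker.
    """
    buckets = defaultdict(list)
    for doc in docs:
        key = tuple(doc[f] for f in key_fields)
        buckets[hash(key) % workers].append(doc)
    return buckets


# Made-up data: 8 years x 2 months x 2 days = 32 docs.
docs = [
    {"year": y, "month": m, "day": d}
    for y in range(2010, 2018)
    for m in (1, 6)
    for d in (3, 20)
]

# Partition on year only, even though the sort uses three fields.
buckets = partition(docs, workers=3, key_fields=("year",))

# Each worker returns its share sorted by the full sort criteria.
worker_streams = [sorted(b, key=sort_key) for b in buckets.values()]

# The merge step preserves the sort order of the worker streams.
merged = list(heapq.merge(*worker_streams, key=sort_key))
```

In this sketch every year lands on exactly one worker, and merging the per-worker sorted streams still gives the overall year, month, day order. With only a handful of distinct years the spread across workers would be uneven, which is the caveat in Joel's example.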