[ https://issues.apache.org/jira/browse/SOLR-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15234151#comment-15234151 ]
Dennis Gove commented on SOLR-8962:
-----------------------------------
That sounds like a good algorithm for this to support.
Independently of this, I've been looking at creating a PartitionStream that can
be used to partition streams off to different workers. Where this differs from
a ParallelStream is that it can partition streams in the middle of a pipeline
(i.e., after a join). The biggest hang-up I've had is how best to express a
PartitionStream, but I'm fairly confident I've come up with a good solution.
A PartitionStream could be used to do a fork/join merge sort across different
workers, which would be helpful in situations where the dataset is too large for
a single process to realistically handle.
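To make the fork/join merge sort idea concrete, here is a rough sketch in plain Java. It is not the PartitionStream API (which is not spelled out here) and uses no Solr classes: the class and method names are hypothetical, worker processes are approximated by threads, and partitioning by hashCode is only an assumption for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; none of these names exist in Solr.
final class PartitionedSortSketch {

    // Assign each item to one of `workers` partitions by hash (workers must be > 0).
    static <T> List<List<T>> partition(List<T> input, int workers) {
        List<List<T>> parts = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            parts.add(new ArrayList<>());
        }
        for (T item : input) {
            parts.get(Math.floorMod(item.hashCode(), workers)).add(item);
        }
        return parts;
    }

    // Fork: sort every partition independently. Join: merge the sorted partitions.
    static <T> List<T> sortAcrossWorkers(List<T> input, int workers, Comparator<T> cmp) {
        List<List<T>> parts = partition(input, workers);
        // Each partition is touched by exactly one thread, standing in for one worker.
        parts.parallelStream().forEach(p -> p.sort(cmp));
        return merge(parts, cmp);
    }

    // K-way merge of already sorted partitions using a min-heap of cursors.
    static <T> List<T> merge(List<List<T>> sortedParts, Comparator<T> cmp) {
        // Each heap entry is {partitionIndex, offsetWithinPartition}.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (a, b) -> cmp.compare(sortedParts.get(a[0]).get(a[1]),
                                  sortedParts.get(b[0]).get(b[1])));
        for (int i = 0; i < sortedParts.size(); i++) {
            if (!sortedParts.get(i).isEmpty()) {
                heap.add(new int[] {i, 0});
            }
        }
        List<T> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] head = heap.poll();
            List<T> part = sortedParts.get(head[0]);
            out.add(part.get(head[1]));
            if (head[1] + 1 < part.size()) {
                heap.add(new int[] {head[0], head[1] + 1});
            }
        }
        return out;
    }
}
```

In this sketch no worker ever holds more than its own partition while sorting; only the final merge sees the whole ordered result, and it reads it one item at a time from each partition.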
> Add sort Streaming Expression
> -----------------------------
>
> Key: SOLR-8962
> URL: https://issues.apache.org/jira/browse/SOLR-8962
> Project: Solr
> Issue Type: New Feature
> Reporter: Joel Bernstein
> Priority: Critical
> Fix For: 6.1
>
> Attachments: SOLR-8962.patch
>
>
> The sort Streaming Expression does an in-memory sort of the Tuples returned
> by its underlying stream. This is intended to be used for sorting sets
> gathered during local graph traversals. This will make it easy to gather sets
> during a traversal and use all of the sort-based set operations (merge,
> innerJoin, outerJoin, reduce, complement, intersect).
> This will be particularly useful with the gatherNodes expression (SOLR-8925).
> Sample syntax:
> {code}
> intersect(
> sort(gatherNodes(...), "fieldA asc"),
> sort(gatherNodes(...), "fieldA asc"),
> on)
> {code}
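As a rough illustration of why the in-memory sort matters for the sort-based set operations, the sketch below sorts two unsorted tuple lists on one field and then intersects them in a single merge pass. It is plain Java, not Solr's implementation: the class and method names are hypothetical, tuples are modeled as Map<String, String>, and every tuple is assumed to carry the sort field.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this does not use Solr's Tuple or TupleStream classes.
final class SortThenIntersectSketch {

    // In-memory sort: copy all tuples, then order the copy by one field, ascending.
    static List<Map<String, String>> sortAsc(List<Map<String, String>> tuples, String field) {
        List<Map<String, String>> copy = new ArrayList<>(tuples);
        copy.sort(Comparator.comparing((Map<String, String> t) -> t.get(field)));
        return copy;
    }

    // Merge-style intersect: both inputs must already be sorted ascending on `field`.
    // Emits each left tuple whose field value also appears on the right.
    static List<Map<String, String>> intersect(List<Map<String, String>> left,
                                               List<Map<String, String>> right,
                                               String field) {
        List<Map<String, String>> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            int c = left.get(i).get(field).compareTo(right.get(j).get(field));
            if (c < 0) {
                i++;            // left value is smaller, so it has no match on the right
            } else if (c > 0) {
                j++;            // right value is smaller, advance the right side
            } else {
                out.add(left.get(i));
                i++;            // keep j so repeated left values still match
            }
        }
        return out;
    }
}
```

The single forward pass in intersect only works because both inputs share the same sort order, which is what wrapping an unsorted source such as a traversal result in a sort provides.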