[
https://issues.apache.org/jira/browse/SOLR-7584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790546#comment-14790546
]
Dennis Gove edited comment on SOLR-7584 at 9/16/15 3:23 PM:
------------------------------------------------------------
This supports joining any incoming set of streams. If you have a FacetStream
instance (SOLR-7903) then you could absolutely join it with some other stream
instance.
Because the join currently uses a merge-join, the incoming streams must be
sorted in the same order on the join fields. That said, a hash-join could be
added relatively easily, in which case the ordering requirement would go away.
I think a hash-join would make a lot of sense for a FacetStream (or really any
kind of aggregation stream).
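To illustrate why the ordering requirement exists, here is a minimal,
library-agnostic sketch in Python. It is not the code in the patch: the
function names merge_join and hash_join are hypothetical, and tuples are
modeled as plain dicts. The merge join advances both inputs in lockstep, so
it assumes both are sorted ascending on the join field; the hash join builds
a lookup table from one side and accepts the other side in any order.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def merge_join(left, right, key):
    """Inner join of two iterables of dicts, both sorted ascending on `key`."""
    left_groups = groupby(left, key=itemgetter(key))
    right_groups = groupby(right, key=itemgetter(key))
    lk, lg = next(left_groups, (None, None))
    rk, rg = next(right_groups, (None, None))
    while lg is not None and rg is not None:
        if lk < rk:
            # Left side is behind: skip ahead to its next key.
            lk, lg = next(left_groups, (None, None))
        elif lk > rk:
            # Right side is behind: skip ahead to its next key.
            rk, rg = next(right_groups, (None, None))
        else:
            # Keys match: emit every left/right pairing for this key.
            right_rows = list(rg)
            for left_row in lg:
                for right_row in right_rows:
                    yield {**left_row, **right_row}
            lk, lg = next(left_groups, (None, None))
            rk, rg = next(right_groups, (None, None))


def hash_join(probe, build, key):
    """Inner join that hashes `build` fully, then streams `probe` in any order."""
    table = defaultdict(list)
    for row in build:
        table[row[key]].append(row)
    for row in probe:
        for match in table.get(row[key], ()):
            yield {**row, **match}
```

The trade-off is memory: the merge join only buffers the rows for the
current key on one side, while the hash join holds one whole input in memory,
which is why the smaller (least-cost) side is the natural one to hash.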
The result of the join is just another stream, so you can feed it into any
other stream for further processing (including aggregation with functions
like sum and avg).
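As a rough illustration of that composability (again a Python sketch under
stated assumptions, not the Streaming API itself): inner_join and rollup are
hypothetical helpers, and the field names customerId, region, and amount are
made up for the example. Each stage is a generator, so the output of the
join can be handed directly to the aggregation stage.

```python
from collections import defaultdict


def inner_join(left, right, key):
    """Stand-in join stage: hashes `right`, then streams `left` through it."""
    table = defaultdict(list)
    for row in right:
        table[row[key]].append(row)
    for row in left:
        for match in table.get(row[key], ()):
            yield {**row, **match}


def rollup(stream, by, field):
    """Aggregation stage: one output dict per `by` value with sum and avg."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [running sum, count]
    for row in stream:
        acc = totals[row[by]]
        acc[0] += row[field]
        acc[1] += 1
    for group, (total, count) in totals.items():
        yield {by: group, "sum": total, "avg": total / count}


# Hypothetical sample data: orders joined to customers, then rolled up.
orders = [
    {"customerId": 1, "amount": 10.0},
    {"customerId": 2, "amount": 5.0},
    {"customerId": 1, "amount": 20.0},
]
customers = [
    {"customerId": 1, "region": "east"},
    {"customerId": 2, "region": "west"},
]
joined = inner_join(orders, customers, "customerId")
per_region = list(rollup(joined, by="region", field="amount"))
```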
was (Author: dpgove):
This supports joining any incoming set of streams. If you have a FacetStream
instance (SOLR-7903) then you could absolutely join it with some other stream
instance.
Because the join currently uses a merge-join, the incoming streams must be
sorted in the same order on the join fields. That said, a hash-join could be
added relatively easily, in which case the ordering requirement would go away.
I think a hash-join would make a lot of sense for a FacetStream (or really any
kind of aggregation stream).
Using the feature added in SOLR-7669 (Add SelectStream to Streaming API) you
will be able to apply functions (called operations in that ticket) on the
joined data. Currently the only included operation
> Add Joins to the Streaming API and Streaming Expressions
> --------------------------------------------------------
>
> Key: SOLR-7584
> URL: https://issues.apache.org/jira/browse/SOLR-7584
> Project: Solr
> Issue Type: Improvement
> Components: SolrJ
> Reporter: Dennis Gove
> Priority: Minor
> Labels: Streaming
> Attachments: SOLR-7584.patch, SOLR-7584.patch, SOLR-7584.patch,
> SOLR-7584.patch
>
>
> Add InnerJoinStream, LeftOuterJoinStream, and supporting classes to the
> Streaming API to allow for joining between sub-streams.
> At its most basic, it would look something like this:
> {code}
> innerJoin(
>   search(collection1, q=*:*, fl="fieldA, fieldB, fieldC", ...),
>   search(collection2, q=*:*, fl="fieldA, fieldD, fieldE", ...),
>   on="fieldA=fieldA"
> )
> {code}
> or, with a multi-field on clause:
> {code}
> innerJoin(
>   search(collection1, q=*:*, fl="fieldA, fieldB, fieldC", ...),
>   search(collection2, q=*:*, fl="fieldA, fieldD, fieldE", ...),
>   on="fieldA=fieldA, fieldB=fieldD"
> )
> {code}
> I'd also like to support the option of doing a hash join instead of the
> default merge join but I haven't yet figured out the best way to express
> that. I'd like to let the user tell us which sub-stream should be hashed (the
> least-cost one).
> Also, I've been thinking about field aliasing and might want to add a
> SelectStream which serves the purpose of allowing us to limit the fields
> coming out and rename fields.
> Depends on SOLR-7554