[ 
https://issues.apache.org/jira/browse/SOLR-7584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790545#comment-14790545
 ] 

Dennis Gove commented on SOLR-7584:
-----------------------------------

This supports joining any incoming set of streams. If you have a FacetStream 
instance (SOLR-7903) then you could absolutely join it with some other stream 
instance. 

Because the current implementation uses a merge join, the incoming streams 
must be sorted the same way on the join fields. That said, a hash join could 
be added relatively easily, in which case the ordering requirement would go 
away. I think a hash join would make a lot of sense for a FacetStream (or 
really any kind of aggregation stream).
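
To illustrate the difference between the two join styles, here is a minimal 
sketch over in-memory lists of tuples. It is not the code in the attached 
patch and does not use the Streaming API classes; JoinSketch, mergeJoin, 
hashJoin, and tuple are hypothetical names made up for this example, and 
tuples are modeled as plain maps with a single, non-null join key.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: hypothetical names, not part of SolrJ.
class JoinSketch {

    // Merge-style inner join. Assumes BOTH inputs are already sorted
    // ascending on `key`; unsorted input silently drops matches.
    static List<Map<String, Object>> mergeJoin(List<Map<String, Object>> left,
                                               List<Map<String, Object>> right,
                                               String key) {
        List<Map<String, Object>> out = new ArrayList<>();
        int l = 0, r = 0;
        while (l < left.size() && r < right.size()) {
            Object k = left.get(l).get(key);
            int cmp = compare(k, right.get(r).get(key));
            if (cmp < 0) {
                l++;            // left is behind, advance it
            } else if (cmp > 0) {
                r++;            // right is behind, advance it
            } else {
                // Find the run of right tuples that share this key value.
                int rEnd = r;
                while (rEnd < right.size()
                        && compare(right.get(rEnd).get(key), k) == 0) {
                    rEnd++;
                }
                // Pair every left tuple with this key against that run.
                while (l < left.size() && compare(left.get(l).get(key), k) == 0) {
                    for (int i = r; i < rEnd; i++) {
                        out.add(combine(left.get(l), right.get(i)));
                    }
                    l++;
                }
                r = rEnd;
            }
        }
        return out;
    }

    // Hash-style inner join. Reads `hashed` fully into memory first, then
    // streams `left` past it, so neither input needs to be sorted.
    static List<Map<String, Object>> hashJoin(List<Map<String, Object>> left,
                                              List<Map<String, Object>> hashed,
                                              String key) {
        Map<Object, List<Map<String, Object>>> index = new HashMap<>();
        for (Map<String, Object> t : hashed) {
            index.computeIfAbsent(t.get(key), x -> new ArrayList<>()).add(t);
        }
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> t : left) {
            List<Map<String, Object>> matches = index.getOrDefault(
                    t.get(key), Collections.<Map<String, Object>>emptyList());
            for (Map<String, Object> m : matches) {
                out.add(combine(t, m));
            }
        }
        return out;
    }

    // Convenience builder: tuple("id", 1, "a", "x") -> {id=1, a=x}.
    static Map<String, Object> tuple(Object... kv) {
        Map<String, Object> m = new HashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            m.put((String) kv[i], kv[i + 1]);
        }
        return m;
    }

    @SuppressWarnings("unchecked")
    private static int compare(Object a, Object b) {
        return ((Comparable<Object>) a).compareTo(b);
    }

    // Fields from `b` overwrite same-named fields from `a`.
    private static Map<String, Object> combine(Map<String, Object> a,
                                               Map<String, Object> b) {
        Map<String, Object> m = new HashMap<>(a);
        m.putAll(b);
        return m;
    }
}
```

The merge version holds only the current run of equal keys at a time but 
depends on sort order; the hash version trades memory (the whole hashed side) 
for freedom from ordering, which is why it suits a small aggregated side.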

Using the feature added in SOLR-7669 (Add SelectStream to Streaming API) you 
will be able to apply functions (called operations in that ticket) on the 
joined data. Currently the only included operation

> Add Joins to the Streaming API and Streaming Expressions
> --------------------------------------------------------
>
>                 Key: SOLR-7584
>                 URL: https://issues.apache.org/jira/browse/SOLR-7584
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrJ
>            Reporter: Dennis Gove
>            Priority: Minor
>              Labels: Streaming
>         Attachments: SOLR-7584.patch, SOLR-7584.patch, SOLR-7584.patch, 
> SOLR-7584.patch
>
>
> Add InnerJoinStream, LeftOuterJoinStream, and supporting classes to the 
> Streaming API to allow for joining between sub-streams.
> At its most basic, it would look something like this:
> {code}
> innerJoin(
>   search(collection1, q=*:*, fl="fieldA, fieldB, fieldC", ...),
>   search(collection2, q=*:*, fl="fieldA, fieldD, fieldE", ...),
>   on="fieldA=fieldA"
> )
> {code}
> or, with a multi-field on clause:
> {code}
> innerJoin(
>   search(collection1, q=*:*, fl="fieldA, fieldB, fieldC", ...),
>   search(collection2, q=*:*, fl="fieldA, fieldD, fieldE", ...),
>   on="fieldA=fieldA, fieldB=fieldD"
> )
> {code}
> I'd also like to support the option of doing a hash join instead of the 
> default merge join but I haven't yet figured out the best way to express 
> that. I'd like to let the user tell us which sub-stream should be hashed (the 
> least-cost one).
> Also, I've been thinking about field aliasing and might want to add a 
> SelectStream which serves the purpose of allowing us to limit the fields 
> coming out and rename fields.
> Depends on SOLR-7554



