[jira] [Updated] (SOLR-6526) Solr Streaming API

Joel Bernstein (JIRA) Thu, 18 Sep 2014 05:48:07 -0700

     [ 
https://issues.apache.org/jira/browse/SOLR-6526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Joel Bernstein updated SOLR-6526:
---------------------------------
    Description: 
It would be great if there were a SolrJ library that could connect to Solr's 
/export handler (SOLR-5244) and perform streaming operations on the sorted 
result sets.

This ticket defines the base interfaces and implementations for the Streaming 
API. The base API contains three classes:

*SolrStream*: This represents a stream from a single Solr instance. It speaks 
directly to the /export handler and provides a read() method to fetch Tuples 
and a close() method to close the stream.

*CloudSolrStream*: This represents a stream from a SolrCloud collection. It 
talks to ZooKeeper (Zk) to discover the Solr instances in the collection and 
then creates one SolrStream per instance to make the requests. The results from 
the underlying streams are merged inline to produce a single sorted stream of 
tuples.

*Tuple*: The data structure returned by the read() method of the SolrStream 
API. It is nested to support grouping and Cartesian product set operations.
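
The inline merge described for CloudSolrStream can be pictured as a k-way merge over inputs that are already sorted. What follows is a minimal sketch under stated assumptions, not the code in the attached patch: TupleSource, ListSource and MergedSource are hypothetical names, tuples are simplified to Map<String, Object>, and read() is assumed to return null once a stream is exhausted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

class MergeSketch {

    /** Hypothetical stand-in for the read()/close() contract described above. */
    interface TupleSource {
        Map<String, Object> read(); // assumption: null means the source is exhausted
        void close();
    }

    /** In-memory source standing in for a sorted stream from one Solr instance. */
    static class ListSource implements TupleSource {
        private final Iterator<Map<String, Object>> it;
        ListSource(List<Map<String, Object>> tuples) { this.it = tuples.iterator(); }
        public Map<String, Object> read() { return it.hasNext() ? it.next() : null; }
        public void close() { }
    }

    /** Merges several sorted sources into one sorted source with a priority queue. */
    static class MergedSource implements TupleSource {
        /** Pairs the current head tuple of a source with the source it came from. */
        private static class Head {
            final Map<String, Object> tuple;
            final TupleSource source;
            Head(Map<String, Object> tuple, TupleSource source) {
                this.tuple = tuple;
                this.source = source;
            }
        }

        private final List<TupleSource> sources;
        private final PriorityQueue<Head> heads;

        MergedSource(List<TupleSource> sources, Comparator<Map<String, Object>> order) {
            this.sources = sources;
            this.heads = new PriorityQueue<>((a, b) -> order.compare(a.tuple, b.tuple));
            // Prime the queue with the first tuple of every non-empty source.
            for (TupleSource s : sources) {
                Map<String, Object> first = s.read();
                if (first != null) heads.add(new Head(first, s));
            }
        }

        public Map<String, Object> read() {
            Head smallest = heads.poll();
            if (smallest == null) return null; // every source is exhausted
            // Refill the queue from the source that just produced the smallest tuple.
            Map<String, Object> next = smallest.source.read();
            if (next != null) heads.add(new Head(next, smallest.source));
            return smallest.tuple;
        }

        public void close() { for (TupleSource s : sources) s.close(); }
    }

    static List<Map<String, Object>> tuplesWithIds(long... ids) {
        List<Map<String, Object>> tuples = new ArrayList<>();
        for (long id : ids) tuples.add(Collections.<String, Object>singletonMap("id", id));
        return tuples;
    }

    /** Merges two sorted in-memory "shards" and returns the ids in merged order. */
    static List<Long> demo() {
        TupleSource shard1 = new ListSource(tuplesWithIds(1L, 3L, 8L));
        TupleSource shard2 = new ListSource(tuplesWithIds(2L, 5L));
        Comparator<Map<String, Object>> byId =
            (a, b) -> Long.compare((Long) a.get("id"), (Long) b.get("id"));
        MergedSource merged = new MergedSource(Arrays.asList(shard1, shard2), byId);
        List<Long> ids = new ArrayList<>();
        Map<String, Object> t;
        while ((t = merged.read()) != null) ids.add((Long) t.get("id"));
        merged.close();
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Only one tuple per underlying source is held in memory at a time, which is why this kind of merge works only when every input arrives in the same sort order.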

Once these base classes are implemented, they pave the way for building 
*Decorator* streams that perform operations on the sorted Tuple sets. For 
example:

{code}
// Create three CloudSolrStreams to different SolrCloud clusters. They could be
// anywhere in the world.

SolrStream stream1 = new CloudSolrStream(zkUrl1, queryRequest1, "a"); // Alias this stream as "a"
SolrStream stream2 = new CloudSolrStream(zkUrl2, queryRequest2, "b"); // Alias this stream as "b"
SolrStream stream3 = new CloudSolrStream(zkUrl3, queryRequest3, "c"); // Alias this stream as "c"

// Merge Join stream1 and stream2 using a comparator to compare tuples.

MergeJoinStream joinStream1 = new MergeJoinStream(stream1, stream2, new 
MyComp());

// Hash join the tuples from joinStream1 with stream3. The HashKey instances
// define the hash keys for the tuples.
HashJoinStream joinStream2 = new HashJoinStream(joinStream1,stream3, new 
HashKey(), new HashKey());

// Sum a.field1 and b.field2 across the joined tuples.
SumStream sumStream1 = new SumStream(joinStream2, "a.field1");
SumStream sumStream2 = new SumStream(sumStream1, "b.field2");
Tuple t = null;

//Read from the stream until it's finished.
while ((t = sumStream2.read()) != null);

//Get the sums from the joined data.

long sum1 = sumStream1.getSum();
long sum2 = sumStream2.getSum();

{code}
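
The decorator idea in the example can be sketched in a few lines: a decorator wraps another stream, passes each tuple through unchanged, and does its work as a side effect of read(). The block below is a rough illustration under assumptions rather than the SumStream from the patch: TupleSource, ListSource and SumDecorator are hypothetical names, tuples are simplified to Map<String, Object>, field values are assumed to be numeric, and read() is assumed to return null at end of stream.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

class DecoratorSketch {

    /** Hypothetical stand-in for the read()/close() contract of the base streams. */
    interface TupleSource {
        Map<String, Object> read(); // assumption: null means the source is exhausted
        void close();
    }

    /** In-memory source standing in for the joined stream in the example. */
    static class ListSource implements TupleSource {
        private final Iterator<Map<String, Object>> it;
        ListSource(List<Map<String, Object>> tuples) { this.it = tuples.iterator(); }
        public Map<String, Object> read() { return it.hasNext() ? it.next() : null; }
        public void close() { }
    }

    /** Decorator that sums one numeric field while passing tuples through untouched. */
    static class SumDecorator implements TupleSource {
        private final TupleSource inner;
        private final String field;
        private long sum;

        SumDecorator(TupleSource inner, String field) {
            this.inner = inner;
            this.field = field;
        }

        public Map<String, Object> read() {
            Map<String, Object> t = inner.read();
            if (t != null) {
                Object value = t.get(field);
                // Tuples without the field (or with a non-numeric value) are skipped.
                if (value instanceof Number) sum += ((Number) value).longValue();
            }
            return t;
        }

        public void close() { inner.close(); }

        long getSum() { return sum; }
    }

    static Map<String, Object> tuple(long field1, long field2) {
        Map<String, Object> t = new HashMap<>();
        t.put("a.field1", field1);
        t.put("b.field2", field2);
        return t;
    }

    /** Chains two decorators and returns {sum of a.field1, sum of b.field2}. */
    static long[] demo() {
        TupleSource joined = new ListSource(
            Arrays.asList(tuple(10, 1), tuple(20, 2), tuple(30, 3)));
        SumDecorator sum1 = new SumDecorator(joined, "a.field1");
        SumDecorator sum2 = new SumDecorator(sum1, "b.field2");
        // Draining the outermost decorator drives every stream beneath it.
        while (sum2.read() != null) { }
        sum2.close();
        return new long[] { sum1.getSum(), sum2.getSum() };
    }

    public static void main(String[] args) {
        long[] sums = demo();
        System.out.println(sums[0] + " " + sums[1]);
    }
}
```

Because each decorator exposes the same read()/close() contract as the stream it wraps, decorators can be stacked in any order, and reading from the outermost one is enough to pull tuples through the whole chain.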


  was:
It would be great if there was a SolrJ library that could connect to Solr's 
/export handler (SOLR-5244) and perform streaming operations on the sorted 
result sets.

This ticket defines the base interfaces and implementations for the Streaming 
API. The base API contains three classes:

*SolrStream*: This represents a stream from a single Solr instance. It speaks 
directly to the /export handler and provides methods to read() Tuples and 
close() the stream

*CloudSolrStream*: This represents a stream from a SolrCloud collection. It 
speaks with Zk to discover the Solr instances in the collection and then 
creates SolrStreams to make the requests. The results from the underlying 
streams are merged inline to produce a single sorted stream of tuples.

*Tuple*: The data structure returned by the read() method of the SolrStream 
API. It is nested to support grouping and Cartesian product set operations.

Once these base classes are implemented it paves the way for building 
*Decorator* streams that perform operations on the sorted Tuple sets. For 
example:

{code}
//Create three CloudSolrStreams to different solr cloud clusters. They could be 
anywhere in the world.

SolrStream stream1 = new CloudSolrStream(zkUrl1, queryRequest1, "a"); // Alias 
this stream as "a"
SolrStream stream2 = new CloudSolrStream(zkUrl2, queryRequest2, "b"); // Alias 
this stream as "b"
SolrStream stream3 = new CloudSolrStream(zkUrl3, queryRequest3, "c"); // Alias 
this stream as "c"

// Merge Join stream1 and stream2 using a comparator to compare tuples.

MergeJoinStream joinStream1 = new MergeJoinStream(stream1, stream2, new 
MyComp());

//Hash join the tuples from the joinStream1 with stream3 the HashKey()'s define 
the hashKeys for tuples 
HashJoinStream joinStream2 = new HashJoinStream(joinStream1,stream3, new 
HashKey(), new HashKey());

//Sum field1 from 
SumStream sumStream1 = new SumStream(joinStream2, "a.field1");
AveStream sumStream2 = new SumStream(sumStream1, "b.field2");
Tuple t = null;

//Read from the stream until it's finished.
while((t != sumStream2().read()) != null);

//Get the sums from the joined data.

long sum1 = sumStream1.getSum();
long sum2 = sumStream2.getSum();

{code}



> Solr Streaming API
> ------------------
>
>                 Key: SOLR-6526
>                 URL: https://issues.apache.org/jira/browse/SOLR-6526
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java
>            Reporter: Joel Bernstein
>             Fix For: 6.0
>
>         Attachments: SOLR-6526.patch
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
