[jira] [Commented] (SOLR-6526) Solr Streaming API

Joel Bernstein (JIRA) Wed, 17 Sep 2014 07:41:55 -0700

    [ https://issues.apache.org/jira/browse/SOLR-6526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137336#comment-14137336 ]


Joel Bernstein commented on SOLR-6526:
--------------------------------------

On the server side, you really need to do something very different than the 
normal Solr search to support full result set export. The /export handler is 
designed to sort and stream millions of results efficiently. An entirely new 
sorting engine and export engine had to be written for this purpose. 

In a normal search scenario, you don't need this feature, as normal search 
scenarios deal with pages of results. The export handler is not designed for 
normal search scenarios. It is designed for scenarios that were typically 
handled in an aggregation engine (distributed joins, session analysis, etc.).

Having a separate interface for these very important use cases makes perfect 
sense.

The Streaming API is designed to be an elegant API for performing set 
operations (merges, joins, collapses) on large distributed result sets. This is 
also an entirely different use case than the existing SolrJ libraries, which 
were designed for traditional search needs.
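
To make the merge case concrete, here is a minimal sketch of combining several 
already-sorted result sets into one sorted sequence while holding only one 
element per input in memory. It is not the code from the patch: the class name 
SortedMergeSketch, the Head helper and the use of plain Iterators in place of 
per-shard streams are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch only; none of these names come from the SOLR-6526 patch.
class SortedMergeSketch {

    // Pairs the current head element of one input with the rest of that input.
    private static final class Head<T> {
        final T value;
        final Iterator<T> rest;

        Head(T value, Iterator<T> rest) {
            this.value = value;
            this.rest = rest;
        }
    }

    // Merges inputs that are each already sorted by 'order' into one sorted list.
    // The heap never holds more than one element per input.
    static <T> List<T> merge(List<Iterator<T>> sortedInputs, Comparator<? super T> order) {
        PriorityQueue<Head<T>> heap =
            new PriorityQueue<>((a, b) -> order.compare(a.value, b.value));
        for (Iterator<T> input : sortedInputs) {
            if (input.hasNext()) {
                heap.add(new Head<>(input.next(), input));
            }
        }
        List<T> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head<T> head = heap.poll();
            out.add(head.value);
            // Refill the heap from the input that just produced the smallest element.
            if (head.rest.hasNext()) {
                heap.add(new Head<>(head.rest.next(), head.rest));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Three stand-ins for sorted per-shard result sets.
        List<Iterator<Integer>> shards = new ArrayList<>();
        shards.add(Arrays.asList(1, 4, 9).iterator());
        shards.add(Arrays.asList(2, 3, 10).iterator());
        shards.add(Arrays.asList(5).iterator());
        System.out.println(merge(shards, Comparator.<Integer>naturalOrder()));
        // prints [1, 2, 3, 4, 5, 9, 10]
    }
}
```

The sketch collects the output into a list to keep it short. A streaming 
version would hand back one element per call instead, which is what makes the 
approach workable for result sets too large to hold in memory.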


> Solr Streaming API
> ------------------
>
>                 Key: SOLR-6526
>                 URL: https://issues.apache.org/jira/browse/SOLR-6526
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java
>            Reporter: Joel Bernstein
>             Fix For: 5.0
>
>         Attachments: SOLR-6526.patch
>
>
> It would be great if there was a SolrJ library that could connect to Solr's 
> /export handler (SOLR-5244) and perform streaming operations on the sorted 
> result sets.
> This ticket defines the base interfaces and implementations for the Streaming 
> API. The base API contains three classes:
> *SolrStream*: This represents a stream from a single Solr instance. It speaks 
> directly to the /export handler and provides methods to read() Tuples and 
> close() the stream.
> *CloudSolrStream*: This represents a stream from a SolrCloud collection. It 
> speaks with Zk to discover the Solr instances in the collection and then 
> creates SolrStreams to make the requests. The results from the underlying 
> streams are merged inline to produce a single sorted stream of tuples.
> *Tuple*: The data structure returned by the read() method of the SolrStream 
> API. It is nested to support grouping and Cartesian product set operations.
> Once these base classes are implemented, they pave the way for building 
> *Decorator* streams that perform operations on the sorted Tuple sets. For 
> example, a CollapseStream could be created:
> {code}
> CollapseStream collapseStream = new CollapseStream(new CloudSolrStream(zkUrl, 
> queryRequest));
> Tuple tuple = null;
> while((tuple = collapseStream.read()) != null) {
> } 
> {code}
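
The decorator idea in the description above can be sketched without any Solr 
dependency. Everything here is hypothetical: the TupleStream interface, the 
ListStream stand-in for a sorted source and the CollapseSketch class are 
invented for illustration, tuples are modelled as plain Maps, and "collapse" 
is assumed to mean keeping the first tuple of each run that shares a field 
value. The actual CollapseStream may behave differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch only; these types are not the classes from the patch.
class DecoratorSketch {

    // Assumed stream contract: read() returns the next tuple, or null at the end.
    interface TupleStream {
        Map<String, Object> read();
        void close();
    }

    // Stand-in for a source stream whose tuples are already sorted.
    static final class ListStream implements TupleStream {
        private final Iterator<Map<String, Object>> it;

        ListStream(List<Map<String, Object>> tuples) {
            this.it = tuples.iterator();
        }

        public Map<String, Object> read() {
            return it.hasNext() ? it.next() : null;
        }

        public void close() { }
    }

    // Decorator: wraps another stream and emits only the first tuple of each
    // run of equal values in 'field'. Requires the inner stream to be sorted
    // on that field so that equal values are adjacent.
    static final class CollapseSketch implements TupleStream {
        private final TupleStream inner;
        private final String field;
        private Object lastKey;
        private boolean started;

        CollapseSketch(TupleStream inner, String field) {
            this.inner = inner;
            this.field = field;
        }

        public Map<String, Object> read() {
            Map<String, Object> tuple;
            while ((tuple = inner.read()) != null) {
                Object key = tuple.get(field);
                if (!started || !Objects.equals(key, lastKey)) {
                    started = true;
                    lastKey = key;
                    return tuple;
                }
                // Same key as the previous emitted tuple: skip it.
            }
            return null;
        }

        public void close() {
            inner.close();
        }
    }

    static Map<String, Object> tuple(String user, int ts) {
        Map<String, Object> t = new HashMap<>();
        t.put("user", user);
        t.put("ts", ts);
        return t;
    }

    public static void main(String[] args) {
        TupleStream source = new ListStream(Arrays.asList(
            tuple("a", 1), tuple("a", 2), tuple("b", 3), tuple("c", 4), tuple("c", 5)));
        TupleStream collapsed = new CollapseSketch(source, "user");
        Map<String, Object> t;
        while ((t = collapsed.read()) != null) {
            System.out.println(t.get("user") + " " + t.get("ts"));
        }
        collapsed.close();
    }
}
```

Because the decorator implements the same interface it wraps, decorators can 
be stacked, and each one only ever looks at the current tuple plus a small 
amount of state.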



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

