[jira] [Commented] (SOLR-6526) Solr Streaming API

Joel Bernstein (JIRA) Wed, 17 Sep 2014 09:24:54 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-6526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137481#comment-14137481
 ]


Joel Bernstein commented on SOLR-6526:
--------------------------------------

Also, it's important to keep in mind that existing Solr clients will simply 
will run out of memory if they pull millions of records. They were built for a 
specific use case that involves bringing back pages of results.

The export handler was built to export and sort millions of results. So there 
is a basic mis-match between how the existing clients operate and how /export 
handler operates. The use cases a fundamentally different. If you want to 
return results pages you just use Solr's regular search.

The Streaming API though could apply to normal Solr searchs and /export'ed 
result sets so it makes sense to bring them inline.

> Solr Streaming API
> ------------------
>
>                 Key: SOLR-6526
>                 URL: https://issues.apache.org/jira/browse/SOLR-6526
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java
>            Reporter: Joel Bernstein
>             Fix For: 5.0
>
>         Attachments: SOLR-6526.patch
>
>
> It would be great if there was a SolrJ library that could connect to Solr's 
> /export handler (SOLR-5244) and perform streaming operations on the sorted 
> result sets.
> This ticket defines the base interfaces and implementations for the Streaming 
> API. The base API contains three classes:
> *SolrStream*: This represents a stream from a single Solr instance. It speaks 
> directly to the /export handler and provides methods to read() Tuples and 
> close() the stream
> *CloudSolrStream*: This represents a stream from a SolrCloud collection. It 
> speaks with Zk to discover the Solr instances in the collection and then 
> creates SolrStreams to make the requests. The results from the underlying 
> streams are merged inline to produce a single sorted stream of tuples.
> *Tuple*: The data structure returned by the read() method of the SolrStream 
> API. It is nested to support grouping and Cartesian product set operations.
> Once these base classes are implemented it paves the way for building 
> *Decorator* streams that perform operations on the sorted Tuple sets. For 
> example a CollapseStream could be created:
> {code}
> CollapseStream collapseStream = new CollapseStream(new CloudSolrStream(zkUrl, 
> queryRequest));
> Tuple tuple = null;
> while((tuple = collapseStream.read()) != null) {
> } 
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (SOLR-6526) Solr Streaming API

Reply via email to