[
https://issues.apache.org/jira/browse/SOLR-6526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137336#comment-14137336
]
Joel Bernstein commented on SOLR-6526:
--------------------------------------
On the server side, you really need to do something very different than the
normal Solr search to support full result set export. The /export handler is
designed to sort and stream millions of results efficiently. An entirely new
sorting engine and export engine had to be written for this purpose.
In a normal search scenario, you don't need this feature, as normal search
scenarios deal with pages of results. The export handler is not designed for
normal search scenarios. It is designed for scenarios that were typically
handled in an aggregation engine (distributed joins, session analysis, etc.).
Having a separate interface for these very important use cases makes perfect
sense.
The Streaming API is designed to be an elegant API for performing set
operations (merges, joins, collapses) on large distributed result sets. This is
also an entirely different use case than the existing SolrJ libraries, which
were designed for traditional search needs.
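
To make the merge case concrete, here is a minimal sketch of a k-way merge over
streams that are each already sorted, which is roughly what merging per-shard
exports into one sorted stream involves. This is not the code in the patch:
MergeSketch, SortedStream, fromSorted, merge and drain are hypothetical names
made up for illustration, and plain Long values stand in for Tuples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch, not the SolrJ API: merges already-sorted streams
// into a single sorted stream while holding one value per input in memory.
class MergeSketch {

    /** Minimal stand-in for a stream: read() returns the next value, or null when exhausted. */
    interface SortedStream {
        Long read();
    }

    /** Wraps a pre-sorted list as a stream (stands in for one instance's sorted export). */
    static SortedStream fromSorted(List<Long> sorted) {
        Iterator<Long> it = sorted.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    /** Holds the current head value of one underlying stream. */
    private static final class Head {
        final long value;
        final SortedStream source;

        Head(long value, SortedStream source) {
            this.value = value;
            this.source = source;
        }
    }

    /** Returns a stream that yields the values of all inputs in ascending order. */
    static SortedStream merge(List<SortedStream> streams) {
        PriorityQueue<Head> heap =
                new PriorityQueue<>((a, b) -> Long.compare(a.value, b.value));
        // Prime the heap with the first value of every non-empty stream.
        for (SortedStream s : streams) {
            Long first = s.read();
            if (first != null) {
                heap.add(new Head(first, s));
            }
        }
        return () -> {
            Head smallest = heap.poll();
            if (smallest == null) {
                return null; // every underlying stream is exhausted
            }
            // Refill the heap from the stream that just produced the smallest value.
            Long next = smallest.source.read();
            if (next != null) {
                heap.add(new Head(next, smallest.source));
            }
            return smallest.value;
        };
    }

    /** Drains a stream into a list; only sensible for small demonstrations. */
    static List<Long> drain(SortedStream stream) {
        List<Long> out = new ArrayList<>();
        Long v;
        while ((v = stream.read()) != null) {
            out.add(v);
        }
        return out;
    }

    public static void main(String[] args) {
        SortedStream merged = merge(Arrays.asList(
                fromSorted(Arrays.asList(1L, 4L, 9L)),
                fromSorted(Arrays.asList(2L, 3L, 10L)),
                fromSorted(Arrays.asList(5L))));
        System.out.println(drain(merged));
    }
}
```

The point of the heap is that the merge never buffers more than one pending
value per underlying stream, so memory use depends on the number of streams
rather than on the size of the result set.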
> Solr Streaming API
> ------------------
>
> Key: SOLR-6526
> URL: https://issues.apache.org/jira/browse/SOLR-6526
> Project: Solr
> Issue Type: New Feature
> Components: clients - java
> Reporter: Joel Bernstein
> Fix For: 5.0
>
> Attachments: SOLR-6526.patch
>
>
> It would be great if there was a SolrJ library that could connect to Solr's
> /export handler (SOLR-5244) and perform streaming operations on the sorted
> result sets.
> This ticket defines the base interfaces and implementations for the Streaming
> API. The base API contains three classes:
> *SolrStream*: This represents a stream from a single Solr instance. It speaks
> directly to the /export handler and provides methods to read() Tuples and
> close() the stream.
> *CloudSolrStream*: This represents a stream from a SolrCloud collection. It
> speaks with Zk to discover the Solr instances in the collection and then
> creates SolrStreams to make the requests. The results from the underlying
> streams are merged inline to produce a single sorted stream of tuples.
> *Tuple*: The data structure returned by the read() method of the SolrStream
> API. It is nested to support grouping and Cartesian product set operations.
> Once these base classes are implemented, they pave the way for building
> *Decorator* streams that perform operations on the sorted Tuple sets. For
> example, a CollapseStream could be created:
> {code}
> CollapseStream collapseStream = new CollapseStream(
>     new CloudSolrStream(zkUrl, queryRequest));
> Tuple tuple = null;
> while ((tuple = collapseStream.read()) != null) {
>     // process each collapsed tuple here
> }
> {code}
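
The description above does not say how a decorator such as CollapseStream works
internally. One plausible reading, sketched below under stated assumptions, is
that it wraps a stream sorted on the collapse field and emits only the first
tuple of each run of equal keys. Everything here (CollapseSketch, TupleStream,
fromList, the tuple helper, the "user" field) is hypothetical, and a Map stands
in for the ticket's Tuple class.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch, not the classes from the patch: a decorator stream
// that collapses a sorted tuple stream to one tuple per distinct key.
class CollapseSketch {

    /** Minimal stand-in for a stream of tuples: read() returns null when exhausted. */
    interface TupleStream {
        Map<String, Object> read();
    }

    /** Wraps a list as a stream; the list is assumed to be sorted on the collapse field. */
    static TupleStream fromList(List<Map<String, Object>> tuples) {
        Iterator<Map<String, Object>> it = tuples.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    /** Decorator: passes through the first tuple of each run of equal keys. */
    static final class CollapseStream implements TupleStream {
        private final TupleStream inner;
        private final String field;
        private Object lastKey;
        private boolean seenAny;

        CollapseStream(TupleStream inner, String field) {
            this.inner = inner;
            this.field = field;
        }

        @Override
        public Map<String, Object> read() {
            Map<String, Object> tuple;
            while ((tuple = inner.read()) != null) {
                Object key = tuple.get(field);
                // Because the input is sorted, a key change marks a new group.
                if (!seenAny || !Objects.equals(key, lastKey)) {
                    seenAny = true;
                    lastKey = key;
                    return tuple;
                }
            }
            return null; // underlying stream is exhausted
        }
    }

    /** Builds a small two-field tuple for the demonstration. */
    static Map<String, Object> tuple(String user, int clicks) {
        Map<String, Object> t = new LinkedHashMap<>();
        t.put("user", user);
        t.put("clicks", clicks);
        return t;
    }

    public static void main(String[] args) {
        TupleStream collapsed = new CollapseStream(
                fromList(Arrays.asList(
                        tuple("alice", 3), tuple("alice", 7), tuple("bob", 1))),
                "user");
        Map<String, Object> t;
        while ((t = collapsed.read()) != null) {
            System.out.println(t);
        }
    }
}
```

Since the decorator only compares each tuple with the previous key, it relies
entirely on the wrapped stream being sorted on that field, which is what the
sorted export is meant to provide.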
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)