[ 
https://issues.apache.org/jira/browse/SOLR-5244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-5244:
---------------------------------

    Description: 
This ticket allows Solr to export full sorted result sets. The proposed syntax 
is:

{code}
q=*:*&rows=-1&wt=xsort&fl=a,b,c&sort=a desc,b desc
{code}

Under the covers, the rows=-1 parameter will signal Solr to use the 
ExportQParserPlugin as a RankQuery, which will simply collect a BitSet of the 
results. The SortingResponseWriter will sort the results based on the sort 
criteria and stream the results out.
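The sort-then-stream step can be sketched in plain Java. This is a minimal illustration, not the SortingResponseWriter itself. It assumes the matching documents have already been collected into a java.util.BitSet and that each sort field is available as a hypothetical in-memory column (doc id to long value) standing in for doc values. It parses a sort spec like "a desc,b desc" into a chained comparator and returns the doc ids in export order.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class SortedExportSketch {

    // Builds a comparator over doc ids from a sort spec such as "a desc,b desc".
    // Hypothetical: each field is a long[] column indexed by doc id, standing in
    // for doc values; this is not Solr's actual API.
    static Comparator<Integer> comparatorFor(String sortSpec, Map<String, long[]> columns) {
        Comparator<Integer> result = null;
        for (String clause : sortSpec.split(",")) {
            String[] parts = clause.trim().split("\\s+");
            long[] column = columns.get(parts[0]);
            Comparator<Integer> c = Comparator.comparingLong(doc -> column[doc]);
            if (parts.length > 1 && parts[1].equalsIgnoreCase("desc")) {
                c = c.reversed();
            }
            // Later clauses only break ties left by earlier ones.
            result = (result == null) ? c : result.thenComparing(c);
        }
        return result;
    }

    // Walks the set bits (matching doc ids), then sorts them by the sort spec.
    static List<Integer> sortedDocs(BitSet matches, String sortSpec, Map<String, long[]> columns) {
        List<Integer> docs = new ArrayList<>();
        for (int doc = matches.nextSetBit(0); doc >= 0; doc = matches.nextSetBit(doc + 1)) {
            docs.add(doc);
        }
        docs.sort(comparatorFor(sortSpec, columns));
        return docs;
    }

    public static void main(String[] args) {
        BitSet matches = new BitSet();
        matches.set(0);
        matches.set(2);
        matches.set(3);
        Map<String, long[]> columns = Map.of(
                "a", new long[] {1, 9, 5, 5},
                "b", new long[] {7, 0, 2, 8});
        // Docs 2 and 3 tie on a=5; b desc puts doc 3 first.
        System.out.println(sortedDocs(matches, "a desc,b desc", columns)); // prints [3, 2, 0]
    }
}
```

Sorting the collected doc ids after collection, rather than ranking during collection, is what lets the full result set be exported without a rows limit.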

This capability will open up Solr to a whole range of uses that are typically 
handled by aggregation engines like Hadoop. For example


  was:
It would be great if Solr could efficiently export entire search result sets 
without scoring or ranking documents. This would allow external systems to 
perform rapid bulk imports from Solr. It also provides a possible platform for 
exporting results to support distributed join scenarios within Solr.

This ticket provides a patch that has two pluggable components:

1) ExportQParserPlugin: a post filter that gathers the document results in a 
BitSet and does not delegate to the ranking collectors. Instead, it puts the 
BitSet on the request context.

2) BinaryExportWriter: an output writer that iterates over the BitSet and 
writes the entire result as a binary stream. A header at the beginning of the 
stream lets external clients configure themselves.
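The hand-off between the two components can be sketched without any Solr or Lucene classes. In this minimal sketch, ExportCollectorSketch, writeExport, the "export" context key, and the header layout (field count followed by field names) are all hypothetical; the request context is simulated with a plain Map, and only single-valued int fields are written.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

public class BitSetExportSketch {

    // Hypothetical stand-in for the post filter: records every matching doc id
    // in a BitSet and never passes hits on to a ranking collector.
    static class ExportCollectorSketch {
        private final BitSet bits;

        ExportCollectorSketch(int maxDoc) { bits = new BitSet(maxDoc); }

        void collect(int doc) { bits.set(doc); }

        // Puts the BitSet on the (simulated) request context under a hypothetical key.
        void finish(Map<Object, Object> requestContext) { requestContext.put("export", bits); }
    }

    // Hypothetical stand-in for the binary writer: a header (field count, field
    // names) followed by one 4-byte int per field for each set bit, in doc id order.
    static byte[] writeExport(Map<Object, Object> requestContext, String[] fields, int[][] columns) {
        BitSet bits = (BitSet) requestContext.get("export");
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(fields.length);             // header: number of fields
            for (String f : fields) out.writeUTF(f); // header: field names
            for (int doc = bits.nextSetBit(0); doc >= 0; doc = bits.nextSetBit(doc + 1)) {
                for (int[] column : columns) out.writeInt(column[doc]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        Map<Object, Object> context = new HashMap<>();
        ExportCollectorSketch collector = new ExportCollectorSketch(4);
        collector.collect(1);
        collector.collect(3);
        collector.finish(context);
        byte[] payload = writeExport(context, new String[] {"join_i"}, new int[][] {{10, 20, 30, 40}});
        // 4 (field count) + 8 (writeUTF of "join_i") + 2 docs * 4 bytes = 20
        System.out.println(payload.length);
    }
}
```

Because the header names the fields up front, a client can decide how to decode the rows before reading any of them.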

Note:
These two components are sufficient for a non-distributed environment. 
For distributed export, a new request handler will need to be developed.

After applying the patch and building the dist or example, you can register the 
components through the following changes to solrconfig.xml:

Register the export contrib libraries:

<lib dir="../../../dist/" regex="solr-export-\d.*\.jar" />
 
Register the "export" queryParser with the following line:
 
<queryParser name="export" class="org.apache.solr.export.ExportQParserPlugin"/>
 
Register the "xbin" writer:
 
<queryResponseWriter name="xbin" 
class="org.apache.solr.export.BinaryExportWriter"/>
 
The following query will perform the export:
{code}
http://localhost:8983/solr/collection1/select?q=*:*&fq={!export}&wt=xbin&fl=join_i
{code}
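When the export query is issued from a client rather than a browser, the local-params value {!export} must be percent-encoded, because '{' and '}' are not legal unescaped in a URI (java.net.URI rejects them). This is a minimal plain-JDK sketch that builds the request URL shown above. The class and method names are hypothetical, and the core URL and field list are simply the values from the example.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class ExportUrlSketch {

    // Percent-encodes one query parameter value (form encoding, UTF-8).
    static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Builds the export request for a given core URL and field list.
    static URI exportUri(String coreUrl, String fl) {
        String query = "q=" + enc("*:*")
                + "&fq=" + enc("{!export}")
                + "&wt=xbin"
                + "&fl=" + enc(fl);
        return URI.create(coreUrl + "/select?" + query);
    }

    public static void main(String[] args) {
        // prints http://localhost:8983/solr/collection1/select?q=*%3A*&fq=%7B%21export%7D&wt=xbin&fl=join_i
        System.out.println(exportUri("http://localhost:8983/solr/collection1", "join_i"));
    }
}
```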

The initial patch supports export of four data types:

1) Single-valued trie int, long and float fields
2) Binary doc values

Numeric fields are currently exported from the FieldCache; binary doc values 
can be held in memory or on disk.
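One way a binary stream could carry these four types is fixed-width encoding for the numerics and a length prefix for binary values, with per-field type codes in the header so a client can configure its decoder. The sketch below shows that round trip; the FieldType enum, the one-byte type codes, and the layout are hypothetical and are not taken from the patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class ExportTypesSketch {

    // The four exportable kinds; the ordinal doubles as a hypothetical type code.
    enum FieldType { INT, LONG, FLOAT, BINARY }

    // Fixed width for numerics, 4-byte length prefix for binary values.
    static void writeValue(DataOutputStream out, FieldType type, Object value) throws IOException {
        switch (type) {
            case INT:   out.writeInt((Integer) value); break;
            case LONG:  out.writeLong((Long) value); break;
            case FLOAT: out.writeFloat((Float) value); break;
            case BINARY: {
                byte[] b = (byte[]) value;
                out.writeInt(b.length);
                out.write(b);
                break;
            }
        }
    }

    static Object readValue(DataInputStream in, FieldType type) throws IOException {
        switch (type) {
            case INT:   return in.readInt();
            case LONG:  return in.readLong();
            case FLOAT: return in.readFloat();
            default: {
                byte[] b = new byte[in.readInt()];
                in.readFully(b);
                return b;
            }
        }
    }

    // Writes a header (field count + type codes) and one row, then decodes it the
    // way a self-configuring client would: read the types first, then the values.
    static Object[] roundTrip(FieldType[] schema, Object[] row) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeByte(schema.length);
            for (FieldType t : schema) out.writeByte(t.ordinal());
            for (int i = 0; i < schema.length; i++) writeValue(out, schema[i], row[i]);
            out.flush();

            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            FieldType[] header = new FieldType[in.readByte()];
            for (int i = 0; i < header.length; i++) header[i] = FieldType.values()[in.readByte()];
            Object[] decoded = new Object[header.length];
            for (int i = 0; i < header.length; i++) decoded[i] = readValue(in, header[i]);
            return decoded;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        FieldType[] schema = {FieldType.INT, FieldType.LONG, FieldType.FLOAT, FieldType.BINARY};
        Object[] d = roundTrip(schema, new Object[] {7, 42L, 1.5f, new byte[] {1, 2, 3}});
        // prints 7 42 1.5 [1, 2, 3]
        System.out.println(d[0] + " " + d[1] + " " + d[2] + " " + Arrays.toString((byte[]) d[3]));
    }
}
```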

Since this is designed to export very large result sets efficiently, stored 
fields are not used for the export.





> Exporting Full Sorted Result Sets
> ---------------------------------
>
>                 Key: SOLR-5244
>                 URL: https://issues.apache.org/jira/browse/SOLR-5244
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 5.0
>            Reporter: Joel Bernstein
>            Assignee: Joel Bernstein
>            Priority: Minor
>             Fix For: 5.0, 4.10
>
>         Attachments: 0001-SOLR_5244.patch, SOLR-5244.patch, SOLR-5244.patch
>



--
This message was sent by Atlassian JIRA
(v6.2#6252)
