[
https://issues.apache.org/jira/browse/SOLR-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joel Bernstein updated SOLR-4465:
---------------------------------
Attachment: SOLR-4465.patch
> Configurable Collectors
> -----------------------
>
> Key: SOLR-4465
> URL: https://issues.apache.org/jira/browse/SOLR-4465
> Project: Solr
> Issue Type: New Feature
> Components: search
> Affects Versions: 4.1
> Reporter: Joel Bernstein
> Fix For: 4.3
>
> Attachments: SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch,
> SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch,
> SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch,
> SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch
>
>
> This ticket provides a patch to add pluggable collectors to Solr. This patch
> was generated and tested with Solr 4.1.
> This is how the patch functions:
> Collectors are plugged into Solr in solrconfig.xml using the new
> collectorFactory element. For example:
> <collectorFactory name="default" class="solr.CollectorFactory"/>
> <collectorFactory name="sum" class="solr.SumCollectorFactory"/>
> The elements above define two collector factories. The first one is the
> "default" collectorFactory. The class attribute points to
> org.apache.solr.handler.component.CollectorFactory, which implements logic
> that returns the default TopScoreDocCollector and TopFieldCollector.
> To create your own collectorFactory you must subclass the default
> CollectorFactory and at a minimum override the getCollector method to return
> your new collector.
> You can tell Solr which collectorFactory to use at query time using HTTP
> parameters. All collector parameters start with the prefix "cl".
> The parameter "cl" turns on pluggable collectors:
> cl=true
> If cl is not in the parameters, Solr will automatically use the default
> collectorFactory.
> *Pluggable doclist Sorting with Topdocs Collectors*
> You can specify two types of pluggable collectors. The first type is the
> topdocs collector. For example:
> cl.topdocs=<name>
> The above param points to the named collectorFactory in the solrconfig.xml to
> construct the collector. Topdocs collectorFactories must return collectors
> that extend the TopDocsCollector base class. Topdocs collectors are
> responsible for collecting the doclist.
> You can pass parameters to the topdocs collector by adding HTTP parameters
> with the "cl." prefix. By convention, parameters for the topdocs collector
> look like this:
> cl.topdocs.max=100
> This parameter will be added to the collector spec because of the "cl."
> prefix and passed to the collectorFactory.
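To illustrate the prefix convention described above, here is a minimal Java sketch of gathering the "cl." parameters into a spec. It is not the code from the patch: the class name CollectorSpecSketch, the method collectorSpec, and the use of a plain Map for the request parameters are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in, not a class from the patch.
final class CollectorSpecSketch {

    // Copies every request parameter whose name starts with "cl." into a
    // separate map. The bare "cl" switch has no dot, so it is not copied.
    static Map<String, String> collectorSpec(Map<String, String> requestParams) {
        Map<String, String> spec = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : requestParams.entrySet()) {
            if (e.getKey().startsWith("cl.")) {
                spec.put(e.getKey(), e.getValue());
            }
        }
        return spec;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");
        params.put("cl", "true");
        params.put("cl.topdocs", "default");
        params.put("cl.topdocs.max", "100");
        // Only the two "cl."-prefixed entries end up in the spec.
        System.out.println(collectorSpec(params));
    }
}
```

In this sketch the spec keeps the full parameter names, so a factory would look up "cl.topdocs.max" itself; whether the patch strips the prefix is not stated in the description.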
> *Pluggable Custom Analytics With Delegating Collectors*
> You can also specify any number of delegating collectors with the
> "cl.delegating" parameter. Delegating collectors are designed to collect
> something other than the doclist. Typically this would be some type of
> custom analytic.
> cl.delegating=sum,ave
> The parameter above specifies two delegating collectors named sum and ave.
> Like the topdocs collectors, these point to named collectorFactories in the
> solrconfig.xml.
> Delegating collector factories must return Collector instances that extend
> DelegatingCollector.
> A sample delegating collector is provided in the patch through the
> org.apache.solr.handler.component.SumCollectorFactory.
> This collectorFactory provides a very simple DelegatingCollector that groups
> by a field and sums a column of floats. The sum collector is not designed to
> be a fully functional sum function but to be a proof of concept for pluggable
> analytics through delegating collectors.
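The group-and-sum idea described above can be sketched in plain Java, independent of Lucene. This is only an illustration of the aggregation, not the SumCollectorFactory code: the Row class and the sumByGroup method are hypothetical stand-ins for the per-document field values a real collector would read.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in, not a class from the patch.
final class GroupSumSketch {

    // One matching document reduced to the two values of interest:
    // the group-by field value and the float column to be summed.
    static final class Row {
        final String group;
        final float value;

        Row(String group, float value) {
            this.group = group;
            this.value = value;
        }
    }

    // Accumulates a running sum per group, in first-seen order.
    static Map<String, Float> sumByGroup(List<Row> rows) {
        Map<String, Float> sums = new LinkedHashMap<>();
        for (Row r : rows) {
            sums.merge(r.group, r.value, Float::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("apple", 1.5f),
                new Row("belkin", 2.0f),
                new Row("apple", 2.5f));
        System.out.println(sumByGroup(rows));
    }
}
```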
> To communicate with delegating collectors you need to reference the name and
> ordinal of the collector.
> The ordinal refers to the collector's ordinal in the comma-separated list.
> For example:
> cl.delegating=sum,ave&cl.sum.0.groupby=field1
> The "cl.sum.0.groupby" parameter tells the "sum" collector at the 0 ordinal to
> group by "field1".
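The name-and-ordinal addressing above can be sketched as follows. This is an assumption-laden illustration rather than the patch's parsing code: the class DelegatingParamsSketch and the method paramsByCollector are hypothetical, and the sketch assumes the ordinal is the zero-based position in the cl.delegating list (so "ave" in "sum,ave" would be addressed as cl.ave.1).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in, not a class from the patch.
final class DelegatingParamsSketch {

    // For each collector named in cl.delegating, gathers the parameters
    // addressed to it as cl.<name>.<ordinal>.<param>. The result is keyed
    // by "cl.<name>.<ordinal>" and maps <param> to its value.
    static Map<String, Map<String, String>> paramsByCollector(Map<String, String> requestParams) {
        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        String delegating = requestParams.get("cl.delegating");
        if (delegating == null || delegating.isEmpty()) {
            return result;
        }
        String[] names = delegating.split(",");
        for (int ordinal = 0; ordinal < names.length; ordinal++) {
            String key = "cl." + names[ordinal].trim() + "." + ordinal;
            String prefix = key + ".";
            Map<String, String> own = new LinkedHashMap<>();
            for (Map.Entry<String, String> e : requestParams.entrySet()) {
                if (e.getKey().startsWith(prefix)) {
                    own.put(e.getKey().substring(prefix.length()), e.getValue());
                }
            }
            result.put(key, own);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("cl.delegating", "sum,ave");
        params.put("cl.sum.0.groupby", "field1");
        System.out.println(paramsByCollector(params));
    }
}
```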
> Delegating collectors are passed a reference to the ResponseBuilder and can
> place maps with analytic output directly into the SolrQueryResponse with the
> add() method.
> Maps that are placed in the SolrQueryResponse are automatically added to the
> outgoing response.
> *Distributed Search*
> The CollectorFactory also has a method called merge(). This method aggregates
> the results from each of the shards during distributed search. The "default"
> CollectorFactory implements the default merge logic for merging documents
> from each shard. If you define a different topdocs collector you may need to
> change the default merge method to merge documents in accordance with how
> they are being collected at the shard level.
> With delegating collectors, you'll need to override the merge method to merge
> the analytic outputs from the shards. An example of how this works is provided
> in the SumCollectorFactory.
> Each collectorFactory that is specified in the HTTP parameters will have
> its merge method applied by the aggregator.
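For a sum analytic, merging shard outputs amounts to adding the per-group sums together. The sketch below shows that idea only. It does not reproduce the merge() signature from the patch, which is not given in the description; the class ShardMergeSketch and the method mergeShardSums are hypothetical, and each shard's output is assumed to be a map from group value to sum.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in, not a class from the patch.
final class ShardMergeSketch {

    // Adds up the per-group sums reported by each shard. A TreeMap keeps
    // the merged groups in a stable, sorted order.
    static Map<String, Float> mergeShardSums(List<Map<String, Float>> shardResults) {
        Map<String, Float> merged = new TreeMap<>();
        for (Map<String, Float> shard : shardResults) {
            for (Map.Entry<String, Float> e : shard.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Float::sum);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Float> shard1 = new LinkedHashMap<>();
        shard1.put("apple", 1.0f);
        shard1.put("belkin", 2.0f);
        Map<String, Float> shard2 = new LinkedHashMap<>();
        shard2.put("apple", 3.0f);

        List<Map<String, Float>> shards = new ArrayList<>();
        shards.add(shard1);
        shards.add(shard2);
        System.out.println(mergeShardSums(shards));
    }
}
```

Sums merge cleanly because addition is associative. An analytic such as an average would need each shard to report a sum and a count instead, since averaging per-shard averages gives the wrong result when shards hold different numbers of documents.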
> *Testing the Patch With Sample Data*
> 1) Apply patch to Solr 4.1
> 2) Load sample data
> 3) Send the http command:
> http://localhost:8983/solr/collection1/select?q=*%3A*&wt=xml&indent=true&cl=true&cl.topdocs=default&cl.delegating=sum&cl.sum.0.groupby=manu_id_s&cl.sum.0.column=price
> The doclist will be generated by the "default" topdocs collector and the
> output will include a map named "cl.sum.0" which will have output from the
> delegating sum collector.
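If you would rather build the test request programmatically than paste the URL, a small Java sketch follows. It only assembles and encodes the query string from step 3 using the standard java.net.URLEncoder; the class SampleRequestSketch and its methods are hypothetical, and sending the request is left out.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the patch.
final class SampleRequestSketch {

    // URL-encodes one name or value; UTF-8 is always available on the JVM.
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Appends the parameters to the base URL in insertion order.
    static String buildUrl(String base, Map<String, String> params) {
        StringBuilder url = new StringBuilder(base);
        char sep = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            url.append(sep).append(encode(e.getKey())).append('=').append(encode(e.getValue()));
            sep = '&';
        }
        return url.toString();
    }

    // The parameters from the sample request in step 3.
    static Map<String, String> sampleParams() {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");
        params.put("wt", "xml");
        params.put("indent", "true");
        params.put("cl", "true");
        params.put("cl.topdocs", "default");
        params.put("cl.delegating", "sum");
        params.put("cl.sum.0.groupby", "manu_id_s");
        params.put("cl.sum.0.column", "price");
        return params;
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("http://localhost:8983/solr/collection1/select", sampleParams()));
    }
}
```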
>