[ 
https://issues.apache.org/jira/browse/SOLR-9240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15348347#comment-15348347
 ] 

Joel Bernstein commented on SOLR-9240:
--------------------------------------

After reviewing TopicStream, I found that it already supports the 
partitionKeys parameter: it uses SolrStream under the covers, which is 
where the partitioning logic resides. 

So all that needs to be done on this ticket is to add parallel(topic()) test 
cases and fix any issues that arise.
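
To make the partitioning idea concrete, here is a minimal sketch of how a 
partition key can split one result set across workers so that each tuple is 
handled by exactly one of them. It is an illustration only, not Solr's code: 
the class and method names are hypothetical, and String.hashCode() stands in 
for whatever hash the real partitioning logic uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; names are hypothetical and this is not Solr's implementation. */
class PartitionSketch {

    // Maps a partition key value (e.g. the "id" field) to a worker index.
    // Assumption: String.hashCode() is a stand-in for the real hash function.
    static int workerFor(String keyValue, int numWorkers) {
        // floorMod keeps the result in [0, numWorkers) even for negative hash codes.
        return Math.floorMod(keyValue.hashCode(), numWorkers);
    }

    // Returns the subset of key values that the given worker would process.
    static List<String> partitionFor(List<String> keyValues, int numWorkers, int workerId) {
        List<String> mine = new ArrayList<>();
        for (String key : keyValues) {
            if (workerFor(key, numWorkers) == workerId) {
                mine.add(key);
            }
        }
        return mine;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("doc1", "doc2", "doc3", "doc4", "doc5");
        int numWorkers = 2;
        // Each worker sees a disjoint slice; together the slices cover all ids.
        for (int w = 0; w < numWorkers; w++) {
            System.out.println("worker " + w + " -> " + partitionFor(ids, numWorkers, w));
        }
    }
}
```

Because the worker index depends only on the key value and the worker count, 
every worker can compute its own slice independently, with no coordination 
between workers.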

> Add partitionKeys parameter to the topic() Streaming Expression
> ---------------------------------------------------------------
>
>                 Key: SOLR-9240
>                 URL: https://issues.apache.org/jira/browse/SOLR-9240
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Joel Bernstein
>            Assignee: Joel Bernstein
>
> Currently the topic() function doesn't accept a partitionKeys parameter like 
> the search() function does. This means the topic() function can't be wrapped 
> by the parallel() function to run across worker nodes.
> It would be useful to support parallelizing the topic() function because it 
> would provide a general-purpose parallelized approach for processing batches 
> of data as they enter the index.
> For example, this would allow a classify() function to be wrapped around a 
> topic() function to classify documents in parallel across worker nodes. 
> Sample syntax:
> {code}
> parallel(daemon(update(classify(topic(..., partitionKeys="id")))))
> {code}
> The example above would send a daemon to worker nodes that would classify all 
> new documents returned by the topic() function. The update function would 
> send the output of classify() to a SolrCloud collection for indexing.
> The partitionKeys parameter would ensure that each worker would receive a 
> partition of the results returned by the topic() function. This allows the 
> classify() function to be run in parallel.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
