[jira] [Updated] (SOLR-9240) Support parallel ETL with the topic expression

Joel Bernstein (JIRA) Tue, 12 Jul 2016 08:51:31 -0700

     [ 
https://issues.apache.org/jira/browse/SOLR-9240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Joel Bernstein updated SOLR-9240:
---------------------------------
    Description: 
It would be useful for Solr to support large scale *Extract, Transform and 
Load* use cases with streaming expressions. Instead of using MapReduce for the 
ETL, the topic expression will be used and SolrCloud will be treated like a 
giant message queue filled with data to be processed.

This ticket makes two small changes to the topic() expression that make this 
possible:

1) Changes the topic() behavior so it can operate in parallel.
2) Adds the initialCheckpoint parameter to the topic expression so a topic can 
start pulling records from anywhere in the queue.

Daemons can then be sent to worker nodes that each process a partition of the 
data from the same topic. The daemon() function's natural behavior is well 
suited to iteratively calling a topic until all records in the topic have been 
processed.

{code}
{code}
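
As a rough illustration of the two changes above, the following Python sketch 
models workers that each repeatedly poll their own partition of a shared topic, 
starting from a chosen checkpoint. It is not Solr code and does not use the 
streaming expression API. The names TopicWorker and run_parallel_etl, the 
hash-on-id partitioning, the batch size, and the assumption that every record 
has a unique increasing version are all hypothetical and chosen only for 
illustration. Only the ideas of an initial checkpoint and of workers each 
handling a partition of the same topic come from the description.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class TopicWorker:
    """Conceptual stand-in for one daemon polling its partition of a topic."""
    worker_id: int
    num_workers: int
    checkpoint: int = 0  # plays the role of initialCheckpoint (assumed semantics)
    processed: list = field(default_factory=list)

    def owns(self, record):
        # Stable hash partitioning on the record id (an assumption),
        # so that no two workers ever handle the same record.
        return zlib.crc32(record["id"].encode()) % self.num_workers == self.worker_id

    def poll(self, collection, batch_size=2):
        """Read the next batch past the checkpoint, then advance the checkpoint."""
        candidates = [
            r for r in collection
            if r["version"] > self.checkpoint and self.owns(r)
        ]
        # Versions are assumed unique, so cutting the batch never skips a record.
        batch = sorted(candidates, key=lambda r: r["version"])[:batch_size]
        if batch:
            self.checkpoint = batch[-1]["version"]
            self.processed.extend(r["id"] for r in batch)
        return batch


def run_parallel_etl(collection, num_workers, initial_checkpoint=0):
    """Run every worker until its partition of the topic is drained."""
    workers = [
        TopicWorker(i, num_workers, initial_checkpoint) for i in range(num_workers)
    ]
    for worker in workers:
        # Mirrors a daemon calling the topic repeatedly until nothing is left.
        while worker.poll(collection):
            pass
    return workers


if __name__ == "__main__":
    records = [{"id": f"doc{i}", "version": i} for i in range(1, 11)]
    for worker in run_parallel_etl(records, num_workers=3):
        print(worker.worker_id, worker.processed)
```

In this model, passing a larger initial_checkpoint makes every worker skip the 
older records, which is the "start pulling records from anywhere in the queue" 
behavior described in item 2. The workers run one after another here for 
simplicity; in the proposal they would run concurrently on separate nodes.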




  was:
It would be useful for Solr to support large scale *Extract, Transform and 
Load* use cases with streaming expressions. Instead of using MapReduce for the 
ETL, the topic expression will be used and SolrCloud will be treated like a 
giant message queue filled with data to be processed.

This ticket makes two small changes to the topic() expression that make this 
possible:

1) Changes the topic() behavior so it can operate in parallel.
2) Adds the initialCheckpoint parameter to the topic expression so a topic can 
start pulling records from anywhere in the queue.

Daemons can then be sent to worker nodes that each process a partition of the 
data from the same topic. The daemon() function's natural behavior is well 
suited to iteratively calling a topic until all records in the topic have been 
processed.






> Support parallel ETL with the topic expression
> ----------------------------------------------
>
>                 Key: SOLR-9240
>                 URL: https://issues.apache.org/jira/browse/SOLR-9240
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Joel Bernstein
>            Assignee: Joel Bernstein
>         Attachments: SOLR-9240.patch, SOLR-9240.patch
>
>
> It would be useful for Solr to support large scale *Extract, Transform and 
> Load* use cases with streaming expressions. Instead of using MapReduce for 
> the ETL, the topic expression will be used and SolrCloud will be treated like 
> a giant message queue filled with data to be processed.
> This ticket makes two small changes to the topic() expression that make this 
> possible:
> 1) Changes the topic() behavior so it can operate in parallel.
> 2) Adds the initialCheckpoint parameter to the topic expression so a topic 
> can start pulling records from anywhere in the queue.
> Daemons can then be sent to worker nodes that each process a partition of the 
> data from the same topic. The daemon() function's natural behavior is well 
> suited to iteratively calling a topic until all records in the topic have 
> been processed.
> {code}
> {code}



