[
https://issues.apache.org/jira/browse/SOLR-9240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joel Bernstein updated SOLR-9240:
---------------------------------
Description:
It would be useful for Solr to support large scale *Extract, Transform and
Load* use cases with streaming expressions. Instead of using MapReduce for the
ETL, the topic expression will be used and SolrCloud will be treated like a giant
message queue filled with data to be processed.
This ticket makes two small changes to the topic() expression that make this
possible:
1) Changes the topic() behavior so it can operate in parallel.
2) Adds the initialCheckpoint parameter to the topic expression so a topic can
start pulling records from anywhere in the queue.
Daemons can then be sent to worker nodes that each work on processing a
partition of the data from the same topic. The daemon() function's natural
behavior is perfect for iteratively calling a topic until all records in the
topic have been processed.
{code}
{code}
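The parallel behavior described above can be modeled with a toy, in-memory sketch. This is not Solr code and does not use any Solr API: `Record`, `Topic`, `run_daemon`, the CRC32-based partitioning, and the `batch_size` parameter are all hypothetical names chosen for illustration. It only assumes that each record carries a unique, increasing version number, that each worker sees the records whose key hashes to it, and that a worker keeps calling its topic until nothing is returned.

```python
from dataclasses import dataclass
from zlib import crc32


@dataclass
class Record:
    id: str
    version: int  # assumed unique and increasing; stands in for a version stamp


class Topic:
    """Toy model of one worker's view of a topic (illustrative, not Solr's implementation)."""

    def __init__(self, records, worker_id, num_workers, initial_checkpoint=0, batch_size=2):
        self.records = sorted(records, key=lambda r: r.version)
        self.worker_id = worker_id
        self.num_workers = num_workers
        # Start reading after this version, i.e. "from anywhere in the queue".
        self.checkpoint = initial_checkpoint
        self.batch_size = batch_size

    def _is_mine(self, record):
        # Hash-partition on the record id so workers see disjoint slices.
        return crc32(record.id.encode("utf-8")) % self.num_workers == self.worker_id

    def read(self):
        """Return the next batch past the checkpoint and advance the checkpoint."""
        pending = [r for r in self.records if r.version > self.checkpoint and self._is_mine(r)]
        batch = pending[: self.batch_size]
        if batch:
            self.checkpoint = batch[-1].version
        return batch


def run_daemon(topic, process):
    """Call the topic repeatedly until it returns no records; return the number processed."""
    count = 0
    while True:
        batch = topic.read()
        if not batch:
            return count
        for record in batch:
            process(record)
            count += 1


if __name__ == "__main__":
    records = [Record(id=f"doc{i}", version=i) for i in range(1, 11)]
    for worker_id in range(2):
        handled = []
        run_daemon(Topic(records, worker_id, 2, initial_checkpoint=4), handled.append)
        print(worker_id, [r.id for r in handled])
```

In this model every worker keeps its own checkpoint, so the partitions advance independently and no record is handled twice.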
was:
It would be useful for Solr to support large scale *Extract, Transform and
Load* use cases with streaming expressions. Instead of using MapReduce for the
ETL, the topic expression will be used and SolrCloud will be treated like a giant
message queue filled with data to be processed.
This ticket makes two small changes to the topic() expression that make this
possible:
1) Changes the topic() behavior so it can operate in parallel.
2) Adds the initialCheckpoint parameter to the topic expression so a topic can
start pulling records from anywhere in the queue.
Daemons can then be sent to worker nodes that each work on processing a
partition of the data from the same topic. The daemon() function's natural
behavior is perfect for iteratively calling a topic until all records in the
topic have been processed.
> Support parallel ETL with the topic expression
> ----------------------------------------------
>
> Key: SOLR-9240
> URL: https://issues.apache.org/jira/browse/SOLR-9240
> Project: Solr
> Issue Type: Improvement
> Reporter: Joel Bernstein
> Assignee: Joel Bernstein
> Attachments: SOLR-9240.patch, SOLR-9240.patch
>
>
> It would be useful for Solr to support large scale *Extract, Transform and
> Load* use cases with streaming expressions. Instead of using MapReduce for
> the ETL, the topic expression will be used and SolrCloud will be treated like a
> giant message queue filled with data to be processed.
> This ticket makes two small changes to the topic() expression that make this
> possible:
> 1) Changes the topic() behavior so it can operate in parallel.
> 2) Adds the initialCheckpoint parameter to the topic expression so a topic
> can start pulling records from anywhere in the queue.
> Daemons can then be sent to worker nodes that each work on processing a
> partition of the data from the same topic. The daemon() function's natural
> behavior is perfect for iteratively calling a topic until all records in the
> topic have been processed.
> {code}
> {code}