[
https://issues.apache.org/jira/browse/HUDI-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sagar Sumit updated HUDI-2742:
------------------------------
Priority: Blocker (was: Major)
> Multiple S3EventsHoodieIncrSource from same S3 metadata table for different
> Hudi tables
> ---------------------------------------------------------------------------------------
>
> Key: HUDI-2742
> URL: https://issues.apache.org/jira/browse/HUDI-2742
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: Sagar Sumit
> Priority: Blocker
> Labels: pull-request-available
>
> Use case:
> Let's say you have a source bucket with different folders: a1, a2, a3.
> All write events on this bucket are logged to a single
> s3_metadata_table.
> Now you want to run three S3EventsHoodieIncrSource instances, one for each of
> a1, a2, a3, all pulling metadata from the same s3_metadata_table.
> This should be done while ensuring that no two incr sources ingest into
> the same table, i.e. there should be strict separation.
> Proposed Solution:
> Users can provide a filter key and value, and start multiple incr sources
> with different configs. In the above use case, the key could be s3.object.key
> and the value could be a regex that matches up to a certain part of the S3
> object key. We apply the filter in S3EventsHoodieIncrSource
> [here|https://github.com/apache/hudi/blob/6b93ccca9b26b47099e9791d4363e0616e77e408/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/S3EventsHoodieIncrSource.java#L105-L109].
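A minimal sketch of the proposed per-source filter, assuming the filter key names a column of the S3 metadata table (here `s3.object.key`) and the filter value is a regex matched from the start of that column's value. It uses plain Java collections instead of the Spark `Dataset` that S3EventsHoodieIncrSource actually works on, and the class `S3KeyFilterSketch`, the method `applyFilter`, and the sample object keys are hypothetical; they are not Hudi APIs or config names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not a Hudi API): each incremental source is given a
 * (filter key, filter regex) pair and keeps only the S3 event rows whose value
 * for that key matches the regex starting at the beginning of the value.
 */
final class S3KeyFilterSketch {

    /** Keeps rows whose value under filterKey begins with a match of filterRegex. */
    static List<Map<String, String>> applyFilter(
            List<Map<String, String>> rows, String filterKey, String filterRegex) {
        Pattern pattern = Pattern.compile(filterRegex);
        List<Map<String, String>> kept = new ArrayList<>();
        for (Map<String, String> row : rows) {
            String value = row.get(filterKey);
            // lookingAt() anchors the match at the start of the value but does
            // not require the whole value to match, i.e. a prefix-style match.
            if (value != null && pattern.matcher(value).lookingAt()) {
                kept.add(row);
            }
        }
        return kept;
    }

    /** Made-up rows standing in for the shared s3_metadata_table. */
    static List<Map<String, String>> sampleRows() {
        return List.of(
                Map.of("s3.object.key", "a1/2021/11/part-0001.json"),
                Map.of("s3.object.key", "a2/2021/11/part-0002.json"),
                Map.of("s3.object.key", "a3/2021/11/part-0003.json"),
                Map.of("s3.object.key", "a10/2021/11/part-0004.json"),
                Map.of("s3.object.key", "a1/2021/12/part-0005.json"));
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = sampleRows();
        // One regex per incremental source; the trailing "/" keeps the a1
        // source from also picking up keys under a10/.
        for (String folder : List.of("a1", "a2", "a3")) {
            List<Map<String, String>> kept = applyFilter(rows, "s3.object.key", folder + "/");
            System.out.println(folder + " -> " + kept.size() + " object(s)");
        }
    }
}
```

Running `main` prints `a1 -> 2 object(s)`, `a2 -> 1 object(s)` and `a3 -> 1 object(s)`; the `a10/` key is picked up by none of the three sources. Ending each regex at a folder boundary is one simple way to get the strict separation the use case asks for, since a bare `a1` would also match `a10/...`.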
--
This message was sent by Atlassian Jira
(v8.20.1#820001)