Sagar Sumit created HUDI-2742:
---------------------------------

             Summary: Multiple S3EventsHoodieIncrSource from same S3 metadata 
table for different Hudi tables
                 Key: HUDI-2742
                 URL: https://issues.apache.org/jira/browse/HUDI-2742
             Project: Apache Hudi
          Issue Type: Sub-task
            Reporter: Sagar Sumit


Use case:
Suppose a source bucket has several folders: a1, a2, a3.
All write events on this bucket are logged to a single s3_metadata_table.
You want to run three S3EventsHoodieIncrSource instances, one each for a1, a2 
and a3, all pulling metadata from the same s3_metadata_table.
No two incr sources may ingest into the same Hudi table, i.e. there should be 
strict separation.

Proposed Solution:
Users can provide a filter key and value, and start multiple incr sources 
with different configs. In the above use case, the key could be s3.object.key 
and the value could be a regex that matches up to a certain part of the S3 
object key. We apply the filter in S3EventsHoodieIncrSource 
[here|https://github.com/apache/hudi/blob/6b93ccca9b26b47099e9791d4363e0616e77e408/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/S3EventsHoodieIncrSource.java#L105-L109].
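
To make the idea concrete, here is a minimal sketch of the key/regex filtering in plain Python. It is not Hudi code: the table names, the SOURCE_FILTERS mapping and the route() helper are all hypothetical, and events are modelled as flat dicts with an "s3.object.key" field purely for illustration. It only shows how one regex per incr source could split events from a shared metadata table while rejecting overlapping filters.

```python
import re

# Hypothetical filter configs, one regex per target Hudi table.
# These names are illustrative and are not existing Hudi config keys.
SOURCE_FILTERS = {
    "hudi_table_a1": r"a1/.*",
    "hudi_table_a2": r"a2/.*",
    "hudi_table_a3": r"a3/.*",
}


def route(events, filters, key_field="s3.object.key"):
    """Assign each event's object key to the single source whose regex matches.

    Keys that match no filter are skipped. A key that matches more than one
    filter raises ValueError, since that would break strict separation.
    """
    compiled = {name: re.compile(pattern) for name, pattern in filters.items()}
    routed = {name: [] for name in filters}
    for event in events:
        key = event[key_field]
        # re.match anchors at the start of the key, so "a1/.*" acts as a prefix filter.
        matches = [name for name, rx in compiled.items() if rx.match(key)]
        if len(matches) > 1:
            raise ValueError(f"key {key!r} matches multiple sources: {matches}")
        for name in matches:
            routed[name].append(key)
    return routed


if __name__ == "__main__":
    sample_events = [
        {"s3.object.key": "a1/part-0001.parquet"},
        {"s3.object.key": "a2/part-0002.parquet"},
        {"s3.object.key": "a3/part-0003.parquet"},
    ]
    for table, keys in route(sample_events, SOURCE_FILTERS).items():
        print(table, keys)
```

Including the trailing "/" in each pattern keeps a folder such as a10 from being picked up by the a1 source. In the actual incr source the same predicate would presumably be applied as a filter on the rows read from s3_metadata_table rather than in a Python loop.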



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
