linliu-code opened a new issue, #18228:
URL: https://github.com/apache/hudi/issues/18228

   ### Feature Description
   
   **What the feature achieves:**
   Adds support for ingesting data from AWS Kinesis Data Streams into Hudi 
tables via DeltaStreamer. Users can stream JSON records from Kinesis into Hudi 
with checkpointing, multi-shard reads, and incremental ingestion, in line with 
existing Kafka/DFS sources.
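
To make the checkpointing and multi-shard points concrete, here is a minimal Python sketch of how per-shard read positions could be serialized into a single checkpoint string and restored on restart. The `stream,shard=sequence` format and the helper names are assumptions for illustration only, loosely modelled on the `topic,partition:offset` style of checkpoint used for Kafka; the actual format chosen for the Kinesis source may differ.

```python
# Hypothetical checkpoint format: "stream,shardId=seq,shardId=seq".
# Not taken from the Hudi codebase; a sketch under stated assumptions.

def encode_checkpoint(stream, positions):
    """Serialize {shard_id: sequence_number} into one checkpoint string."""
    # Sort by shard id so the same positions always produce the same string.
    parts = [f"{shard}={seq}" for shard, seq in sorted(positions.items())]
    return ",".join([stream] + parts)


def decode_checkpoint(checkpoint):
    """Parse a checkpoint string back into (stream, {shard_id: sequence_number})."""
    stream, *parts = checkpoint.split(",")
    positions = {}
    for part in parts:
        shard, seq = part.split("=", 1)
        # Sequence numbers are kept as opaque strings, never parsed as integers.
        positions[shard] = seq
    return stream, positions


def advance(positions, batch):
    """Return updated positions after reading `batch`.

    `batch` is a list of (shard_id, sequence_number) pairs in read order,
    so the last pair seen for a shard becomes its new position.
    """
    updated = dict(positions)
    for shard, seq in batch:
        updated[shard] = seq
    return updated
```

Keeping one position per shard is what allows a restarted job to resume each shard independently, including shards that first appear after the previous run.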
   
   **Why this feature is needed:**
   Kinesis is widely used for real-time streaming on AWS. Teams using Kinesis 
and Hudi currently need custom code or separate pipelines to move Kinesis data 
into the lakehouse. This creates duplication, inconsistency, and extra 
maintenance. Native Kinesis support in DeltaStreamer gives a single, 
well-tested path from Kinesis to Hudi, with checkpointing and support for 
upserts and other Hudi operations.
   
   
   ### User Experience
   
   **How users will use this feature:**
   Users configure DeltaStreamer with JsonKinesisSource as the source class and 
set the Kinesis options (stream name, region, endpoint, starting position) in a 
properties file. They run DeltaStreamer as usual; it consumes records from 
Kinesis, writes them to a Hudi table using the chosen write operation (e.g., 
UPSERT), and checkpoints its read position in each shard so that ingestion 
resumes incrementally after a restart.
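
As a rough illustration of that workflow, the Python sketch below renders a properties file and assembles the DeltaStreamer arguments. The `hoodie.streamer.source.kinesis.*` keys, their values, and the fully qualified class name for `JsonKinesisSource` are assumptions made for this example and are not confirmed by this issue; the command-line flags are the ones DeltaStreamer already accepts for other sources.

```python
# Hypothetical property keys: the names the feature finally uses may differ.
KINESIS_PROPS = {
    "hoodie.streamer.source.kinesis.stream.name": "orders-stream",
    "hoodie.streamer.source.kinesis.region": "us-east-1",
    "hoodie.streamer.source.kinesis.endpoint": "https://kinesis.us-east-1.amazonaws.com",
    "hoodie.streamer.source.kinesis.starting.position": "TRIM_HORIZON",
}


def render_properties(props):
    """Render a dict as key=value lines, one per property."""
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


def streamer_args(props_path, base_path, table):
    """Build the argument list passed to DeltaStreamer (not a complete command)."""
    return [
        # Assumed package for the new source class.
        "--source-class", "org.apache.hudi.utilities.sources.JsonKinesisSource",
        "--props", props_path,
        "--target-base-path", base_path,
        "--target-table", table,
        "--table-type", "COPY_ON_WRITE",
        "--op", "UPSERT",
    ]


if __name__ == "__main__":
    print(render_properties(KINESIS_PROPS))
    print(" ".join(streamer_args("kinesis.properties", "s3://bucket/orders", "orders")))
```

A real job would also need the usual record key, schema provider, and Spark submit settings, which are omitted here.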
   
   
   ### Hudi RFC Requirements
   
   **RFC PR link:** (if applicable)
   
   **Why RFC is/isn't needed:**
   - Does this change public interfaces/APIs? (No)
   - Does this change storage format? (No)
   - Justification:
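   Small feature: it adds a new DeltaStreamer source alongside the existing 
Kafka/DFS sources, without changing public interfaces or the storage format.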
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
