linliu-code opened a new issue, #18228: URL: https://github.com/apache/hudi/issues/18228
### Feature Description

**What the feature achieves:**
Adds support for ingesting data from AWS Kinesis Data Streams into Hudi tables via DeltaStreamer. Users can stream JSON records from Kinesis into Hudi with checkpointing, multi-shard reads, and incremental ingestion, in line with existing Kafka/DFS sources.

**Why this feature is needed:**
Kinesis is widely used for real-time streaming on AWS. Teams using Kinesis and Hudi currently need custom code or separate pipelines to move Kinesis data into the lakehouse. This creates duplication, inconsistency, and extra maintenance. Native Kinesis support in DeltaStreamer gives a single, well-tested path from Kinesis to Hudi, with checkpointing and support for upserts and other Hudi operations.

### User Experience

**How users will use this feature:**
Users configure DeltaStreamer with JsonKinesisSource and set Kinesis options (stream name, region, endpoint, starting position) in a properties file. They run DeltaStreamer as usual; it consumes records from Kinesis, writes them to a Hudi table using the chosen write operation (e.g., UPSERT), and tracks offsets for resumable, incremental ingestion across restarts.

### Hudi RFC Requirements

**RFC PR link:** (if applicable)

**Why RFC is/isn't needed:**
- Does this change public interfaces/APIs? (No)
- Does this change storage format? (No)
- Justification: Small feature
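
For illustration only, the per-shard offset tracking described under User Experience could look roughly like the sketch below. It is not taken from the Hudi implementation: the checkpoint string layout (`stream,shardId:sequenceNumber,...`) and the function names `encode_checkpoint`, `decode_checkpoint`, and `advance` are assumptions made for this example. It relies only on Kinesis sequence numbers being decimal strings that increase within a shard.

```python
from typing import Dict, List, Tuple


def encode_checkpoint(stream: str, positions: Dict[str, str]) -> str:
    """Serialize per-shard positions into one string (hypothetical format)."""
    parts = [f"{shard}:{seq}" for shard, seq in sorted(positions.items())]
    return ",".join([stream] + parts)


def decode_checkpoint(checkpoint: str) -> Tuple[str, Dict[str, str]]:
    """Parse a checkpoint string back into (stream name, shard -> sequence number)."""
    stream, *parts = checkpoint.split(",")
    positions: Dict[str, str] = {}
    for part in parts:
        shard, seq = part.split(":", 1)
        positions[shard] = seq
    return stream, positions


def advance(positions: Dict[str, str], batch: List[Tuple[str, str]]) -> Dict[str, str]:
    """Return new positions after reading a batch of (shard_id, sequence_number) pairs.

    Sequence numbers are compared numerically, so a shard's position never
    moves backwards; shards seen for the first time are added.
    """
    updated = dict(positions)
    for shard, seq in batch:
        if shard not in updated or int(seq) > int(updated[shard]):
            updated[shard] = seq
    return updated


# Resume from a stored checkpoint, read one batch, and store the new checkpoint.
stream, pos = decode_checkpoint(
    "orders,shardId-000000000000:100,shardId-000000000001:250"
)
pos = advance(
    pos,
    [
        ("shardId-000000000000", "180"),  # newer record: position advances
        ("shardId-000000000001", "200"),  # older than checkpoint: ignored
        ("shardId-000000000002", "7"),    # newly discovered shard
    ],
)
next_checkpoint = encode_checkpoint(stream, pos)
```

Storing one position per shard is what would let a restart continue each shard from the last committed record instead of re-reading the whole stream.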
