[
https://issues.apache.org/jira/browse/HUDI-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17409315#comment-17409315
]
ASF GitHub Bot commented on HUDI-2394:
--------------------------------------
rmahindra123 opened a new pull request #3592:
URL: https://github.com/apache/hudi/pull/3592
## What is the purpose of the pull request
Implement the Kafka Connect sink protocol for Hudi for ingesting immutable
data. This PR enables Kafka Connect users to ingest Kafka Avro/JSON string
records into Hudi tables within the Kafka Connect framework, without a Spark
engine. Currently, we use the HoodieJavaWriteClient's bulk insert support to
insert append-only data (CoW). We use file ID indexing to ensure that multiple
writers per Kafka partition can write to the same Hudi partition path
concurrently without locks.
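To illustrate the lock-free idea, here is a minimal sketch and not the code in
this PR. It assumes each writer is tied to one Kafka partition and derives its
file IDs from a prefix unique to that partition, so no two writers ever produce
the same file in a shared Hudi partition path. The class `FileIdSketch`, the
methods `fileIdPrefix` and `newFileId`, and the use of an MD5 digest are all
hypothetical and chosen only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

public class FileIdSketch {

    // Hypothetical helper: a deterministic prefix for one Kafka topic-partition.
    // Distinct inputs yield distinct prefixes (barring hash collisions), so
    // file IDs built from them cannot clash across writers.
    static String fileIdPrefix(String topic, int kafkaPartition) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(
                (topic + "-" + kafkaPartition).getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b)); // two hex chars per byte
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Hypothetical helper: a new file ID that always carries the writer's prefix.
    static String newFileId(String topic, int kafkaPartition) {
        return fileIdPrefix(topic, kafkaPartition) + "-" + UUID.randomUUID();
    }

    public static void main(String[] args) {
        // Two writers targeting the same Hudi partition path get disjoint file IDs.
        System.out.println(newFileId("orders", 0));
        System.out.println(newFileId("orders", 1));
    }
}
```

Because every file ID is namespaced by its writer's prefix, concurrent
append-only writes need no coordination at the file level under these
assumptions.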
## Brief change log
1. The Kafka Connect protocol is implemented in a new package,
hudi-kafka-connect.
2. A few code changes to integrate support for bulk insert with
HoodieJavaWriteClient.
## Verify this pull request
1. Wrote unit tests for the key Coordinator <-> Participant interactions.
2. Tested with the kafka console connect in distributed mode as per
instructions in README.md
--
This is an automated message from the Apache Git Service.
> [Kafka Connect Milestone 1] Implement kafka connect for immutable data
> ----------------------------------------------------------------------
>
> Key: HUDI-2394
> URL: https://issues.apache.org/jira/browse/HUDI-2394
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: Rajesh Mahindra
> Priority: Major
>
> Implement kafka connect for immutable data using Bulk inserts
--
This message was sent by Atlassian Jira
(v8.3.4#803005)