[GitHub] [hudi] rmahindra123 opened a new pull request #3592: [HUDI-2394] Implement Kafka Sink Protocol for Hudi for Ingesting Immutable Data

GitBox Fri, 03 Sep 2021 00:07:57 -0700


rmahindra123 opened a new pull request #3592:
URL: https://github.com/apache/hudi/pull/3592



   ## What is the purpose of the pull request
   
   Implement the Kafka sink protocol for Hudi for ingesting immutable data. This PR 
enables Kafka Connect users to readily ingest Kafka Avro/JSON string records into 
Hudi tables within the Kafka Connect framework, without a Spark engine.
   
   Currently, we use HoodieJavaWriteClient's bulk insert support to insert 
append-only data (CoW). We use file ID indexing to ensure that multiple writers per 
Kafka partition can write to the same Hudi partition path concurrently without 
locks.
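   
   The description above does not show how file IDs are assigned, so the 
following is only an illustrative sketch of the general idea, not the code in 
this PR. It assumes each writer derives its file ID from the Kafka topic and 
partition it owns, so that writers never target the same file and need no lock. 
The class `FileIdSketch`, the methods `fileIdPrefix` and `fileIdFor`, and the 
`sequence` parameter are all hypothetical names introduced for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Illustrative only: none of these names come from the Hudi code base.
final class FileIdSketch {

    // Hypothetical helper: derive a stable prefix from the Kafka topic and
    // partition, so the same writer always produces the same prefix.
    static String fileIdPrefix(String topic, int kafkaPartition) {
        String key = topic + "-" + kafkaPartition;
        return UUID.nameUUIDFromBytes(key.getBytes(StandardCharsets.UTF_8)).toString();
    }

    // Hypothetical helper: append a per-writer sequence number so that one
    // writer can roll over to a new file without clashing with its old one.
    static String fileIdFor(String topic, int kafkaPartition, int sequence) {
        return fileIdPrefix(topic, kafkaPartition) + "-" + sequence;
    }

    public static void main(String[] args) {
        // Simulate four writers (one per Kafka partition), each producing
        // three files under the same Hudi partition path.
        Set<String> ids = new HashSet<>();
        for (int partition = 0; partition < 4; partition++) {
            for (int sequence = 0; sequence < 3; sequence++) {
                ids.add(fileIdFor("events", partition, sequence));
            }
        }
        System.out.println(ids.size() + " distinct file IDs generated");
    }
}
```

   Because the file ID is a pure function of the writer's identity in this 
sketch, two writers can only collide if they own the same Kafka partition, 
which is why no cross-writer lock is needed under this assumption.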
   
   ## Brief change log
   
   1. The Kafka Connect protocol is implemented in a new package, 
hudi-kafka-connect.
   2. A few code changes integrate bulk insert support with 
HoodieJavaWriteClient.
   
   ## Verify this pull request
   
   1. Wrote unit tests for the key Coordinator <-> Participants interaction.
   2. Tested with the Kafka console connect in distributed mode, as per the 
instructions in README.md.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

