rmahindra123 commented on a change in pull request #3660:
URL: https://github.com/apache/hudi/pull/3660#discussion_r755452809
########## File path: hudi-kafka-connect/README.md ##########

@@ -56,30 +50,57 @@ After building the package, we need to install the Apache Kafka
 
 ### 1 - Starting the environment
 
-To try out the Connect Sink locally, set up a Kafka broker locally. Download the latest apache kafka from https://kafka.apache.org/downloads.
-Once downloaded and built, run the Zookeeper server and Kafka server using the command line tools.
+For runtime dependencies, we encourage using the Confluent HDFS connector jars. We have tested our setup with version `10.1.0`.
+After downloading the connector, copy the jars from the lib folder to the Kafka Connect classpath.
 
 ```bash
-export KAFKA_HOME=/path/to/kafka_install_dir
-cd $KAFKA_KAFKA_HOME
-./bin/zookeeper-server-start.sh ./config/zookeeper.properties
-./bin/kafka-server-start.sh ./config/server.properties
+confluent-hub install confluentinc/kafka-connect-hdfs:10.1.0
+cp confluentinc-kafka-connect-hdfs-10.1.0/lib/*.jar /usr/local/share/java/hudi-kafka-connect/
 ```
 
-Wait until the kafka cluster is up and running.
+### 2 - Set up the docker containers
 
-### 2 - Set up the schema registry
+To run the connector locally, we need Kafka, Zookeeper, HDFS, Hive, etc. To make the setup easier, we use the docker
+containers from the Hudi docker demo. Refer to [this link for the setup](https://hudi.apache.org/docs/docker_demo).
+
+Essentially, follow the steps listed here:
+
+/etc/hosts : The demo references many services running in containers by hostname. Add the following entries to /etc/hosts:
+```bash
+127.0.0.1 adhoc-1
+127.0.0.1 adhoc-2
+127.0.0.1 namenode
+127.0.0.1 datanode1
+127.0.0.1 hiveserver
+127.0.0.1 hivemetastore
+127.0.0.1 kafkabroker
+127.0.0.1 sparkmaster
+127.0.0.1 zookeeper
+```
+
+Bring up the docker containers:
+```bash
+cd $HUDI_DIR/docker
+./setup_demo.sh
+```
+
+The schema registry and the Kafka connector can then be run directly from the host system (Mac/Linux).
+
+### 3 - Set up the schema registry
 
 Hudi leverages schema registry to obtain the latest schema when writing records. While it supports most popular schema
 registries, we use Confluent schema registry. Download the latest confluent platform and run the schema registry
-service.
+service.
+
+NOTE: You might need to change the port from `8081` to `8082`.
 
 ```bash
 cd $CONFLUENT_DIR
+./bin/kafka-configs --zookeeper localhost --entity-type topics --entity-name _schemas --alter --add-config cleanup.policy=compact
 ./bin/schema-registry-start etc/schema-registry/schema-registry.properties

Review comment:
    ack.
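For anyone following the quoted README steps, a quick sanity check of the docker setup in step 2 can save debugging time before the connector is started. A minimal sketch; the container names and the jar directory are taken from the quoted diff and may differ in your setup:

```bash
# List the demo containers brought up by setup_demo.sh;
# expect names such as kafkabroker, zookeeper, namenode, hiveserver.
docker ps --format '{{.Names}}'

# Verify one of the hostname mappings added to /etc/hosts resolves locally.
ping -c 1 kafkabroker

# Confirm the Confluent HDFS connector jars were copied onto the
# Kafka Connect classpath (the directory used in step 1 of the diff).
ls /usr/local/share/java/hudi-kafka-connect/
```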

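Similarly, after starting the schema registry in step 3, its REST API and the `_schemas` topic configuration can be checked. A sketch under the quoted README's assumptions: the registry is assumed to listen on port `8082` per the NOTE in the diff (the Confluent default is `8081`), and the commands run from `$CONFLUENT_DIR` with the same `--zookeeper localhost` form the diff uses:

```bash
# The subjects endpoint returns a (possibly empty) JSON list when the registry is up.
curl -s http://localhost:8082/subjects

# Describe the _schemas topic to confirm cleanup.policy=compact took effect.
./bin/kafka-configs --zookeeper localhost --entity-type topics --entity-name _schemas --describe
```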