rmahindra123 commented on a change in pull request #3660:
URL: https://github.com/apache/hudi/pull/3660#discussion_r755452809
########## File path: hudi-kafka-connect/README.md ##########

@@ -56,30 +50,57 @@ After building the package, we need to install the Apache Kafka
 
 ### 1 - Starting the environment
 
-To try out the Connect Sink locally, set up a Kafka broker locally. Download the latest apache kafka from https://kafka.apache.org/downloads.
-Once downloaded and built, run the Zookeeper server and Kafka server using the command line tools.
+For runtime dependencies, we encourage using the Confluent HDFS connector jars. We have tested our setup with version `10.1.0`.
+After downloading the connector, copy the jars from the lib folder to the Kafka Connect classpath.
 
 ```bash
-export KAFKA_HOME=/path/to/kafka_install_dir
-cd $KAFKA_KAFKA_HOME
-./bin/zookeeper-server-start.sh ./config/zookeeper.properties
-./bin/kafka-server-start.sh ./config/server.properties
+confluent-hub install confluentinc/kafka-connect-hdfs:10.1.0
+cp confluentinc-kafka-connect-hdfs-10.1.0/lib/*.jar /usr/local/share/java/hudi-kafka-connect/
 ```
 
-Wait until the kafka cluster is up and running.
+### 2 - Set up the docker containers
 
-### 2 - Set up the schema registry
+To run the connector locally, we need Kafka, Zookeeper, HDFS, Hive, etc. To make the setup easier, we use the docker
+containers from the Hudi docker demo. Refer to [this link for the setup](https://hudi.apache.org/docs/docker_demo).
+
+Essentially, follow the steps listed here:
+
+/etc/hosts : The demo references many services running in containers by hostname. Add the following entries to /etc/hosts:
+```bash
+127.0.0.1 adhoc-1
+127.0.0.1 adhoc-2
+127.0.0.1 namenode
+127.0.0.1 datanode1
+127.0.0.1 hiveserver
+127.0.0.1 hivemetastore
+127.0.0.1 kafkabroker
+127.0.0.1 sparkmaster
+127.0.0.1 zookeeper
+```
+
+Bring up the docker containers:
+```bash
+cd $HUDI_DIR/docker
+./setup_demo.sh
+```
+
+The schema registry and the Kafka connector can then be run directly from the host system (Mac/Linux).
+
+### 3 - Set up the schema registry
 
 Hudi leverages schema registry to obtain the latest schema when writing records. While it supports most popular schema
 registries, we use Confluent schema registry. Download the latest confluent platform and run the schema registry
-service.
+service.
+
+NOTE: You might need to change the port from `8081` to `8082`.
 
 ```bash
 cd $CONFLUENT_DIR
+./bin/kafka-configs --zookeeper localhost --entity-type topics --entity-name _schemas --alter --add-config cleanup.policy=compact
 ./bin/schema-registry-start etc/schema-registry/schema-registry.properties

Review comment:
    ack.
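For anyone following the quoted README steps, a quick sanity check of the docker setup in step 2 can save debugging time before the connector is started. A minimal sketch; the container names and the jar directory are taken from the quoted diff and may differ in your setup:

```bash
# List the demo containers brought up by setup_demo.sh;
# expect names such as kafkabroker, zookeeper, namenode, hiveserver.
docker ps --format '{{.Names}}'

# Verify one of the hostname mappings added to /etc/hosts resolves locally.
ping -c 1 kafkabroker

# Confirm the Confluent HDFS connector jars were copied onto the
# Kafka Connect classpath (the directory used in step 1 of the diff).
ls /usr/local/share/java/hudi-kafka-connect/
```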

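Similarly, after starting the schema registry in step 3, its REST API and the `_schemas` topic configuration can be checked. A sketch under the quoted README's assumptions: the registry is assumed to listen on port `8082` per the NOTE in the diff (the Confluent default is `8081`), and the commands run from `$CONFLUENT_DIR` with the same `--zookeeper localhost` form the diff uses:

```bash
# The subjects endpoint returns a (possibly empty) JSON list when the registry is up.
curl -s http://localhost:8082/subjects

# Describe the _schemas topic to confirm cleanup.policy=compact took effect.
./bin/kafka-configs --zookeeper localhost --entity-type topics --entity-name _schemas --describe
```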