[ 
https://issues.apache.org/jira/browse/HUDI-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17440804#comment-17440804
 ] 

Ethan Guo commented on HUDI-2325:
---------------------------------

Here's the private branch with the whitelisting approach: 
[https://github.com/yihua/hudi/tree/HUDI-2325-kafka-connect-hive-sync]

I followed the Confluent guide to install the HDFS 2 sink connector: 
[https://docs.confluent.io/kafka-connect-hdfs/current/overview.html#prerequisites]

(1) Run `confluent-hub install confluentinc/kafka-connect-hdfs:latest`

(2) Add the HDFS connector libs to `plugin.path` in 
`connect-distributed.properties`:

`plugin.path=/usr/local/share/java,/Users/joyce/repo/confluent-6.2.1/share/confluent-hub-components/confluentinc-kafka-connect-hdfs/lib/`
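
To double-check that every `plugin.path` entry actually exists on disk, something like the following could be used. This is only a sketch: `PluginPathCheck` and its methods are made-up names for illustration (not part of Kafka Connect or Hudi), and it assumes `plugin.path` is a comma-separated list of directories, as in the line above.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not part of Kafka Connect or Hudi.
class PluginPathCheck {

    // Split a comma-separated plugin.path value into trimmed, non-empty entries.
    static List<String> splitPluginPath(String value) {
        List<String> entries = new ArrayList<>();
        if (value == null) {
            return entries;
        }
        for (String part : value.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                entries.add(trimmed);
            }
        }
        return entries;
    }

    // Read plugin.path from a worker properties file.
    static List<String> readPluginPath(Path propertiesFile) throws IOException {
        Properties props = new Properties();
        try (Reader reader = Files.newBufferedReader(propertiesFile)) {
            props.load(reader);
        }
        return splitPluginPath(props.getProperty("plugin.path"));
    }

    public static void main(String[] args) throws IOException {
        // Assumed default file name; pass the real path as the first argument.
        Path file = Paths.get(args.length > 0 ? args[0] : "connect-distributed.properties");
        for (String entry : readPluginPath(file)) {
            boolean isDir = Files.isDirectory(Paths.get(entry));
            System.out.println((isDir ? "OK      " : "MISSING ") + entry);
        }
    }
}
```

It only reports whether each directory is present; it does not verify that the jars inside are the ones the sink needs.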

When running the Hudi Kafka Connect sink, I hit the following exception:
{code:java}
[2021-11-08 14:34:07,550] ERROR [hudi-sink|task-1] WorkerSinkTask{id=hudi-sink-1} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:193)
java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
        at org.apache.hudi.connect.HoodieSinkTask.start(HoodieSinkTask.java:80)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.initializeAndStart(WorkerSinkTask.java:308)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:196)
        at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:186)
        at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:241)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at org.apache.kafka.connect.runtime.isolation.PluginClassLoader.loadClass(PluginClassLoader.java:103)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        ... 10 more
[2021-11-08 14:34:07,550] ERROR [hudi-sink|task-2] WorkerSinkTask{id=hudi-sink-2} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:193)
java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
        at org.apache.hudi.connect.HoodieSinkTask.start(HoodieSinkTask.java:80)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.initializeAndStart(WorkerSinkTask.java:308)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:196)
        at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:186)
        at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:241)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at org.apache.kafka.connect.runtime.isolation.PluginClassLoader.loadClass(PluginClassLoader.java:103)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        ... 10 more
{code}
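
The `ClassNotFoundException` in the trace means the class loader used for the plugin could not resolve `org.apache.hadoop.fs.FSDataInputStream`. A quick way to check whether a given classpath can resolve that class is sketched below. `ClassProbe` is a hypothetical helper, not something in the branch. It only consults the class loader of the JVM it runs in, which is not the same as Connect's isolated `PluginClassLoader`, so a positive result is a hint rather than proof.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper, not part of Hudi or Kafka Connect.
class ClassProbe {

    // Returns a description of where the class was loaded from,
    // or null if the class cannot be loaded by this class loader.
    static String locate(String className) {
        try {
            // initialize=false: only resolve the class, do not run static initializers.
            Class<?> cls = Class.forName(className, false, ClassProbe.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            return src == null
                    ? "(bootstrap or unknown location)"
                    : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "org.apache.hadoop.fs.FSDataInputStream";
        String where = locate(name);
        System.out.println(where == null
                ? name + " NOT FOUND"
                : name + " loaded from " + where);
    }
}
```

Running it with the connector's `lib` directory on the classpath would show whether any jar there provides the Hadoop class.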

> Implement and test Hive Sync support for Kafka Connect
> ------------------------------------------------------
>
>                 Key: HUDI-2325
>                 URL: https://issues.apache.org/jira/browse/HUDI-2325
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: Rajesh Mahindra
>            Assignee: Ethan Guo
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.10.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)
