[
https://issues.apache.org/jira/browse/HUDI-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17440804#comment-17440804
]
Ethan Guo commented on HUDI-2325:
---------------------------------
Here's the private branch with the whitelisting approach:
[https://github.com/yihua/hudi/tree/HUDI-2325-kafka-connect-hive-sync]
I followed the Confluent guide to install the HDFS 2 sink connector:
[https://docs.confluent.io/kafka-connect-hdfs/current/overview.html#prerequisites.]
(1) Run `confluent-hub install confluentinc/kafka-connect-hdfs:latest`
(2) Add the HDFS connector libs to `plugin.path` in `connect-distributed.properties`:
`plugin.path=/usr/local/share/java,/Users/joyce/repo/confluent-6.2.1/share/confluent-hub-components/confluentinc-kafka-connect-hdfs/lib/`
When running the Hudi Kafka Connect sink, I hit the following exception in each sink task:
{code:java}
[2021-11-08 14:34:07,550] ERROR [hudi-sink|task-1] WorkerSinkTask{id=hudi-sink-1} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:193)
java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
at org.apache.hudi.connect.HoodieSinkTask.start(HoodieSinkTask.java:80)
at org.apache.kafka.connect.runtime.WorkerSinkTask.initializeAndStart(WorkerSinkTask.java:308)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:196)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:186)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:241)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at org.apache.kafka.connect.runtime.isolation.PluginClassLoader.loadClass(PluginClassLoader.java:103)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 10 more
[2021-11-08 14:34:07,550] ERROR [hudi-sink|task-2] WorkerSinkTask{id=hudi-sink-2} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:193)
java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
at org.apache.hudi.connect.HoodieSinkTask.start(HoodieSinkTask.java:80)
at org.apache.kafka.connect.runtime.WorkerSinkTask.initializeAndStart(WorkerSinkTask.java:308)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:196)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:186)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:241)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at org.apache.kafka.connect.runtime.isolation.PluginClassLoader.loadClass(PluginClassLoader.java:103)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 10 more
{code}
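To narrow down a `ClassNotFoundException` like this, it can help to check which jars under the `plugin.path` entries actually contain the missing class. The following is a minimal Python sketch, not part of the branch above; the helper name `find_class_in_jars` is hypothetical. It only reports where the class file exists on disk. It does not model how Kafka Connect assigns jars to each plugin's isolated class loader, so a hit here does not guarantee the class is visible to the Hudi sink.

```python
import zipfile
from pathlib import Path
from typing import List


def find_class_in_jars(plugin_path: str, class_name: str) -> List[str]:
    """Return jars under the comma-separated plugin_path that contain class_name.

    Hypothetical diagnostic helper: it scans jars on disk and does not
    reproduce Kafka Connect's plugin class-loader isolation.
    """
    # A class a.b.C is stored in a jar as the entry a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    hits: List[str] = []
    for root in plugin_path.split(","):
        root = root.strip()
        if not root:
            continue
        root_dir = Path(root)
        if not root_dir.is_dir():
            continue  # skip entries that do not exist on this machine
        for jar in sorted(root_dir.rglob("*.jar")):
            try:
                with zipfile.ZipFile(jar) as zf:
                    if entry in zf.namelist():
                        hits.append(str(jar))
            except zipfile.BadZipFile:
                continue  # ignore corrupt or non-zip files named *.jar
    return hits


if __name__ == "__main__":
    # The plugin.path value from step (2) above
    path = (
        "/usr/local/share/java,"
        "/Users/joyce/repo/confluent-6.2.1/share/confluent-hub-components/"
        "confluentinc-kafka-connect-hdfs/lib/"
    )
    for jar in find_class_in_jars(path, "org.apache.hadoop.fs.FSDataInputStream"):
        print(jar)
```

If the script prints no jars, the class is not under any `plugin.path` entry at all. If it prints jars only under the HDFS connector directory, one possible explanation (an assumption, not confirmed here) is that those jars are loaded in isolation from the Hudi sink plugin.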
> Implement and test Hive Sync support for Kafka Connect
> ------------------------------------------------------
>
> Key: HUDI-2325
> URL: https://issues.apache.org/jira/browse/HUDI-2325
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: Rajesh Mahindra
> Assignee: Ethan Guo
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.10.0
>
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)