Raymond Xu created HUDI-7383:
--------------------------------
Summary: CDC query failed due to dependency issue
Key: HUDI-7383
URL: https://issues.apache.org/jira/browse/HUDI-7383
Project: Apache Hudi
Issue Type: Bug
Components: incremental-query
Affects Versions: 0.14.1, 0.14.0
Reporter: Raymond Xu
{code:java}
spark-sql (default)> select count(*) from hudi_table_changes('tbl', 'cdc', '20240205084624923', '20240205091637412');
24/02/05 09:47:46 WARN TaskSetManager: Lost task 10.0 in stage 28.0 (TID 1515) (ip-10-0-117-21.us-west-2.compute.internal executor 3): java.lang.NoClassDefFoundError: org/apache/hudi/com/fasterxml/jackson/module/scala/DefaultScalaModule$
at org.apache.hudi.cdc.HoodieCDCRDD$CDCFileGroupIterator.<init>(HoodieCDCRDD.scala:237)
at org.apache.hudi.cdc.HoodieCDCRDD.compute(HoodieCDCRDD.scala:101)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:101)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
at org.apache.spark.scheduler.Task.run(Task.scala:141)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:563)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1541)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:566)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.ClassNotFoundException: org.apache.hudi.com.fasterxml.jackson.module.scala.DefaultScalaModule$
at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 21 more
{code}