praveenkmr opened a new issue, #6623: URL: https://github.com/apache/hudi/issues/6623
**_Tips before filing an issue_**

- Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
- Join the mailing list to engage in conversations and get faster support at [email protected].
- If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.

**Describe the problem you faced**

While upgrading Hudi pipelines from v0.6.0 to v0.9.0, I am facing a `ClassNotFoundException` in the pipelines that use HBase as the index. Hudi v0.6.0 was running fine on EMR v5.31.0, but the pipeline with the same configuration fails on Hudi v0.9.0 in an EMR v5.35.0 cluster. HBase is hosted on a separate EMR v5.31.0 cluster. From the Spark CLI I am able to connect to HBase and write the data, but the same job fails when launched with `spark-submit`.

**To Reproduce**

Steps to reproduce the behavior:

1. Create an HBase cluster on EMR v5.31.0.
2. Trigger `spark-submit` on EMR v5.35.0 to load the data into the Hudi table. The job fails with a `ClassNotFoundException`.

**Expected behavior**

The job should be able to connect to HBase and load data into the Hudi table.

**Environment Description**

* Hudi version : 0.9.0
* Spark version : 2.4.8
* Hive version : 2.3.9
* Hadoop version : 2.10.1
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : No

**Additional context**

The Hudi configuration that is being used:

```
"hudi_properties": {
    "hoodie.clean.async": "false",
    "hoodie.clean.automatic": "true",
    "hoodie.cleaner.commits.retained": "10",
    "hoodie.cleaner.parallelism": "500",
    "hoodie.consistency.check.enabled": "true",
    "hoodie.datasource.hive_sync.enable": "true",
    "hoodie.datasource.hive_sync.database": "<hudi_database>",
    "hoodie.datasource.hive_sync.table": "<hudi_table_name>",
    "hoodie.datasource.hive_sync.partition_fields": "<partition_key>",
    "hoodie.datasource.hive_sync.assume_date_partitioning": "false",
    "hoodie.datasource.hive_sync.partition_extractor_class": "org.apache.hudi.hive.MultiPartKeysValueExtractor",
    "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.datasource.write.partitionpath.field": "<partition_key>",
    "hoodie.datasource.write.precombine.field": "<precombine_key>",
    "hoodie.datasource.write.recordkey.field": "<primary_key>",
    "hoodie.datasource.write.streaming.ignore.failed.batch": "false",
    "hoodie.datasource.write.hive_style_partitioning": "true",
    "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
    "hoodie.hbase.index.update.partition.path": "true",
    "hoodie.index.hbase.get.batch.size": "1000",
    "hoodie.index.hbase.max.qps.per.region.server": "1000",
    "hoodie.index.hbase.put.batch.size": "1000",
    "hoodie.index.hbase.qps.allocator.class": "org.apache.hudi.index.hbase.DefaultHBaseQPSResourceAllocator",
    "hoodie.index.hbase.qps.fraction": "0.5",
    "hoodie.index.hbase.rollback.sync": "true",
    "hoodie.index.hbase.table": "<hbase_table_name>",
    "hoodie.index.hbase.zknode.path": "/hbase",
    "hoodie.index.hbase.zkport": "2181",
    "hoodie.index.hbase.zkquorum": "<hudi_hbase_cluster_private_dns>",
    "hoodie.index.type": "HBASE",
    "hoodie.memory.compaction.fraction": "0.8",
    "hoodie.parquet.block.size": "152043520",
    "hoodie.parquet.compression.codec": "snappy",
    "hoodie.parquet.max.file.size": "152043520",
    "hoodie.parquet.small.file.limit": "104857600",
    "hoodie.table.name": "<hudi_table_name>",
    "hoodie.upsert.shuffle.parallelism": "20"
}
```

**Stacktrace**

```
Caused by: org.apache.hudi.exception.HoodieDependentSystemUnavailableException: System HBASE unavailable. Tried to connect to ip-xxx-xx-xx-xx.ec2.internal:2181
	at org.apache.hudi.index.hbase.SparkHoodieHBaseIndex.getHBaseConnection(SparkHoodieHBaseIndex.java:153)
	at org.apache.hudi.index.hbase.SparkHoodieHBaseIndex.lambda$locationTagFunction$eda54cbe$1(SparkHoodieHBaseIndex.java:217)
	at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
	at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:875)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:875)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
	at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:359)
	at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:357)
	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1181)
	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1155)
	at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1090)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1155)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:881)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:357)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:308)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:95)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:411)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1405)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:417)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	... 1 more
Caused by: java.io.IOException: java.lang.reflect.InvocationTargetException
	at org.apache.hudi.org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240)
	at org.apache.hudi.org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:218)
	at org.apache.hudi.org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:119)
	at org.apache.hudi.index.hbase.SparkHoodieHBaseIndex.getHBaseConnection(SparkHoodieHBaseIndex.java:151)
	... 31 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hudi.org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
	... 34 more
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener not found
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2460)
	at org.apache.hudi.org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:656)
	... 39 more
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener not found
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2428)
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2452)
	... 40 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener not found
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2332)
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2426)
	... 41 more
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
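Reading the trace: the lookup fails inside Hudi's relocated (shaded) HBase client (`org.apache.hudi.org.apache.hadoop.hbase.client.ConnectionManager`) while it resolves the unshaded class name `org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener` by reflection through Hadoop's `Configuration.getClass`, which suggests the executor classpath is missing the plain (unshaded) HBase client classes. One workaround worth trying, not a verified fix, is to ship the cluster's HBase client jars and `hbase-site.xml` explicitly with the job; all jar paths, versions, and the job script name below are placeholders, not taken from the actual pipeline:

```shell
# Sketch only: jar names/versions and my_hudi_job.py are placeholders.
# --jars ships the unshaded HBase client classes to the driver and executors;
# --files distributes the HBase client configuration (ZooKeeper quorum/port).
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --jars /usr/lib/hbase/lib/hbase-client-1.4.13.jar,/usr/lib/hbase/lib/hbase-common-1.4.13.jar,/usr/lib/hbase/lib/hbase-protocol-1.4.13.jar \
  --files /etc/hbase/conf/hbase-site.xml \
  my_hudi_job.py
```

Since the same write succeeds from the Spark CLI but fails under `spark-submit`, comparing the effective `spark.driver.extraClassPath` / `spark.executor.extraClassPath` between the two launch modes may also narrow down which jar normally provides the missing class.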
