praveenkmr opened a new issue, #6623:
URL: https://github.com/apache/hudi/issues/6623

   **Describe the problem you faced**
   
   While upgrading Hudi pipelines from v0.6.0 to v0.9.0, I am getting a ClassNotFoundException in the pipelines that use HBase as the index.
   
   Hudi v0.6.0 ran fine on EMR v5.31.0, but the pipeline with the same configuration fails with Hudi v0.9.0 on an EMR v5.35.0 cluster. HBase is hosted in a separate EMR v5.31.0 cluster.
   
   From the Spark CLI I am able to connect to HBase and write the data, but the same job fails when run with spark-submit.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Create an HBase Cluster in EMR v5.31.0.
   2. Run spark-submit on an EMR v5.35.0 cluster to load data into the Hudi table. The job fails with a ClassNotFoundException.
   
   **Expected behavior**
   
   The job should connect to HBase and load data into the Hudi table.
   
   **Environment Description**
   
   * Hudi version: 0.9.0
   
   * Spark version: 2.4.8
   
   * Hive version: 2.3.9
   
   * Hadoop version: 2.10.1
   
   * Storage (HDFS/S3/GCS..) : s3
   
   * Running on Docker? (yes/no) : No
   
   
   **Additional context**
   
   The Hudi configuration being used:
   
   ```
    "hudi_properties": {
      "hoodie.clean.async": "false",
      "hoodie.clean.automatic": "true",
      "hoodie.cleaner.commits.retained": "10",
      "hoodie.cleaner.parallelism": "500",
      "hoodie.consistency.check.enabled": "true",
      "hoodie.datasource.hive_sync.enable": "true",
      "hoodie.datasource.hive_sync.database": "<hudi_database>",
      "hoodie.datasource.hive_sync.table": "<hudi_table_name>",
      "hoodie.datasource.hive_sync.partition_fields": "<partition_key>",
      "hoodie.datasource.hive_sync.assume_date_partitioning": "false",
      "hoodie.datasource.hive_sync.partition_extractor_class": "org.apache.hudi.hive.MultiPartKeysValueExtractor",
      "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
      "hoodie.datasource.write.operation": "upsert",
      "hoodie.datasource.write.partitionpath.field": "<partition_key>",
      "hoodie.datasource.write.precombine.field": "<precombine_key>",
      "hoodie.datasource.write.recordkey.field": "<primary_key>",
      "hoodie.datasource.write.streaming.ignore.failed.batch": "false",
      "hoodie.datasource.write.hive_style_partitioning": "true",
      "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
      "hoodie.hbase.index.update.partition.path": "true",
      "hoodie.index.hbase.get.batch.size": "1000",
      "hoodie.index.hbase.max.qps.per.region.server": "1000",
      "hoodie.index.hbase.put.batch.size": "1000",
      "hoodie.index.hbase.qps.allocator.class": "org.apache.hudi.index.hbase.DefaultHBaseQPSResourceAllocator",
      "hoodie.index.hbase.qps.fraction": "0.5",
      "hoodie.index.hbase.rollback.sync": "true",
      "hoodie.index.hbase.table": "<hbase_table_name>",
      "hoodie.index.hbase.zknode.path": "/hbase",
      "hoodie.index.hbase.zkport": "2181",
      "hoodie.index.hbase.zkquorum": "<hudi_hbase_cluster_private_dns>",
      "hoodie.index.type": "HBASE",
      "hoodie.memory.compaction.fraction": "0.8",
      "hoodie.parquet.block.size": "152043520",
      "hoodie.parquet.compression.codec": "snappy",
      "hoodie.parquet.max.file.size": "152043520",
      "hoodie.parquet.small.file.limit": "104857600",
      "hoodie.table.name": "<hudi_table_name>",
      "hoodie.upsert.shuffle.parallelism": "20"
    }
   ```
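
The report does not show the job code, so the following is only a minimal sketch of how properties like these might be applied, assuming the job writes through the PySpark DataFrame API. `write_hudi` and its `base_path` argument are hypothetical names used for illustration; the option keys are a subset of the configuration in this report, with the placeholders left as they are.

```python
# Assumption: the job writes through the Spark DataFrame API (not shown in the report).
# Subset of the properties from the report; placeholders are left untouched.
hudi_properties = {
    "hoodie.table.name": "<hudi_table_name>",
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.datasource.write.recordkey.field": "<primary_key>",
    "hoodie.datasource.write.precombine.field": "<precombine_key>",
    "hoodie.index.type": "HBASE",
    "hoodie.index.hbase.zkquorum": "<hudi_hbase_cluster_private_dns>",
    "hoodie.index.hbase.zkport": "2181",
    "hoodie.index.hbase.table": "<hbase_table_name>",
}


def write_hudi(df, properties, base_path):
    """Append df to the Hudi table at base_path using the given options.

    df is expected to be a pyspark.sql.DataFrame. Nothing here imports
    pyspark; the function only relies on the DataFrameWriter methods
    format, options, mode and save.
    """
    (
        df.write.format("hudi")
        .options(**properties)
        .mode("append")
        .save(base_path)
    )
```

Running the same function from the Spark CLI and from a spark-submit job would be one way to check that both paths use identical options, so that any difference comes from the classpath rather than the configuration.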
   
   **Stacktrace**
   
   ``` 
   Caused by: org.apache.hudi.exception.HoodieDependentSystemUnavailableException: System HBASE unavailable. Tried to connect to ip-xxx-xx-xx-xx.ec2.internal:2181
        at org.apache.hudi.index.hbase.SparkHoodieHBaseIndex.getHBaseConnection(SparkHoodieHBaseIndex.java:153)
        at org.apache.hudi.index.hbase.SparkHoodieHBaseIndex.lambda$locationTagFunction$eda54cbe$1(SparkHoodieHBaseIndex.java:217)
        at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
        at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:875)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:875)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
        at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:359)
        at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:357)
        at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1181)
        at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1155)
        at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1090)
        at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1155)
        at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:881)
        at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:357)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:308)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:95)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
        at org.apache.spark.scheduler.Task.run(Task.scala:123)
        at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:411)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1405)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:417)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        ... 1 more
    Caused by: java.io.IOException: java.lang.reflect.InvocationTargetException
        at org.apache.hudi.org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240)
        at org.apache.hudi.org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:218)
        at org.apache.hudi.org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:119)
        at org.apache.hudi.index.hbase.SparkHoodieHBaseIndex.getHBaseConnection(SparkHoodieHBaseIndex.java:151)
        ... 31 more
    Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hudi.org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
        ... 34 more
    Caused by: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2460)
        at org.apache.hudi.org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:656)
        ... 39 more
    Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2428)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2452)
        ... 40 more
    Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2332)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2426)
        ... 41 more
   ```
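
One detail in the trace may be worth checking, although nothing in this report confirms it as the cause: the HBase client frames run under the `org.apache.hudi.` prefix, while the class that cannot be found is requested under its original `org.apache.hadoop.hbase` name. The small sketch below only automates that comparison on a trace excerpt; the helper names `missing_classes` and `shaded_counterpart_seen` are hypothetical and exist only for this illustration.

```python
import re

# Prefix under which the HBase client frames appear in the trace above.
SHADE_PREFIX = "org.apache.hudi."

# Short excerpt of the trace from this report.
TRACE = """
Caused by: java.io.IOException: java.lang.reflect.InvocationTargetException
    at org.apache.hudi.org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240)
Caused by: java.lang.ClassNotFoundException: Class
org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2332)
"""


def _flatten(trace):
    # Mail wrapping can split a message across lines, so collapse whitespace.
    return " ".join(trace.split())


def missing_classes(trace):
    """Return the class names reported as 'Class X not found'."""
    pattern = r"ClassNotFoundException: Class (\S+) not found"
    return set(re.findall(pattern, _flatten(trace)))


def shaded_counterpart_seen(missing_class, trace, prefix=SHADE_PREFIX):
    """True if the missing class's package occurs in the trace under prefix."""
    package = missing_class.rsplit(".", 1)[0]
    return (prefix + package + ".") in _flatten(trace)


for name in sorted(missing_classes(TRACE)):
    print(name, shaded_counterpart_seen(name, TRACE))
```

If the check returns `True`, the trace contains frames from the same package under the prefixed name, which suggests comparing the classpath and HBase configuration files visible to the Spark CLI session against those visible to the spark-submit job.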
   
   

