rbtrtr opened a new issue, #6398:
URL: https://github.com/apache/hudi/issues/6398

   **Description**
   
   We're running on a Cloudera CDP stack and want to upgrade to Hudi 0.11.1 to take advantage of the metadata table feature. We tried to run a simple Hudi write with generated data and got the attached stack trace.
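   
   For context, the test data was generated along these lines (an illustrative sketch; only the `id` and `sequence` columns matter for the write options further below):
   
   ```scala
   import org.apache.spark.sql.functions._
   
   // Illustrative test data: "id" is the record key, "sequence" the
   // precombine field referenced in the write options below.
   val df = spark.range(0, 1000)
     .withColumn("sequence", col("id"))
     .withColumn("id", col("id").cast("string"))
     .withColumn("payload", lit("demo"))
   ```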
   
   We used this Hudi package: org.apache.hudi:hudi-spark3.1-bundle_2.12:0.11.1.
   
   The exception suggests that something is incompatible with the HBase version Hudi is compiled against (2.4.9). Unfortunately, Cloudera ships HBase 2.2.3, and its hbase-default.xml is apparently picked up from the cluster classpath. We're not sure this is actually the root cause, but why does Hudi need this library at all if nothing is stored in HBase? From the stack trace it appears the metadata table writes its log blocks in HFile format (HoodieHFileDataBlock), which is what pulls in the shaded HBase classes even without an HBase cluster.
   
   If we set _hoodie.metadata.enable_ to _false_ it works, but we want to take advantage of this feature.
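   
   For completeness, an abridged version of the same write that succeeds with the flag flipped (string keys here instead of the constants used in the full example below):
   
   ```scala
   // Same job with the metadata table disabled; this succeeds on our cluster.
   df.write.format("hudi")
     .option("hoodie.table.name", "ht_hudi_11_1_metadata")
     .option("hoodie.datasource.write.recordkey.field", "id")
     .option("hoodie.datasource.write.precombine.field", "sequence")
     .option("hoodie.datasource.write.operation", "upsert")
     .option("hoodie.metadata.enable", "false")
     .mode("append")
     .save("hdfs:///.../hudi_11_1_metadata")
   ```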
   
   We tried two things to get rid of this exception:
   1) Set the index type to BLOOM -> no effect.
   2) Explicitly added the HBase server and client jars, in the version Hudi is compiled against (2.4.9), to the spark-shell classpath -> no effect (sketch below).
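   
   Attempt (2) looked roughly like this (reconstructed from memory; the exact jar list may have differed):
   
   ```
   # jar names are illustrative
   spark-shell \
     --packages org.apache.hudi:hudi-spark3.1-bundle_2.12:0.11.1 \
     --jars hbase-client-2.4.9.jar,hbase-server-2.4.9.jar
   ```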
   
   **Environment Description**
   
   * Hudi version : 0.11.1
   
   * Spark version : 3.1.1
   
   * Hive version : 3.1.3
   
   * Hadoop version : 3.1.1
   
   * Storage (HDFS/S3/GCS..) : HDFS
   
   * Running on Docker? (yes/no) : no (YARN on Cloudera CDP 7.1.7)
   
   
   **Additional context**
   
   Example write:
   ```scala
    import org.apache.hudi.DataSourceWriteOptions._

    df.write.format("hudi")
      .option(HIVE_CREATE_MANAGED_TABLE.key(), false)
      .option(HIVE_DATABASE.key(), "db_demo")
      .option(HIVE_SYNC_ENABLED.key(), true)
      .option(HIVE_SYNC_MODE.key(), "HMS")
      .option(HIVE_TABLE.key(), "ht_hudi_11_1_metadata")
      .option("hoodie.table.name", "ht_hudi_11_1_metadata")
      .option(KEYGENERATOR_CLASS_NAME.key(), "org.apache.hudi.keygen.NonpartitionedKeyGenerator")
      .option(OPERATION.key(), "upsert")
      .option(PRECOMBINE_FIELD.key(), "sequence")
      .option(RECORDKEY_FIELD.key(), "id")
      .option(TABLE_NAME.key(), "ht_hudi_11_1_metadata") // same value as "hoodie.table.name" set above
      .option("hoodie.index.type", "BLOOM")
      .option("hoodie.metadata.enable", true)
      .mode("append")
      .save("hdfs:///.../hudi_11_1_metadata")
   ```
   
   **Stacktrace**
   
   ```java
    Caused by: java.lang.ExceptionInInitializerError
            at org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFileContextBuilder.<init>(HFileContextBuilder.java:54)
            at org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.serializeRecords(HoodieHFileDataBlock.java:105)
            at org.apache.hudi.common.table.log.block.HoodieDataBlock.getContentBytes(HoodieDataBlock.java:131)
            at org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlocks(HoodieLogFormatWriter.java:158)
            at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:404)
            at org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:382)
            at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:84)
            at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:322)
            ... 28 more
    Caused by: java.lang.RuntimeException: hbase-default.xml file seems to be for an older version of HBase (2.2.3.7.1.7.0-551), this version is 2.4.9
            at org.apache.hudi.org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:74)
            at org.apache.hudi.org.apache.hadoop.hbase.HBaseConfiguration.addHbaseResources(HBaseConfiguration.java:84)
            at org.apache.hudi.org.apache.hadoop.hbase.HBaseConfiguration.create(HBaseConfiguration.java:98)
            at org.apache.hudi.org.apache.hadoop.hbase.io.crypto.Context.<init>(Context.java:44)
            at org.apache.hudi.org.apache.hadoop.hbase.io.crypto.Encryption$Context.<init>(Encryption.java:110)
            at org.apache.hudi.org.apache.hadoop.hbase.io.crypto.Encryption$Context.<clinit>(Encryption.java:107)
            ... 36 more
    ........
    22/08/12 08:19:20 ERROR scheduler.TaskSetManager: Task 0 in stage 6.0 failed 4 times; aborting job
    org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 6.0 failed 4 times, most recent failure: Lost task 0.3 in stage 6.0 (TID 9) (hdl-w05.charite.de executor 1): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0
            at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:329)
            at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:244)
            at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
            at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
            at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:915)
            at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:915)
            at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
            at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
            at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
            at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
            at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
            at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:386)
            at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1440)
            at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1350)
            at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1414)
            at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1237)
            at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:384)
            at org.apache.spark.rdd.RDD.iterator(RDD.scala:335)
            at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
            at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
            at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
            at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
            at org.apache.spark.scheduler.Task.run(Task.scala:131)
            at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
            at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
            at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:748)
    Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hudi.org.apache.hadoop.hbase.io.crypto.Encryption$Context
            at org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFileContextBuilder.<init>(HFileContextBuilder.java:54)
            at org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.serializeRecords(HoodieHFileDataBlock.java:105)
            at org.apache.hudi.common.table.log.block.HoodieDataBlock.getContentBytes(HoodieDataBlock.java:131)
            at org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlocks(HoodieLogFormatWriter.java:158)
            at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:404)
            at org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:382)
            at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:84)
            at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:322)
   ```
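   
   The check that fails is the shaded HBaseConfiguration.checkDefaultsVersion, i.e. the hbase-default.xml from Cloudera's HBase 2.2.3 on the cluster classpath clashes with the HBase 2.4.9 code bundled (shaded) inside the Hudi jar. HBase has a standard property, _hbase.defaults.for.version.skip_, that bypasses exactly this check; below is a hypothetical, so-far-untested sketch of how we would try it:
   
   ```scala
   // Hypothetical workaround (untested by us): HBase's standard property
   // hbase.defaults.for.version.skip=true disables checkDefaultsVersion.
   // We are not sure whether Hudi's shaded HBaseConfiguration reads it from
   // the Spark Hadoop configuration; it may need to be set in an
   // hbase-site.xml visible on the driver and executor classpath instead.
   spark.sparkContext.hadoopConfiguration
     .set("hbase.defaults.for.version.skip", "true")
   ```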
   
   

