melkimohamed opened a new issue #1439: [SUPPORT] Hudi class loading problem
URL: https://github.com/apache/incubator-hudi/issues/1439

**Describe the problem you faced**

I tested Hudi and everything works fine except count queries. Whenever I run a count (`select count (*) from table;`), I get the following error, even though the Hudi library is loaded:

```
Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: org.apache.hudi.hadoop.HoodieParquetInputFormat
```

**Note:** I am able to create Hudi tables manually, and the count query works on them; the problem occurs only with automatically created tables (Hive sync). Do you have any idea why Hive fails to load the Hudi lib?

**To Reproduce**

Steps to reproduce the behavior:

1. Build the project (everything works well)

I am using HDP 2.6.4 (Hive 2.1.0) with Hudi 0.5. I built the project with the steps below:

```
git clone [email protected]:apache/incubator-hudi.git
rm hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/hive/HoodieCombineHiveInputFormat.java
mvn clean package -DskipTests -DskipITs -Dhive.version=2.1.0
```

In hive-site.xml I added the configuration below:

```
hive.reloadable.aux.jars.path=/usr/hudi/hudi-hive-bundle-0.5.0-incubating.jar
```

2. Create a dataset and synchronize it with Hive (everything works well)

```
export SPARK_MAJOR_VERSION=2
spark-shell --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
  --conf "spark.sql.hive.convertMetastoreParquet=false" \
  --jars hdfs://mycluster/libs/hudi-spark-bundle-0.5.0-incubating.jar
```

```scala
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.functions._
import org.apache.hudi.DataSourceWriteOptions
import org.apache.hudi.config.HoodieWriteConfig
import org.apache.hudi.hive.MultiPartKeysValueExtractor

val inputDataPath = "hdfs://mycluster/apps/warehouse/test_acid.db/users_parquet"
val hudiTableName = "users_cor"
val hudiTablePath = "hdfs://mycluster/apps/warehouse/" + hudiTableName

val hudiOptions = Map[String, String](
  DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY -> "id",
  DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY -> "year",
  HoodieWriteConfig.TABLE_NAME -> hudiTableName,
  DataSourceWriteOptions.OPERATION_OPT_KEY -> DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL,
  DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY -> "COPY_ON_WRITE",
  DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY -> "year",
  DataSourceWriteOptions.HIVE_URL_OPT_KEY -> "jdbc:hive2://........:10000/;principal=...",
  DataSourceWriteOptions.HIVE_USER_OPT_KEY -> "hive",
  DataSourceWriteOptions.HIVE_DATABASE_OPT_KEY -> "default",
  DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY -> "true",
  DataSourceWriteOptions.HIVE_TABLE_OPT_KEY -> hudiTableName,
  DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY -> "year",
  DataSourceWriteOptions.HIVE_ASSUME_DATE_PARTITION_OPT_KEY -> "false",
  DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY -> classOf[MultiPartKeysValueExtractor].getName
)

val inputDF = spark.read.format("parquet").load(inputDataPath)
inputDF.write
  .format("org.apache.hudi")
  .options(hudiOptions)
  .mode(SaveMode.Overwrite)
  .save(hudiTablePath)
```

**==> all work fine**

3. Update data (everything works well)

```scala
val requestToUpdate = "Account Executive"
val sqlStatement = s"SELECT count (*) FROM default.users_cor WHERE designation = '$requestToUpdate'"
spark.sql(sqlStatement).show()

val updateDF = inputDF.filter(col("designation") === requestToUpdate)
  .withColumn("designation", lit("Account Executive"))
updateDF.write.format("org.apache.hudi")
  .options(hudiOptions)
  .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)
  .mode(SaveMode.Append)
  .save(hudiTablePath)
```

4. DESCRIBE TABLE (everything works well)

```
DESCRIBE FORMATTED users_cor;
...
| SerDe Library: | org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
| InputFormat:   | org.apache.hudi.hadoop.HoodieParquetInputFormat
| OutputFormat:  | org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
```

5. Count rows (problem)

```
select count (*) from users_mor;
20-03-23_16-09-20_722_7229255051541187826-1886/3e2bc38c-1cf9-4d96-b90c-83fd9dd4d277/map.xml: org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: org.apache.hudi.hadoop.HoodieParquetInputFormat
Serialization trace:
inputFileFormatClass (org.apache.hadoop.hive.ql.plan.PartitionDesc)
aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork)
        at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:484)
        at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:323)
        at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.<init>(HiveSplitGenerator.java:101)
        ... 30 more
Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: org.apache.hudi.hadoop.HoodieParquetInputFormat
```

**Environment Description**

* Hudi version : 0.5
* Spark version : 2.2.0
* Hive version : 2.1.0
* Hadoop version : 2.7.3
* Storage (HDFS/S3/GCS..) :
* Running on Docker? (yes/no) : NO

**Additional context**

I checked that the Hudi library is loaded by creating Hudi tables manually and synchronizing Hudi tables with Hive.

**Stacktrace**

```
select count (*) from users_mor;
INFO  : Tez session hasn't been created yet. Opening session
INFO  : Dag name: select count (*) from users_mor(Stage-1)
ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1579091723876_115812_1_00, diagnostics=[Vertex vertex_1579091723876_115812_1_00 [Map 1] killed/failed due to:INIT_FAILURE, Fail to create InputInitializerManager, org.apache.tez.dag.api.TezReflectionException: Unable to instantiate class with 1 arguments: org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator
        at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:71)
        at org.apache.tez.common.ReflectionUtils.createClazzInstance(ReflectionUtils.java:89)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$1.run(RootInputInitializerManager.java:152)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$1.run(RootInputInitializerManager.java:148)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager.createInitializer(RootInputInitializerManager.java:148)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager.runInputInitializers(RootInputInitializerManager.java:121)
        at org.apache.tez.dag.app.dag.impl.VertexImpl.setupInputInitializerManager(VertexImpl.java:4620)
        at org.apache.tez.dag.app.dag.impl.VertexImpl.access$4400(VertexImpl.java:202)
        at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.handleInitEvent(VertexImpl.java:3436)
        at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:3385)
        at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:3366)
        at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
        at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1938)
        at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:201)
        at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2081)
        at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2067)
        at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
        at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:115)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:68)
        ... 25 more
Caused by: java.lang.RuntimeException: Failed to load plan: hdfs://ihadcluster02/tmp/hive/X183677/420271d6-4a80-4894-92d6-fb6ff73b3983/hive_2020-03-23_16-09-20_722_7229255051541187826-1886/3e2bc38c-1cf9-4d96-b90c-83fd9dd4d277/map.xml: org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: org.apache.hudi.hadoop.HoodieParquetInputFormat
Serialization trace:
inputFileFormatClass (org.apache.hadoop.hive.ql.plan.PartitionDesc)
aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork)
        at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:484)
        at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:323)
        at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.<init>(HiveSplitGenerator.java:101)
        ... 30 more
```
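
**Workaround I am considering** (a sketch, not yet confirmed): the failure happens while the Tez AM deserializes the query plan, which suggests the bundle is visible to HiveServer2 but not to Tez. My understanding is that jars listed only in `hive.reloadable.aux.jars.path` are not localized for Tez, while `hive.aux.jars.path` jars are, so registering the same bundle there as well might fix the count query. The `file://` prefix and the exact localization behavior are assumptions on my part:

```
<!-- hive-site.xml: also register the Hudi bundle via hive.aux.jars.path;
     unlike hive.reloadable.aux.jars.path, these jars should be added to the
     session classpath and shipped to the Tez AM/containers (assumption) -->
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///usr/hudi/hudi-hive-bundle-0.5.0-incubating.jar</value>
</property>
```

Alternatively, per session in beeline: `ADD JAR /usr/hudi/hudi-hive-bundle-0.5.0-incubating.jar;` (same path as my config above), which I believe also gets localized for Tez tasks.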
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
