melkimohamed opened a new issue #1439: [SUPPORT] Hudi class loading problem
URL: https://github.com/apache/incubator-hudi/issues/1439

**Describe the problem you faced**

I tested Hudi and everything works fine except count queries. Whenever I run a count (`select count (*) from table;`), I get the following error, even though the Hudi library is loaded:

```
Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: org.apache.hudi.hadoop.HoodieParquetInputFormat
```

**Note:** I am able to create Hudi tables manually, and the count query works on them; the problem occurs only with automatically created tables (Hive sync). Do you have any idea why Hive fails to load the Hudi lib?

**To Reproduce**

Steps to reproduce the behavior:

1. Build the project (everything works well)

I am using HDP 2.6.4 (Hive 2.1.0) with Hudi 0.5. I built the project with the steps below:

```
git clone [email protected]:apache/incubator-hudi.git
rm hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/hive/HoodieCombineHiveInputFormat.java
mvn clean package -DskipTests -DskipITs -Dhive.version=2.1.0
```

In hive-site.xml I added the configuration below:

```
hive.reloadable.aux.jars.path=/usr/hudi/hudi-hive-bundle-0.5.0-incubating.jar
```

2. Create a dataset and synchronize it with Hive (everything works well)

```
export SPARK_MAJOR_VERSION=2
spark-shell --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
  --conf "spark.sql.hive.convertMetastoreParquet=false" \
  --jars hdfs://mycluster/libs/hudi-spark-bundle-0.5.0-incubating.jar
```

```scala
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.functions._
import org.apache.hudi.DataSourceWriteOptions
import org.apache.hudi.config.HoodieWriteConfig
import org.apache.hudi.hive.MultiPartKeysValueExtractor

val inputDataPath = "hdfs://mycluster/apps/warehouse/test_acid.db/users_parquet"
val hudiTableName = "users_cor"
val hudiTablePath = "hdfs://mycluster/apps/warehouse/" + hudiTableName

val hudiOptions = Map[String, String](
  DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY -> "id",
  DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY -> "year",
  HoodieWriteConfig.TABLE_NAME -> hudiTableName,
  DataSourceWriteOptions.OPERATION_OPT_KEY -> DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL,
  DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY -> "COPY_ON_WRITE",
  DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY -> "year",
  DataSourceWriteOptions.HIVE_URL_OPT_KEY -> "jdbc:hive2://........:10000/;principal=...",
  DataSourceWriteOptions.HIVE_USER_OPT_KEY -> "hive",
  DataSourceWriteOptions.HIVE_DATABASE_OPT_KEY -> "default",
  DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY -> "true",
  DataSourceWriteOptions.HIVE_TABLE_OPT_KEY -> hudiTableName,
  DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY -> "year",
  DataSourceWriteOptions.HIVE_ASSUME_DATE_PARTITION_OPT_KEY -> "false",
  DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY -> classOf[MultiPartKeysValueExtractor].getName
)

val inputDF = spark.read.format("parquet").load(inputDataPath)
inputDF.write
  .format("org.apache.hudi")
  .options(hudiOptions)
  .mode(SaveMode.Overwrite)
  .save(hudiTablePath)
```

**==> all work fine**

3. Update data (everything works well)

```scala
val requestToUpdate = "Account Executive"
val sqlStatement = s"SELECT count (*) FROM default.users_cor WHERE designation = '$requestToUpdate'"
spark.sql(sqlStatement).show()

val updateDF = inputDF.filter(col("designation") === requestToUpdate)
  .withColumn("designation", lit("Account Executive"))
updateDF.write.format("org.apache.hudi")
  .options(hudiOptions)
  .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)
  .mode(SaveMode.Append)
  .save(hudiTablePath)
```

4. DESCRIBE TABLE (everything works well)

```
DESCRIBE FORMATTED users_cor;
...
| SerDe Library: | org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
| InputFormat:   | org.apache.hudi.hadoop.HoodieParquetInputFormat
| OutputFormat:  | org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
```

5. Count rows (problem)

```
select count (*) from users_mor;
20-03-23_16-09-20_722_7229255051541187826-1886/3e2bc38c-1cf9-4d96-b90c-83fd9dd4d277/map.xml: org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: org.apache.hudi.hadoop.HoodieParquetInputFormat
Serialization trace:
inputFileFormatClass (org.apache.hadoop.hive.ql.plan.PartitionDesc)
aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork)
        at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:484)
        at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:323)
        at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.<init>(HiveSplitGenerator.java:101)
        ... 30 more
Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: org.apache.hudi.hadoop.HoodieParquetInputFormat
```

**Environment Description**

* Hudi version : 0.5
* Spark version : 2.2.0
* Hive version : 2.1.0
* Hadoop version : 2.7.3
* Storage (HDFS/S3/GCS..) :
* Running on Docker? (yes/no) : NO

**Additional context**

I checked that the Hudi library is loaded by creating Hudi tables manually and synchronizing Hudi tables with Hive.

**Stacktrace**

```
select count (*) from users_mor;
INFO  : Tez session hasn't been created yet. Opening session
INFO  : Dag name: select count (*) from users_mor(Stage-1)
ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1579091723876_115812_1_00, diagnostics=[Vertex vertex_1579091723876_115812_1_00 [Map 1] killed/failed due to:INIT_FAILURE, Fail to create InputInitializerManager, org.apache.tez.dag.api.TezReflectionException: Unable to instantiate class with 1 arguments: org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator
        at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:71)
        at org.apache.tez.common.ReflectionUtils.createClazzInstance(ReflectionUtils.java:89)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$1.run(RootInputInitializerManager.java:152)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$1.run(RootInputInitializerManager.java:148)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager.createInitializer(RootInputInitializerManager.java:148)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager.runInputInitializers(RootInputInitializerManager.java:121)
        at org.apache.tez.dag.app.dag.impl.VertexImpl.setupInputInitializerManager(VertexImpl.java:4620)
        at org.apache.tez.dag.app.dag.impl.VertexImpl.access$4400(VertexImpl.java:202)
        at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.handleInitEvent(VertexImpl.java:3436)
        at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:3385)
        at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:3366)
        at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
        at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1938)
        at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:201)
        at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2081)
        at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2067)
        at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
        at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:115)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:68)
        ... 25 more
Caused by: java.lang.RuntimeException: Failed to load plan: hdfs://ihadcluster02/tmp/hive/X183677/420271d6-4a80-4894-92d6-fb6ff73b3983/hive_2020-03-23_16-09-20_722_7229255051541187826-1886/3e2bc38c-1cf9-4d96-b90c-83fd9dd4d277/map.xml: org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: org.apache.hudi.hadoop.HoodieParquetInputFormat
Serialization trace:
inputFileFormatClass (org.apache.hadoop.hive.ql.plan.PartitionDesc)
aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork)
        at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:484)
        at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:323)
        at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.<init>(HiveSplitGenerator.java:101)
        ... 30 more
```
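
**Workaround I am considering** (a sketch, not yet confirmed): the failure happens while the Tez AM deserializes the query plan, which suggests the bundle is visible to HiveServer2 but not to Tez. My understanding is that jars listed only in `hive.reloadable.aux.jars.path` are not localized for Tez, while `hive.aux.jars.path` jars are, so registering the same bundle there as well might fix the count query. The `file://` prefix and the exact localization behavior are assumptions on my part:

```
<!-- hive-site.xml: also register the Hudi bundle via hive.aux.jars.path;
     unlike hive.reloadable.aux.jars.path, these jars should be added to the
     session classpath and shipped to the Tez AM/containers (assumption) -->
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///usr/hudi/hudi-hive-bundle-0.5.0-incubating.jar</value>
</property>
```

Alternatively, per session in beeline: `ADD JAR /usr/hudi/hudi-hive-bundle-0.5.0-incubating.jar;` (same path as my config above), which I believe also gets localized for Tez tasks.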
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
