imperio-wxm opened a new issue #827: java.lang.ClassNotFoundException: com.uber.hoodie.hadoop.HoodieInputFormat URL: https://github.com/apache/incubator-hudi/issues/827 spark 2.4.0.cloudera1 hadoop 2.6.0-cdh5.11.1 hive 1.1.0-cdh5.11.1 hudi 0.4.7 ## Build Success ```java mvn clean install -DskipTests -Dhadoop.version=2.6.0-cdh5.11.1 -Dhive.version=1.1.0-cdh5.11.1 ``` I write spark app want to write data to a table then sync to hive. ## My code: ```java dataset.write() .format("com.uber.hoodie") .option(DataSourceWriteOptions.HIVE_URL_OPT_KEY(),"jdbc:hive2://xxxx.xxxx:10000") .option(DataSourceWriteOptions.HIVE_DATABASE_OPT_KEY(), "dw") .option(DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY(), true) .option(DataSourceWriteOptions.HIVE_TABLE_OPT_KEY(), "wxm_hoodie_test") .option(DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY(), "part") .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY(), "key") .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY(), "part") .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY(), "timestamp") .option(HoodieWriteConfig.TABLE_NAME, "wxm_hoodie_test") .mode(SaveMode.Append) .save("/wxm/hudi/data"); ``` ## table schema ```java 19/08/08 17:44:56 INFO HoodieSparkSQLWriter: Registered avro schema : { "type" : "record", "name" : "wxm_hoodie_test_record", "namespace" : "hoodie.wxm_hoodie_test", "fields" : [ { "name" : "age", "type" : [ "string", "null" ] }, { "name" : "key", "type" : [ "string", "null" ] }, { "name" : "name", "type" : [ "string", "null" ] }, { "name" : "part", "type" : [ "string", "null" ] }, { "name" : "timestamp", "type" : [ "string", "null" ] } ] } ``` ## submit command ```java spark2-submit --jars basePath/hoodie-hadoop-mr-0.4.7.jar,basePath/hoodie-spark-0.4.7.jar,basePath/hoodie-hive-0.4.7.jar,basePath/hoodie-common-0.4.7.jar,basePath/hoodie-hive-bundle-0.4.7.jar,basePath/hoodie-hadoop-mr-bundle-0.4.7.jar \ --class com.wxmimperio.spark.hudi.HudiWrite --master yarn --deploy-mode cluster \ --driver-memory 4g --executor-memory 2g --executor-cores 1 xxxx.jar people.json ``` I include `hoodie-hadoop-mr-0.4.7.jar`,`hoodie-spark-0.4.7.jar`,`hoodie-hive-0.4.7.jar`,`hoodie-common-0.4.7.jar`,`hoodie-hive-bundle-0.4.7.jar`,`hoodie-hadoop-mr-bundle-0.4.7.jar` but also get error. Then I find `com.uber.hoodie.hadoop.HoodieInputFormat` in `hoodie-hadoop-mr-0.4.7.jar`.  > ### **I read log and write data success, just get error when sync to hive, should I put hoodie-hadoop-mr-0.4.7.jar in $HIVE_HOME/lib ?** ## get error ```java com.uber.hoodie.hive.HoodieHiveSyncException: Failed in executing SQL CREATE EXTERNAL TABLE IF NOT EXISTS dw.wxm_hoodie_test( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `age` string, `key` string, `name` string, `timestamp` string) PARTITIONED BY (part string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'com.uber.hoodie.hadoop.HoodieInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION '/wxm/hudi/data' at com.uber.hoodie.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:459) at com.uber.hoodie.hive.HoodieHiveClient.createTable(HoodieHiveClient.java:262) at com.uber.hoodie.hive.HiveSyncTool.syncSchema(HiveSyncTool.java:129) at com.uber.hoodie.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:96) at com.uber.hoodie.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:68) at com.uber.hoodie.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:238) at com.uber.hoodie.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:172) at com.uber.hoodie.DefaultSource.createRelation(DefaultSource.scala:91) at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80) at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80) at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676) at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676) at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73) at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676) at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229) at com.wxmimperio.spark.hudi.HudiWrite.main(HudiWrite.java:38) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:694) Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException Cannot find class 'com.uber.hoodie.hadoop.HoodieInputFormat' at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:241) at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:227) at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:247) at org.apache.commons.dbcp.DelegatingStatement.execute(DelegatingStatement.java:264) at org.apache.commons.dbcp.DelegatingStatement.execute(DelegatingStatement.java:264) at com.uber.hoodie.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:457) ... 34 more Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException Cannot find class 'com.uber.hoodie.hadoop.HoodieInputFormat' at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:400) at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:188) at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:267) at org.apache.hive.service.cli.operation.Operation.run(Operation.java:337) at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:439) at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:416) at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78) at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36) at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920) at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59) at com.sun.proxy.$Proxy20.executeStatementAsync(Unknown Source) at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:282) at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:501) at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313) at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: Cannot find class 'com.uber.hoodie.hadoop.HoodieInputFormat' at org.apache.hadoop.hive.ql.parse.ParseUtils.ensureClassExists(ParseUtils.java:227) at org.apache.hadoop.hive.ql.parse.StorageFormat.fillStorageFormat(StorageFormat.java:57) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:10904) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10142) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10223) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10108) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:223) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:494) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1278) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1265) at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:186) ... 26 more Caused by: java.lang.ClassNotFoundException: com.uber.hoodie.hadoop.HoodieInputFormat at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:348) at org.apache.hadoop.hive.ql.parse.ParseUtils.ensureClassExists(ParseUtils.java:225) ... 36 more 19/08/08 17:46:25 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: com.uber.hoodie.hive.HoodieHiveSyncException: Failed in executing SQL CREATE EXTERNAL TABLE IF NOT EXISTS dw.wxm_hoodie_test( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `age` string, `key` string, `name` string, `timestamp` string) PARTITIONED BY (part string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'com.uber.hoodie.hadoop.HoodieInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION '/wxm/hudi/data' at com.uber.hoodie.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:459) at com.uber.hoodie.hive.HoodieHiveClient.createTable(HoodieHiveClient.java:262) at com.uber.hoodie.hive.HiveSyncTool.syncSchema(HiveSyncTool.java:129) at com.uber.hoodie.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:96) at com.uber.hoodie.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:68) at com.uber.hoodie.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:238) at com.uber.hoodie.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:172) at com.uber.hoodie.DefaultSource.createRelation(DefaultSource.scala:91) at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80) at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80) at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676) at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676) at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73) at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676) at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229) at com.wxmimperio.spark.hudi.HudiWrite.main(HudiWrite.java:38) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:694) Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException Cannot find class 'com.uber.hoodie.hadoop.HoodieInputFormat' at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:241) at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:227) at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:247) at org.apache.commons.dbcp.DelegatingStatement.execute(DelegatingStatement.java:264) at org.apache.commons.dbcp.DelegatingStatement.execute(DelegatingStatement.java:264) at com.uber.hoodie.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:457) ... 34 more Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException Cannot find class 'com.uber.hoodie.hadoop.HoodieInputFormat' at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:400) at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:188) at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:267) at org.apache.hive.service.cli.operation.Operation.run(Operation.java:337) at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:439) at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:416) at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78) at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36) at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920) at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59) at com.sun.proxy.$Proxy20.executeStatementAsync(Unknown Source) at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:282) at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:501) at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313) at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: Cannot find class 'com.uber.hoodie.hadoop.HoodieInputFormat' at org.apache.hadoop.hive.ql.parse.ParseUtils.ensureClassExists(ParseUtils.java:227) at org.apache.hadoop.hive.ql.parse.StorageFormat.fillStorageFormat(StorageFormat.java:57) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:10904) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10142) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10223) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10108) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:223) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:494) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1278) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1265) at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:186) ... 26 more Caused by: java.lang.ClassNotFoundException: com.uber.hoodie.hadoop.HoodieInputFormat at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:348) at org.apache.hadoop.hive.ql.parse.ParseUtils.ensureClassExists(ParseUtils.java:225) ... 36 more ) ```
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services
