GnsCy opened a new issue, #5729: URL: https://github.com/apache/hudi/issues/5729
Running the demo setup as described [here](https://hudi.apache.org/docs/docker_demo) for v0.11 results in missing jar file errors when running the `spark-submit` and `hive-sync` commands.

Steps to reproduce the behavior:

1. Clone the repo and switch to the 0.11 release tag
2. Set up the Docker environments
3. Publish events to Kafka
4. Try to run the spark-submit job to ingest data

**Expected behavior**

The demo environment is set up correctly and it is possible to go through all the scenarios of the demo.

**Environment Description**

* Hudi version : 0.11
* Spark version : 2.4.4
* Hive version : 2.3.3
* Hadoop version : 2.8.4
* Storage (HDFS/S3/GCS..) :
* Running on Docker? (yes/no) : yes

**Additional context**

Add any other context about the problem here.

**Stacktrace**

```
spark-submit \
>   --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE \
>   --table-type COPY_ON_WRITE \
>   --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
>   --source-ordering-field ts \
>   --target-base-path /user/hive/warehouse/stock_ticks_cow \
>   --target-table stock_ticks_cow --props /var/demo/config/kafka-source.properties \
>   --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider
22/05/31 06:54:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/05/31 06:54:24 WARN DependencyUtils: Local jar /var/hoodie/ws/docker/hoodie/hadoop/hive_base/target/hoodie-utilities.jar does not exist, skipping.
22/05/31 06:54:24 WARN SparkSubmit$$anon$2: Failed to load org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.
java.lang.ClassNotFoundException: org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:238)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:806)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
```

```
hive-sync ->
Exception in thread "main" org.apache.hudi.exception.HoodieException: Got runtime exception when hive syncing stock_ticks_cow
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:141)
    at org.apache.hudi.hive.HiveSyncTool.main(HiveSyncTool.java:433)
Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed in executing SQL CREATE EXTERNAL TABLE IF NOT EXISTS `default`.`stock_ticks_cow`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `volume` bigint, `ts` string, `symbol` string, `year` int, `month` string, `high` double, `low` double, `key` string, `date` string, `close` double, `open` double, `day` string) PARTITIONED BY (`dt` String) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' WITH SERDEPROPERTIES ('hoodie.query.as.ro.table'='false','path'='/user/hive/warehouse/stock_ticks_cow') STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION '/user/hive/warehouse/stock_ticks_cow' TBLPROPERTIES('spark.sql.sources.schema.partCol.0'='dt','spark.sql.sources.schema.numParts'='1','spark.sql.sources.schema.numPartCols'='1','spark.sql.sources.provider'='hudi','spark.sql.sources.schema.part.0'='{"type":"struct","fields":[{"name":"_hoodie_commit_time","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_commit_seqno","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_record_key","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_partition_path","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_file_name","type":"string","nullable":true,"metadata":{}},{"name":"volume","type":"long","nullable":false,"metadata":{}},{"name":"ts","type":"string","nullable":false,"metadata":{}},{"name":"symbol","type":"string","nullable":false,"metadata":{}},{"name":"year","type":"integer","nullable":false,"metadata":{}},{"name":"month","type":"string","nullable":false,"metadata":{}},{"name":"high","type":"double","nullable":false,"metadata":{}},{"name":"low","type":"double","nullable":false,"metadata":{}},{"name":"key","type":"string","nullable":false,"metadata":{}},{"name":"date","type":"string","nullable":false,"metadata":{}},{"name":"close","type":"double","nullable":false,"metadata":{}},{"name":"open","type":"double","nullable":false,"metadata":{}},{"name":"day","type":"string","nullable":false,"metadata":{}},{"name":"dt","type":"string","nullable":false,"metadata":{}}]}')
    at org.apache.hudi.hive.ddl.JDBCExecutor.runSQL(JDBCExecutor.java:67)
    at org.apache.hudi.hive.ddl.QueryBasedDDLExecutor.createTable(QueryBasedDDLExecutor.java:84)
    at org.apache.hudi.hive.HoodieHiveClient.createTable(HoodieHiveClient.java:168)
    at org.apache.hudi.hive.HiveSyncTool.syncSchema(HiveSyncTool.java:276)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:217)
    at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:150)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:138)
    ... 1 more
Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException Cannot find class 'org.apache.hudi.hadoop.HoodieParquetInputFormat'
    at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:267)
    at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:253)
    at org.apache.hive.jdbc.HiveStatement.runAsyncOnServer(HiveStatement.java:313)
    at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:253)
    at org.apache.hudi.hive.ddl.JDBCExecutor.runSQL(JDBCExecutor.java:65)
    ... 7 more
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
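The `DependencyUtils` warning in the first trace says the jar passed to `spark-submit` does not exist, which would explain the `ClassNotFoundException` that follows. A minimal sketch of a pre-flight check that could be run inside the container before submitting the job, assuming the bundle path is exported as `$HUDI_UTILITIES_BUNDLE` as in the command from the report. The helper name `check_jar` is hypothetical and not part of the Hudi demo scripts.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Hudi demo): report whether a bundle
# jar exists at the given path before it is handed to spark-submit.
check_jar() {
  jar_path="$1"
  if [ -z "$jar_path" ]; then
    # An empty argument usually means the environment variable is unset.
    echo "MISSING: no jar path given (is the variable set?)"
    return 1
  fi
  if [ -f "$jar_path" ]; then
    echo "OK: $jar_path"
    return 0
  fi
  echo "MISSING: $jar_path"
  return 1
}

# Check the path that the spark-submit command uses as the application jar.
check_jar "${HUDI_UTILITIES_BUNDLE:-}" \
  || echo "Bundle jar not found; it may need to be built or mounted first."
```

If the check prints `MISSING`, the same path should show up in the `Local jar ... does not exist` warning. The Hive-side `Cannot find class` error may have a similar cause, but this sketch does not check the Hive server's classpath.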
