codejoyan opened a new issue #2852: URL: https://github.com/apache/hudi/issues/2852
I have a requirement to read a Hudi table from Hive. The documentation (https://hudi.apache.org/docs/querying_data.html#hive) says we have to copy `hudi-hadoop-mr-bundle-x.y.z-SNAPSHOT.jar` into the aux jars path of the HiveServer2 host. I wanted to know what the role of this jar is and what happens internally when Hive is started with it.

When I run hive sync without copying the jar to the aux path, I get the error below. In the error I can see that it tries to create an external table. What if I create the external table manually? Will I then be able to read the Hudi table from Hive, or will I be missing out on some additional features? Please let me know if you have any questions.

```scala
scala> transformedDF.write.format("org.apache.hudi").
     |   options(getQuickstartWriteConfigs).
     |   option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "col_9").
     |   option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "col_2,col_1,col_3").
     |   option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
     |   option(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY, "org.apache.hudi.keygen.ComplexKeyGenerator").
     |   option("hoodie.upsert.shuffle.parallelism", "2").
     |   option("hoodie.insert.shuffle.parallelism", "2").
     |   option(HoodieWriteConfig.TABLE_NAME, "TestTableHudiHive").
     |   option("hoodie.datasource.hive_sync.enable", true).
     |   option("hoodie.datasource.hive_sync.jdbcurl", "jdbc:hive2://hive_server2_host:10001/default;principal=hive/_HOST@[email protected];transportMode=http;httpPath=cliservice").
     |   option("hoodie.datasource.hive_sync.database", "default").
     |   option("hoodie.datasource.hive_sync.table", "TestTableHudiHive").
     |   option("hoodie.datasource.hive_sync.assume_date_partitioning", false).
     |   option("hoodie.datasource.hive_sync.partition_fields", "partitionpath").
     |   mode(SaveMode.Append).
     |   save(targetPath)
```

```
21/04/15 18:15:21 ERROR HiveSyncTool: Got runtime exception when hive syncing
org.apache.hudi.hive.HoodieHiveSyncException: Failed in executing SQL CREATE EXTERNAL TABLE IF NOT EXISTS `default`.`TestTableHudiHive`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `col_1` string, `col_2` int, `col_3` int, `col_4` string, `col_5` string, `col_6` int, `col_7` bigint, `col_8` string, `col_9` bigint, `col_10` string, `cntry_cd` string, `bus_dt` DATE) PARTITIONED BY (`partitionpath` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 'gs://xxxxxxxxxxxxxxxxx1919010xxxxxxx/test_table_tgt_04142021_1'
	at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:369)
	at org.apache.hudi.hive.HoodieHiveClient.createTable(HoodieHiveClient.java:263)
	at org.apache.hudi.hive.HiveSyncTool.syncSchema(HiveSyncTool.java:181)
	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:136)
	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:94)
	at org.apache.hudi.HoodieSparkSqlWriter$.org$apache$hudi$HoodieSparkSqlWriter$$syncHive(HoodieSparkSqlWriter.scala:355)
	at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:403)
	at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:399)
	at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
	at org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:399)
	at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:460)
	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:217)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:134)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:654)
	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:273)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:267)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:225)
	at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:54)
	at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:59)
	at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:61)
	at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:63)
	at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:65)
	at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:67)
	at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:69)
	at $line23.$read$$iw$$iw$$iw$$iw$$iw.<init>(<console>:71)
	at $line23.$read$$iw$$iw$$iw$$iw.<init>(<console>:73)
	at $line23.$read$$iw$$iw$$iw.<init>(<console>:75)
	at $line23.$read$$iw$$iw.<init>(<console>:77)
	at $line23.$read$$iw.<init>(<console>:79)
	at $line23.$read.<init>(<console>:81)
	at $line23.$read$.<init>(<console>:85)
	at $line23.$read$.<clinit>(<console>)
	at $line23.$eval$.$print$lzycompute(<console>:7)
	at $line23.$eval$.$print(<console>:6)
	at $line23.$eval.$print(<console>)
```

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
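To make the question concrete, the manual workaround I have in mind would be to register the bundle jar in my Hive session and then run the same DDL that the sync tool generated. A sketch (the `ADD JAR` path is a placeholder for wherever the bundle jar actually lives; the `CREATE TABLE` statement is copied verbatim from the error above):

```sql
-- Placeholder path: adjust to the actual location of the bundle jar
ADD JAR /path/to/hudi-hadoop-mr-bundle-x.y.z-SNAPSHOT.jar;

-- The DDL the sync tool tried to execute, copied from the error above
CREATE EXTERNAL TABLE IF NOT EXISTS `default`.`TestTableHudiHive` (
  `_hoodie_commit_time` string,
  `_hoodie_commit_seqno` string,
  `_hoodie_record_key` string,
  `_hoodie_partition_path` string,
  `_hoodie_file_name` string,
  `col_1` string,
  `col_2` int,
  `col_3` int,
  `col_4` string,
  `col_5` string,
  `col_6` int,
  `col_7` bigint,
  `col_8` string,
  `col_9` bigint,
  `col_10` string,
  `cntry_cd` string,
  `bus_dt` DATE)
PARTITIONED BY (`partitionpath` string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS
  INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'gs://xxxxxxxxxxxxxxxxx1919010xxxxxxx/test_table_tgt_04142021_1';
```

My assumption is that even if this works, the sync tool also keeps the table's partitions registered on each commit (e.g. via `ALTER TABLE ... ADD PARTITION`), which I would then have to manage manually — is that the feature I would be missing?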
