melkimohamed opened a new issue #1376: Problem Sync Hudi table with Hive
URL: https://github.com/apache/incubator-hudi/issues/1376

**Problem: Hudi 0.5 with Hive 2.1.0**

**Describe the problem you faced**

I use Hudi 0.5 with Spark 2.2 and Hive 2.1.0 and always hit the same problem: it is not possible to sync the Hudi table with Hive. I suspect that Hudi 0.5 is not compatible with Hive 2.1.0; could you please confirm?

**To Reproduce**

In my cluster I use the two Hudi jars:
- hudi-spark-bundle-0.5.0-incubating.jar
- hudi-hive-bundle-0.5.0-incubating.jar

```
spark-shell --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" --conf "spark.sql.hive.convertMetastoreParquet=false" --jars hdfs://dhadcluster02/libs/hudi-spark-bundle-0.5.0-incubating.jar,hdfs://dhadcluster02/libs/spark-avro_2.11-2.4.4.jar
```

```
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.functions._
import org.apache.hudi.DataSourceWriteOptions
import org.apache.hudi.config.HoodieWriteConfig
import org.apache.hudi.hive.MultiPartKeysValueExtractor

val inputDataPath = "hdfs://mycluster/apps/hive/warehouse/testhudi.db/employee_parquet"
val hudiTableName = "employee_parquet_hudi"
val hudiTablePath = "hdfs://mycluster/apps/hive/warehouse/testhudi.db/employee_parquet_hudi"

val hudiOptions = Map[String,String](
  DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY -> "id",
  DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY -> "year",
  HoodieWriteConfig.TABLE_NAME -> hudiTableName,
  DataSourceWriteOptions.OPERATION_OPT_KEY -> DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL,
  DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY -> "COPY_ON_WRITE",
  DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY -> "year",
  DataSourceWriteOptions.HIVE_URL_OPT_KEY -> "jdbc:hive2://host:10000/defaut;principal=hive/host@REALM",
  DataSourceWriteOptions.HIVE_USER_OPT_KEY -> "hive",
  DataSourceWriteOptions.HIVE_DATABASE_OPT_KEY -> "testhudi",
  DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY -> "true",
  DataSourceWriteOptions.HIVE_TABLE_OPT_KEY -> hudiTableName,
  DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY -> "year",
  DataSourceWriteOptions.HIVE_ASSUME_DATE_PARTITION_OPT_KEY -> "false",
  DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY -> classOf[MultiPartKeysValueExtractor].getName
)

val temp = spark.read.format("parquet").load(inputDataPath)
temp.write.format("org.apache.hudi").options(hudiOptions).mode(SaveMode.Overwrite).save(hudiTablePath)
```

**Expected behavior**

The Hudi table is synced with Hive (the tables are created).

**Environment Description**

* Hudi version : 0.5
* Spark version : 2.2.0
* Hive version : 2.1.0
* Hadoop version : 2.7.3
* Storage (HDFS/S3/GCS..) : HDFS
* Running on Docker? (yes/no) : no

**Additional context**

I suspect the problem is that Hudi 0.5 is not compatible with Hive 2.1.0, because I see `<hive.version>2.3.1</hive.version>` in pom.xml. So I tried to build the Hudi project with Hive 2.1.0: `mvn clean package -DskipTests -DskipITs -Dhive.version=2.1.0`, but I encounter another error.

**Stacktrace**

```
20/03/05 15:29:15 WARN HoodieSparkSqlWriter$: hoodie dataset at hdfs://dhadcluster02/apps/hive/warehouse/testhudi.db/employee_parquet_hudi already exists. Deleting existing data & overwriting with new data.
org.apache.hudi.hive.HoodieHiveSyncException: Failed to check if table exists employee_parquet_hudi
  at org.apache.hudi.hive.HoodieHiveClient.doesTableExist(HoodieHiveClient.java:459)
  at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:91)
  at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:67)
  at org.apache.hudi.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:235)
  at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:169)
  at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
  at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:471)
  at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:50)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)
  at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
  at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
  at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:609)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:233)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:217)
  ... 62 elided
Caused by: org.apache.thrift.TApplicationException: Invalid method name: 'get_table_req'
  at org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
  at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79)
  at org.apache.hudi.org.apache.hadoop_hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table_req(ThriftHiveMetastore.java:1563)
  at org.apache.hudi.org.apache.hadoop_hive.metastore.api.ThriftHiveMetastore$Client.get_table_req(ThriftHiveMetastore.java:1550)
  at org.apache.hudi.org.apache.hadoop_hive.metastore.HiveMetaStoreClient.tableExists(HiveMetaStoreClient.java:1443)
  at org.apache.hudi.hive.HoodieHiveClient.doesTableExist(HoodieHiveClient.java:457)
  ... 83 more
```
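
Note: the `Invalid method name: 'get_table_req'` error usually means that the Hive metastore client shaded into the Hudi 0.5 bundles (built against Hive 2.3.x, per the `<hive.version>` above) is calling a Thrift method that a Hive 2.1.0 metastore server does not implement. Below is a minimal sketch of the rebuild-and-retry path described in the additional context, assuming a checkout of the 0.5.0-incubating sources; the git tag name and the bundle output path are assumptions and may differ locally.

```
# Sketch only: rebuild the bundles against the cluster's Hive version.
git clone https://github.com/apache/incubator-hudi.git
cd incubator-hudi
git checkout release-0.5.0-incubating        # assumed tag name, verify locally
mvn clean package -DskipTests -DskipITs -Dhive.version=2.1.0

# Then point spark-shell at the rebuilt spark bundle instead of the stock 0.5.0 jar
# (the output path below is an assumption based on the packaging module layout):
spark-shell \
  --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
  --conf "spark.sql.hive.convertMetastoreParquet=false" \
  --jars packaging/hudi-spark-bundle/target/hudi-spark-bundle-0.5.0-incubating.jar,hdfs://dhadcluster02/libs/spark-avro_2.11-2.4.4.jar
```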
