melkimohamed opened a new issue #1376: Problem Sync Hudi table with Hive
URL: https://github.com/apache/incubator-hudi/issues/1376

**Problem: Hudi 0.5 with Hive 2.1.0**

**Describe the problem you faced**

I use Hudi 0.5 with Spark 2.2 and Hive 2.1.0 and always hit the same problem: it is not possible to sync the Hudi table with Hive. I suspect that Hudi 0.5 is not compatible with Hive 2.1.0; could you please confirm?

**To Reproduce**

In my cluster I use the two Hudi jars:
- hudi-spark-bundle-0.5.0-incubating.jar
- hudi-hive-bundle-0.5.0-incubating.jar

```
spark-shell --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" --conf "spark.sql.hive.convertMetastoreParquet=false" --jars hdfs://dhadcluster02/libs/hudi-spark-bundle-0.5.0-incubating.jar,hdfs://dhadcluster02/libs/spark-avro_2.11-2.4.4.jar
```

```
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.functions._
import org.apache.hudi.DataSourceWriteOptions
import org.apache.hudi.config.HoodieWriteConfig
import org.apache.hudi.hive.MultiPartKeysValueExtractor

val inputDataPath = "hdfs://mycluster/apps/hive/warehouse/testhudi.db/employee_parquet"
val hudiTableName = "employee_parquet_hudi"
val hudiTablePath = "hdfs://mycluster/apps/hive/warehouse/testhudi.db/employee_parquet_hudi"

val hudiOptions = Map[String,String](
  DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY -> "id",
  DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY -> "year",
  HoodieWriteConfig.TABLE_NAME -> hudiTableName,
  DataSourceWriteOptions.OPERATION_OPT_KEY -> DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL,
  DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY -> "COPY_ON_WRITE",
  DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY -> "year",
  DataSourceWriteOptions.HIVE_URL_OPT_KEY -> "jdbc:hive2://host:10000/defaut;principal=hive/host@REALM",
  DataSourceWriteOptions.HIVE_USER_OPT_KEY -> "hive",
  DataSourceWriteOptions.HIVE_DATABASE_OPT_KEY -> "testhudi",
  DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY -> "true",
  DataSourceWriteOptions.HIVE_TABLE_OPT_KEY -> hudiTableName,
  DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY -> "year",
  DataSourceWriteOptions.HIVE_ASSUME_DATE_PARTITION_OPT_KEY -> "false",
  DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY -> classOf[MultiPartKeysValueExtractor].getName
)

val temp = spark.read.format("parquet").load(inputDataPath)
temp.write.format("org.apache.hudi").options(hudiOptions).mode(SaveMode.Overwrite).save(hudiTablePath)
```

**Expected behavior**

The Hudi table is synced with Hive (the tables are created).

**Environment Description**

* Hudi version : 0.5
* Spark version : 2.2.0
* Hive version : 2.1.0
* Hadoop version : 2.7.3
* Storage (HDFS/S3/GCS..) : HDFS
* Running on Docker? (yes/no) : no

**Additional context**

I suspect the problem is that Hudi 0.5 is not compatible with Hive 2.1.0, because I see `<hive.version>2.3.1</hive.version>` in pom.xml. So I tried to build the Hudi project with Hive 2.1.0: `mvn clean package -DskipTests -DskipITs -Dhive.version=2.1.0`, but I encounter another error.

**Stacktrace**

```
20/03/05 15:29:15 WARN HoodieSparkSqlWriter$: hoodie dataset at hdfs://dhadcluster02/apps/hive/warehouse/testhudi.db/employee_parquet_hudi already exists. Deleting existing data & overwriting with new data.
org.apache.hudi.hive.HoodieHiveSyncException: Failed to check if table exists employee_parquet_hudi
  at org.apache.hudi.hive.HoodieHiveClient.doesTableExist(HoodieHiveClient.java:459)
  at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:91)
  at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:67)
  at org.apache.hudi.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:235)
  at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:169)
  at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
  at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:471)
  at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:50)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)
  at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
  at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
  at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:609)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:233)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:217)
  ... 62 elided
Caused by: org.apache.thrift.TApplicationException: Invalid method name: 'get_table_req'
  at org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
  at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79)
  at org.apache.hudi.org.apache.hadoop_hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table_req(ThriftHiveMetastore.java:1563)
  at org.apache.hudi.org.apache.hadoop_hive.metastore.api.ThriftHiveMetastore$Client.get_table_req(ThriftHiveMetastore.java:1550)
  at org.apache.hudi.org.apache.hadoop_hive.metastore.HiveMetaStoreClient.tableExists(HiveMetaStoreClient.java:1443)
  at org.apache.hudi.hive.HoodieHiveClient.doesTableExist(HoodieHiveClient.java:457)
  ... 83 more
```
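
Note: the `Invalid method name: 'get_table_req'` error usually means that the Hive metastore client shaded into the Hudi 0.5 bundles (built against Hive 2.3.x, per the `<hive.version>` above) is calling a Thrift method that a Hive 2.1.0 metastore server does not implement. Below is a minimal sketch of the rebuild-and-retry path described in the additional context, assuming a checkout of the 0.5.0-incubating sources; the git tag name and the bundle output path are assumptions and may differ locally.

```
# Sketch only: rebuild the bundles against the cluster's Hive version.
git clone https://github.com/apache/incubator-hudi.git
cd incubator-hudi
git checkout release-0.5.0-incubating        # assumed tag name, verify locally
mvn clean package -DskipTests -DskipITs -Dhive.version=2.1.0

# Then point spark-shell at the rebuilt spark bundle instead of the stock 0.5.0 jar
# (the output path below is an assumption based on the packaging module layout):
spark-shell \
  --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
  --conf "spark.sql.hive.convertMetastoreParquet=false" \
  --jars packaging/hudi-spark-bundle/target/hudi-spark-bundle-0.5.0-incubating.jar,hdfs://dhadcluster02/libs/spark-avro_2.11-2.4.4.jar
```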
