Hi,

I tried using the DataSource write API to sync to Hive; the following is my code:
dfInput
  .write.format("org.apache.hudi")
  .mode("append")
  .option("path", outputBase)
  .option(HoodieWriteConfig.TABLE_NAME, "hudi_upsert_mor")
  .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "item_id") // the record key
  .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "timestamp") // used to combine duplicate records in the input / with the value on disk
  .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL) // upsert
  .option(DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY, DataSourceWriteOptions.MOR_STORAGE_TYPE_OPT_VAL) // Hudi table type (merge-on-read)
  .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "hour")
  .option(DataSourceWriteOptions.TABLE_NAME_OPT_KEY, "test_item_upsert") // used by Hive sync and queries
  .option(DataSourceWriteOptions.INSERT_DROP_DUPS_OPT_KEY, "true") // hoodie.datasource.write.insert.drop.duplicates
  .option(DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY, "true")
  .option(DataSourceWriteOptions.HIVE_DATABASE_OPT_KEY, "hudi_poc")
  .option(DataSourceWriteOptions.HIVE_TABLE_OPT_KEY, "test_item_upsert")
  .option(DataSourceWriteOptions.HIVE_URL_OPT_KEY, "jdbc:hive2://zk-1.vip.hadoop.com:2181")
  .option(DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY, "hour")
  .save()
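
For context, dfInput is roughly shaped like the simplified sketch below (it assumes a SparkSession named spark; the real job reads from our source tables and has more columns, but these three fields are the ones referenced by the options above):

import spark.implicits._

// Simplified stand-in for the real input: item_id is the record key,
// timestamp the precombine field, and hour the partition path field.
val dfInput = Seq(
  ("item-001", 1571727600L, "2019102215"),
  ("item-002", 1571727660L, "2019102215")
).toDF("item_id", "timestamp", "hour")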

What are the differences among HoodieWriteConfig.TABLE_NAME, DataSourceWriteOptions.TABLE_NAME_OPT_KEY, and DataSourceWriteOptions.HIVE_TABLE_OPT_KEY?
Also, I got the following error; how can I solve it?
By the way, this Hive cluster is authenticated with Kerberos.
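
In case it is relevant: for a Kerberized HiveServer2 behind ZooKeeper discovery, I would normally expect the JDBC URL to carry the discovery and principal parameters, along the lines of the sketch below (the zooKeeperNamespace and principal here are placeholders, not our real values). Is this the kind of URL HIVE_URL_OPT_KEY expects, or does Hudi add these parameters itself?

// Sketch only: a discovery-mode HiveServer2 URL on a Kerberized cluster.
// zooKeeperNamespace and principal are placeholders.
val hiveJdbcUrl =
  "jdbc:hive2://zk-1.vip.hadoop.com:2181/" +
  ";serviceDiscoveryMode=zooKeeper" +
  ";zooKeeperNamespace=hiveserver2" +
  ";principal=hive/_HOST@EXAMPLE.COM"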

19/10/22 15:41:47 ERROR HiveConnection: Error opening session
org.apache.thrift.TApplicationException: Required field 'client_protocol' is unset! Struct:TOpenSessionReq(client_protocol:null, configuration:{set:hiveconf:hive.server2.thrift.resultset.default.fetch.size=1000, use:database=default})

19/10/22 15:41:47 ERROR Utils: Unable to read HiveServer2 configs from ZooKeeper
org.apache.hudi.hive.HoodieHiveSyncException: Cannot create hive connection jdbc:hive2://zk-1.vip.hadoop.com:2181

Caused by: java.sql.SQLException: Could not open client transport for any of the Server URI's in ZooKeeper: Unable to read HiveServer2 uri from ZooKeeper
  at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:221)
  at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:176)
  at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
  at java.sql.DriverManager.getConnection(DriverManager.java:664)
  at java.sql.DriverManager.getConnection(DriverManager.java:247)
  at org.apache.hudi.hive.HoodieHiveClient.createHiveConnection(HoodieHiveClient.java:570)
  ... 77 more
Caused by: org.apache.hive.jdbc.ZooKeeperHiveClientException: Unable to read HiveServer2 uri from ZooKeeper
  at org.apache.hive.jdbc.ZooKeeperHiveClientHelper.getNextServerUriFromZooKeeper(ZooKeeperHiveClientHelper.java:86)
  at org.apache.hive.jdbc.Utils.updateConnParamsFromZooKeeper(Utils.java:532)
  at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:217)
  ... 82 more
Caused by: org.apache.hive.jdbc.ZooKeeperHiveClientException: Tried all existing HiveServer2 uris from ZooKeeper.


Best,
Qian
