Hi,
I tried using the DataSource writer to sync a table to Hive. Here is my code:
import org.apache.hudi.DataSourceWriteOptions
import org.apache.hudi.config.HoodieWriteConfig

dfInput
  .write.format("org.apache.hudi")
  .mode("append")
  .option("path", outputBase)
  .option(HoodieWriteConfig.TABLE_NAME, "hudi_upsert_mor")
  .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "item_id") // This is the record key
  .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "timestamp") // used to combine duplicate records in the input / with the value on disk
  .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL) // upsert
  .option(DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY, DataSourceWriteOptions.MOR_STORAGE_TYPE_OPT_VAL) // Hoodie table type
  .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "hour")
  .option(DataSourceWriteOptions.TABLE_NAME_OPT_KEY, "test_item_upsert") // Used by hive sync and queries
  .option(DataSourceWriteOptions.INSERT_DROP_DUPS_OPT_KEY, "true") // hoodie.datasource.write.insert.drop.duplicates
  .option(DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY, "true")
  .option(DataSourceWriteOptions.HIVE_DATABASE_OPT_KEY, "hudi_poc")
  .option(DataSourceWriteOptions.HIVE_TABLE_OPT_KEY, "test_item_upsert")
  .option(DataSourceWriteOptions.HIVE_URL_OPT_KEY, "jdbc:hive2://zk-1.vip.hadoop.com:2181")
  .option(DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY, "hour")
  .save()
What is the difference between HoodieWriteConfig.TABLE_NAME,
DataSourceWriteOptions.TABLE_NAME_OPT_KEY, and
DataSourceWriteOptions.HIVE_TABLE_OPT_KEY?
Also, I got the following error. How can I solve it?
BTW, this Hive instance is secured with Kerberos.
19/10/22 15:41:47 ERROR HiveConnection: Error opening session
org.apache.thrift.TApplicationException: Required field 'client_protocol' is unset! Struct:TOpenSessionReq(client_protocol:null, configuration:{set:hiveconf:hive.server2.thrift.resultset.default.fetch.size=1000, use:database=default})
19/10/22 15:41:47 ERROR Utils: Unable to read HiveServer2 configs from ZooKeeper
org.apache.hudi.hive.HoodieHiveSyncException: Cannot create hive connection jdbc:hive2://zk-1.vip.hadoop.com:2181
Caused by: java.sql.SQLException: Could not open client transport for any of the Server URI's in ZooKeeper: Unable to read HiveServer2 uri from ZooKeeper
    at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:221)
    at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:176)
    at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
    at java.sql.DriverManager.getConnection(DriverManager.java:664)
    at java.sql.DriverManager.getConnection(DriverManager.java:247)
    at org.apache.hudi.hive.HoodieHiveClient.createHiveConnection(HoodieHiveClient.java:570)
    ... 77 more
Caused by: org.apache.hive.jdbc.ZooKeeperHiveClientException: Unable to read HiveServer2 uri from ZooKeeper
    at org.apache.hive.jdbc.ZooKeeperHiveClientHelper.getNextServerUriFromZooKeeper(ZooKeeperHiveClientHelper.java:86)
    at org.apache.hive.jdbc.Utils.updateConnParamsFromZooKeeper(Utils.java:532)
    at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:217)
    ... 82 more
Caused by: org.apache.hive.jdbc.ZooKeeperHiveClientException: Tried all existing HiveServer2 uris from ZooKeeper.
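For reference, Hive JDBC URLs that locate HiveServer2 through ZooKeeper usually carry extra session parameters (serviceDiscoveryMode, zooKeeperNamespace) and, on a Kerberized cluster, a principal. The sketch below only illustrates that URL shape; it is not a confirmed fix for the error above. The helper name HiveJdbcUrl.zkDiscoveryUrl is hypothetical, the namespace "hiveserver2" is an assumed default that may differ on this cluster, and the principal in the usage note is a placeholder.

```scala
// Hypothetical helper (not part of Hudi or Hive): assembles a HiveServer2 JDBC URL
// that uses ZooKeeper service discovery, optionally with a Kerberos principal.
object HiveJdbcUrl {
  def zkDiscoveryUrl(
      zkQuorum: Seq[String],                    // ZooKeeper host:port entries
      database: String = "default",             // database to connect to
      zkNamespace: String = "hiveserver2",      // assumption: check hive.server2.zookeeper.namespace
      kerberosPrincipal: Option[String] = None  // e.g. the HiveServer2 service principal
  ): String = {
    val hosts = zkQuorum.mkString(",")
    val base =
      s"jdbc:hive2://$hosts/$database;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=$zkNamespace"
    // Append the principal only when one is supplied.
    kerberosPrincipal.fold(base)(p => s"$base;principal=$p")
  }
}
```

With the ZooKeeper host from the code above and a placeholder principal, HiveJdbcUrl.zkDiscoveryUrl(Seq("zk-1.vip.hadoop.com:2181"), kerberosPrincipal = Some("hive/_HOST@EXAMPLE.COM")) yields a string that could be passed as the value of DataSourceWriteOptions.HIVE_URL_OPT_KEY.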
Best,
Qian