Wenning Ding created HUDI-1194:
----------------------------------
Summary: Reorganize HoodieHiveClient and make it fully support
Hive Metastore API
Key: HUDI-1194
URL: https://issues.apache.org/jira/browse/HUDI-1194
Project: Apache Hudi
Issue Type: Improvement
Reporter: Wenning Ding
Currently there are three ways in HoodieHiveClient to perform Hive
operations: through Hive JDBC, through the Hive Metastore API, and through
the Hive Driver.
There’s a parameter called +{{hoodie.datasource.hive_sync.use_jdbc}}+ that
controls whether Hive JDBC is used. However, this parameter does not
accurately describe what actually happens.
Basically, the current logic is: when +*use_jdbc*+ is set to true, most of the
methods in HoodieHiveClient use JDBC, and a few use the Hive Metastore API.
When +*use_jdbc*+ is set to false, most of the methods in HoodieHiveClient
use the Hive Driver, and a few use the Hive Metastore API.
Here is a table showing what is actually used when use_jdbc is set to
true/false.
|Method|use_jdbc=true|use_jdbc=false|
|{{addPartitionsToTable}}|JDBC|Hive Driver|
|{{updatePartitionsToTable}}|JDBC|Hive Driver|
|{{scanTablePartitions}}|Metastore API|Metastore API|
|{{updateTableDefinition}}|JDBC|Hive Driver|
|{{createTable}}|JDBC|Hive Driver|
|{{getTableSchema}}|JDBC|Metastore API|
|{{doesTableExist}}|Metastore API|Metastore API|
|{{getLastCommitTimeSynced}}|Metastore API|Metastore API|
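The table can also be read as a small lookup function. The following is only a sketch of that mapping, not code from HoodieHiveClient: the class name {{CurrentDispatchSketch}} and the method {{backendFor}} are hypothetical, and the returned strings simply mirror the table cells.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper that encodes the table above; not part of Hudi.
class CurrentDispatchSketch {
    // Methods that use the Metastore API regardless of use_jdbc.
    static final Set<String> ALWAYS_METASTORE = new HashSet<>(Arrays.asList(
        "scanTablePartitions", "doesTableExist", "getLastCommitTimeSynced"));

    // Methods that switch between JDBC and Hive Driver depending on use_jdbc.
    static final Set<String> JDBC_OR_DRIVER = new HashSet<>(Arrays.asList(
        "addPartitionsToTable", "updatePartitionsToTable",
        "updateTableDefinition", "createTable"));

    static String backendFor(String method, boolean useJdbc) {
        if (ALWAYS_METASTORE.contains(method)) {
            return "Metastore API";
        }
        if (JDBC_OR_DRIVER.contains(method)) {
            return useJdbc ? "JDBC" : "Hive Driver";
        }
        if ("getTableSchema".equals(method)) {
            // The one method that falls back to the Metastore API, not the Hive Driver.
            return useJdbc ? "JDBC" : "Metastore API";
        }
        throw new IllegalArgumentException("Method not listed in the table: " + method);
    }
}
```

Written this way, it is easier to see that a single boolean selects among three backends, which is why the flag name is misleading.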
[~bschell] and I developed several Metastore API implementations for
{{createTable}}, {{addPartitionsToTable}}, {{updatePartitionsToTable}}, and
{{updateTableDefinition}}, which will be helpful for several issues, e.g.
resolving the null partition Hive sync issue and supporting ALTER_TABLE cascade
with the AWS Glue catalog.
But it seems hard to organize three implementations within the current
config, so we plan to separate HoodieHiveClient into three classes:
# HoodieHiveClient, which implements all the APIs through the Metastore API.
# HoodieHiveJDBCClient, which extends HoodieHiveClient and overrides
several of the APIs to use Hive JDBC.
# HoodieHiveDriverClient, which extends HoodieHiveClient and overrides
several of the APIs to use the Hive Driver.
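The proposed hierarchy might look roughly like the sketch below. It is a simplified illustration only: the {{*Sketch}} class names, the method signatures, and the choice of which methods the subclasses override are assumptions, and each method just records which backend it would call instead of talking to Hive.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the base client: everything goes through the Metastore API.
class MetastoreClientSketch {
    protected final List<String> calls = new ArrayList<>();

    public void createTable(String table) {
        calls.add("metastore:createTable:" + table);
    }

    public void addPartitionsToTable(String table, List<String> partitions) {
        calls.add("metastore:addPartitionsToTable:" + table);
    }

    public boolean doesTableExist(String table) {
        calls.add("metastore:doesTableExist:" + table);
        return true; // placeholder result for the sketch
    }

    public List<String> getCalls() {
        return calls;
    }
}

// Hypothetical JDBC variant: overrides only some methods, inherits the rest.
class JdbcClientSketch extends MetastoreClientSketch {
    @Override
    public void createTable(String table) {
        calls.add("jdbc:createTable:" + table);
    }

    @Override
    public void addPartitionsToTable(String table, List<String> partitions) {
        calls.add("jdbc:addPartitionsToTable:" + table);
    }
}

// Hypothetical Hive Driver variant, structured the same way.
class DriverClientSketch extends MetastoreClientSketch {
    @Override
    public void createTable(String table) {
        calls.add("driver:createTable:" + table);
    }

    @Override
    public void addPartitionsToTable(String table, List<String> partitions) {
        calls.add("driver:addPartitionsToTable:" + table);
    }
}
```

With this layout, any method a subclass does not override (here {{doesTableExist}}) falls back to the Metastore API implementation in the base class.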
We also introduce a new parameter,
+*hoodie.datasource.hive_sync.hive_client_class*+, which lets you
choose which Hive client class to use.
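One possible way to resolve such a parameter is to instantiate the configured class by reflection. The sketch below assumes this approach; the issue does not say how the class will be loaded. The interface, the {{Sketch*}} classes, the loader, the no-argument constructors, and the default of falling back to the Metastore-based client are all hypothetical. Only the property key comes from the proposal above.

```java
import java.util.Properties;

// Hypothetical minimal client interface for the sketch.
interface HiveSyncClientSketch {
    String backend();
}

class SketchMetastoreClient implements HiveSyncClientSketch {
    public String backend() {
        return "metastore";
    }
}

class SketchJdbcClient extends SketchMetastoreClient {
    @Override
    public String backend() {
        return "jdbc";
    }
}

class HiveClientLoaderSketch {
    // Property key taken from the proposal.
    static final String CLIENT_CLASS_KEY = "hoodie.datasource.hive_sync.hive_client_class";

    static HiveSyncClientSketch load(Properties props) {
        // Assumption: default to the Metastore-based client when the key is unset.
        String className =
            props.getProperty(CLIENT_CLASS_KEY, SketchMetastoreClient.class.getName());
        try {
            Class<?> clazz = Class.forName(className);
            return (HiveSyncClientSketch) clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Cannot load Hive client class: " + className, e);
        }
    }
}
```

A caller would set the property to a fully qualified class name and call {{HiveClientLoaderSketch.load(props)}}. An unknown or incompatible class name surfaces as an {{IllegalArgumentException}}.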
--
This message was sent by Atlassian Jira
(v8.3.4#803005)