[ https://issues.apache.org/jira/browse/HUDI-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoth Chandar updated HUDI-1194:
---------------------------------
        Parent: HUDI-2519
    Issue Type: Sub-task  (was: Improvement)

> Reorganize HoodieHiveClient and make it fully support Hive Metastore API
> ------------------------------------------------------------------------
>
>                 Key: HUDI-1194
>                 URL: https://issues.apache.org/jira/browse/HUDI-1194
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: Wenning Ding
>            Priority: Major
>              Labels: pull-request-available
>
> Currently, HoodieHiveClient performs Hive operations in three ways: through
> Hive JDBC, through the Hive Metastore API, and through the Hive Driver.
>  
>  The parameter +{{hoodie.datasource.hive_sync.use_jdbc}}+ controls whether
> Hive JDBC is used. However, its name does not accurately describe the actual
> behavior.
>  With the current logic, when +*use_jdbc*+ is set to true, most methods in
> HoodieHiveClient use JDBC and a few use the Hive Metastore API.
>  When +*use_jdbc*+ is set to false, most methods in HoodieHiveClient use the
> Hive Driver and a few use the Hive Metastore API.
> The table below shows which mechanism each method actually uses when use_jdbc
> is set to true or false.
> ||Method||use_jdbc=true||use_jdbc=false||
> |{{addPartitionsToTable}}|JDBC|Hive Driver|
> |{{updatePartitionsToTable}}|JDBC|Hive Driver|
> |{{scanTablePartitions}}|Metastore API|Metastore API|
> |{{updateTableDefinition}}|JDBC|Hive Driver|
> |{{createTable}}|JDBC|Hive Driver|
> |{{getTableSchema}}|JDBC|Metastore API|
> |{{doesTableExist}}|Metastore API|Metastore API|
> |{{getLastCommitTimeSynced}}|Metastore API|Metastore API|
> [~bschell] and I developed Metastore API implementations of {{createTable}},
> {{addPartitionsToTable}}, {{updatePartitionsToTable}}, and
> {{updateTableDefinition}}. These will help with several issues, e.g.
> resolving the null-partition Hive sync issue and supporting ALTER_TABLE
> cascade with the AWS Glue catalog.
> However, it is hard to organize three implementations within the current
> config, so we plan to split HoodieHiveClient into three classes:
>  # {{HoodieHiveClient}}, which implements all the APIs through the Metastore
> API.
>  # {{HoodieHiveJDBCClient}}, which extends {{HoodieHiveClient}} and overrides
> several of the APIs to use Hive JDBC.
>  # {{HoodieHiveDriverClient}}, which extends {{HoodieHiveClient}} and
> overrides several of the APIs to use the Hive Driver.
> We will also introduce a new parameter,
> +*hoodie.datasource.hive_sync.hive_client_class*+, which lets you choose
> which Hive client class to use.
>  
>  
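Below is a minimal Java sketch of the proposed class split and of loading the client named by hoodie.datasource.hive_sync.hive_client_class. It illustrates the idea under stated assumptions and is not Hudi code:
- The method signatures are simplified stand-ins. Each one only returns the name of the mechanism that would serve the call.
- The set of methods each subclass overrides is assumed to mirror the use_jdbc table in the description.
- HiveClientFactory is a hypothetical helper.

```java
import java.util.List;
import java.util.Map;

// Base client (simplified stand-in): every operation goes through the Hive
// Metastore API. Methods return the mechanism name instead of doing real work.
class HoodieHiveClient {
  public String createTable(String tableName) { return "Metastore API"; }
  public String addPartitionsToTable(String tableName, List<String> partitions) { return "Metastore API"; }
  public String updatePartitionsToTable(String tableName, List<String> partitions) { return "Metastore API"; }
  public String updateTableDefinition(String tableName) { return "Metastore API"; }
  public String getTableSchema(String tableName) { return "Metastore API"; }
  public String doesTableExist(String tableName) { return "Metastore API"; }
}

// Assumed overrides: the methods that use JDBC today when use_jdbc=true.
class HoodieHiveJDBCClient extends HoodieHiveClient {
  @Override public String createTable(String tableName) { return "JDBC"; }
  @Override public String addPartitionsToTable(String tableName, List<String> partitions) { return "JDBC"; }
  @Override public String updatePartitionsToTable(String tableName, List<String> partitions) { return "JDBC"; }
  @Override public String updateTableDefinition(String tableName) { return "JDBC"; }
  @Override public String getTableSchema(String tableName) { return "JDBC"; }
}

// Assumed overrides: the methods that use the Hive Driver today when use_jdbc=false.
class HoodieHiveDriverClient extends HoodieHiveClient {
  @Override public String createTable(String tableName) { return "Hive Driver"; }
  @Override public String addPartitionsToTable(String tableName, List<String> partitions) { return "Hive Driver"; }
  @Override public String updatePartitionsToTable(String tableName, List<String> partitions) { return "Hive Driver"; }
  @Override public String updateTableDefinition(String tableName) { return "Hive Driver"; }
}

// Hypothetical helper: instantiates the class named by the proposed config key.
final class HiveClientFactory {
  static final String HIVE_CLIENT_CLASS = "hoodie.datasource.hive_sync.hive_client_class";

  private HiveClientFactory() {}

  static HoodieHiveClient create(Map<String, String> props) {
    // Assumed default: the Metastore-based base client when the key is unset.
    String className = props.getOrDefault(HIVE_CLIENT_CLASS, HoodieHiveClient.class.getName());
    try {
      return Class.forName(className)
          .asSubclass(HoodieHiveClient.class) // rejects classes that are not Hive clients
          .getDeclaredConstructor()
          .newInstance();
    } catch (ReflectiveOperationException | ClassCastException e) {
      throw new IllegalArgumentException("Cannot load Hive client class: " + className, e);
    }
  }
}
```

Because the class is loaded by name, another client implementation could be selected through configuration alone. Falling back to the Metastore-based HoodieHiveClient when the key is unset is also an assumption of this sketch.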



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
