[ 
https://issues.apache.org/jira/browse/HUDI-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393727#comment-17393727
 ] 

ASF GitHub Bot commented on HUDI-1194:
--------------------------------------

zhedoubushishi closed pull request #1975:
URL: https://github.com/apache/hudi/pull/1975


   



> Reorganize HoodieHiveClient and make it fully support Hive Metastore API
> ------------------------------------------------------------------------
>
>                 Key: HUDI-1194
>                 URL: https://issues.apache.org/jira/browse/HUDI-1194
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: Wenning Ding
>            Priority: Major
>              Labels: pull-request-available
>
> Currently, HoodieHiveClient performs Hive operations in three ways: through 
> Hive JDBC, through the Hive Metastore API, and through the Hive Driver.
>  
>  There’s a parameter called +{{hoodie.datasource.hive_sync.use_jdbc}}+ that 
> controls whether to use Hive JDBC. However, this parameter does not 
> accurately describe what actually happens.
>  With the current logic, when +*use_jdbc*+ is set to true, most of the 
> methods in HoodieHiveClient use JDBC, and a few use the Hive Metastore API.
>  When +*use_jdbc*+ is set to false, most of the methods in HoodieHiveClient 
> use the Hive Driver, and a few use the Hive Metastore API.
> The following table shows what is actually used when use_jdbc is set to 
> true or false.
> |Method|use_jdbc=true|use_jdbc=false|
> |{{addPartitionsToTable}}|JDBC|Hive Driver|
> |{{updatePartitionsToTable}}|JDBC|Hive Driver|
> |{{scanTablePartitions}}|Metastore API|Metastore API|
> |{{updateTableDefinition}}|JDBC|Hive Driver|
> |{{createTable}}|JDBC|Hive Driver|
> |{{getTableSchema}}|JDBC|Metastore API|
> |{{doesTableExist}}|Metastore API|Metastore API|
> |{{getLastCommitTimeSynced}}|Metastore API|Metastore API|
> [~bschell] and I developed several Metastore API implementations for 
> {{createTable}}, {{addPartitionsToTable}}, {{updatePartitionsToTable}}, and 
> {{updateTableDefinition}}, which will help with several issues, e.g. 
> resolving the null partition Hive sync issue and supporting ALTER_TABLE 
> cascade with the AWS Glue catalog.
> However, it is hard to organize three implementations within the current 
> config, so we plan to separate HoodieHiveClient into three classes:
>  # {{HoodieHiveClient}}, which implements all the APIs through the Metastore 
> API.
>  # {{HoodieHiveJDBCClient}}, which extends HoodieHiveClient and overrides 
> several of the APIs through Hive JDBC.
>  # {{HoodieHiveDriverClient}}, which extends HoodieHiveClient and overrides 
> several of the APIs through the Hive Driver.
> We also introduce a new parameter, 
> +*hoodie.datasource.hive_sync.hive_client_class*+, which lets you choose 
> which Hive client class to use.
>  
>  
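The proposed split could look roughly like the sketch below. It is only an illustration of the class layout and of selecting a client by class name, not the code from the pull request. The method bodies are placeholders that return a label instead of calling Hive. Overriding only createTable in the subclasses, the no-argument constructors, the wrapper class HiveClientSketch, and the newClient helper are all assumptions made for this example. Only the three class names and the parameter name come from the description above.

```java
import java.util.Properties;

public class HiveClientSketch {

    // Parameter name taken from the issue description.
    static final String CLIENT_CLASS_KEY = "hoodie.datasource.hive_sync.hive_client_class";

    /** Base client: every operation goes through the Metastore API (placeholder bodies). */
    static class HoodieHiveClient {
        String createTable(String table) {
            return "metastore:createTable:" + table;
        }

        String doesTableExist(String table) {
            return "metastore:doesTableExist:" + table;
        }
    }

    /** Overrides a subset of operations to use Hive JDBC (the subset chosen here is hypothetical). */
    static class HoodieHiveJDBCClient extends HoodieHiveClient {
        @Override
        String createTable(String table) {
            return "jdbc:createTable:" + table;
        }
    }

    /** Overrides a subset of operations to use the Hive Driver (the subset chosen here is hypothetical). */
    static class HoodieHiveDriverClient extends HoodieHiveClient {
        @Override
        String createTable(String table) {
            return "driver:createTable:" + table;
        }
    }

    /** Hypothetical helper: instantiates the configured class, defaulting to the Metastore client. */
    static HoodieHiveClient newClient(Properties props) throws ReflectiveOperationException {
        String className = props.getProperty(CLIENT_CLASS_KEY, HoodieHiveClient.class.getName());
        return Class.forName(className)
                .asSubclass(HoodieHiveClient.class)
                .getDeclaredConstructor()
                .newInstance();
    }

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty(CLIENT_CLASS_KEY, HoodieHiveJDBCClient.class.getName());

        HoodieHiveClient client = newClient(props);
        System.out.println(client.createTable("trips"));    // prints "jdbc:createTable:trips"
        System.out.println(client.doesTableExist("trips")); // prints "metastore:doesTableExist:trips"
    }
}
```

With this layout, any operation a subclass does not override falls back to the Metastore API implementation in the base class, so the choice of backend is determined by the configured class rather than by a boolean flag.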



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
