umehrot2 opened a new pull request #961: Support Glue catalog and other hive 
metastore implementations
URL: https://github.com/apache/incubator-hudi/pull/961
 
 
   Hudi currently does not work with `AWS Glue Catalog` or other Hive metastore 
implementations. The exception it runs into is reported in this 
[issue](https://github.com/apache/incubator-hudi/issues/954).
   
   As mentioned in the issue, the reason for this is:
   
   - Hudi currently interacts with Hive in two different ways:
     - The create-table statement is submitted directly to Hive via JDBC: 
https://github.com/apache/incubator-hudi/blob/master/hudi-hive/src/main/java/org/apache/hudi/hive/HoodieHiveClient.java#L472
 . Hive therefore creates the right metastore client internally (i.e. Glue, if 
**hive.metastore.client.factory.class** is set to 
**com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory** in 
hive-site).
     - Partition listing, among other things, is done by calling the Hive 
metastore APIs directly through a Hive metastore client: 
https://github.com/apache/incubator-hudi/blob/master/hudi-hive/src/main/java/org/apache/hudi/hive/HoodieHiveClient.java#L240
   - For that second path, the Hudi code instantiates the standard 
implementation of the metastore client (not the Glue metastore client): 
https://github.com/apache/incubator-hudi/blob/master/hudi-hive/src/main/java/org/apache/hudi/hive/HoodieHiveClient.java#L109
 .
   - Ideally, instantiation of the metastore client should be left to Hive 
through 
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L5045
 so that it considers other metastore client implementations that might be 
configured through **hive.metastore.client.factory.class** .
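
   The mismatch described in the list above can be sketched with a small toy 
model. This is not Hive or Hudi code: `MetaClient`, `FACTORIES`, 
`hardCodedClient` and `configuredClient` are hypothetical stand-ins invented 
for illustration, and a plain `Map` stands in for the Hive configuration. Only 
the property name and the Glue factory class name are taken from the 
description above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Toy model only: every type here is a hypothetical stand-in, not a Hive or Hudi class.
final class MetastoreClientSketch {

    /** Minimal stand-in for a metastore client; reports which backend it talks to. */
    interface MetaClient {
        String backend();
    }

    static final String FACTORY_KEY = "hive.metastore.client.factory.class";
    static final String GLUE_FACTORY =
        "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory";

    // Hypothetical registry mapping a configured factory class name to a client supplier.
    static final Map<String, Supplier<MetaClient>> FACTORIES = new HashMap<>();
    static {
        FACTORIES.put(GLUE_FACTORY, () -> () -> "glue");
    }

    /** Models the current behaviour: the standard client is built and the config is ignored. */
    static MetaClient hardCodedClient(Map<String, String> conf) {
        return () -> "local-hive-metastore";
    }

    /** Models the proposed behaviour: honour the configured factory, else fall back. */
    static MetaClient configuredClient(Map<String, String> conf) {
        Supplier<MetaClient> factory = FACTORIES.get(conf.get(FACTORY_KEY));
        if (factory != null) {
            return factory.get();
        }
        return hardCodedClient(conf);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(FACTORY_KEY, GLUE_FACTORY);
        // With the same configuration, the two paths pick different backends.
        System.out.println("hard-coded: " + hardCodedClient(conf).backend());
        System.out.println("configured: " + configuredClient(conf).backend());
    }
}
```

   With the factory property set, the two paths in this sketch resolve to 
different backends, which mirrors the table being created in one metastore and 
looked up in another.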
   
   That is why the table gets created in the Glue metastore, but while reading 
or scanning partitions Hudi talks to the local Hive metastore, where it does 
not find the table.
   
   **Note**: We need to remove shading of `Hive` in `hudi-spark-bundle` by 
default. Otherwise we get a **RuntimeException NoSuchMethod**, because 
**HiveConf** is shaded and relocated to a new namespace while `Hive.java` is 
not shaded, so `Hive.get(conf)` results in `NoSuchMethod`. We cannot shade 
`Hive.java` since it is in `hive-exec`, which is itself a huge bundle jar with 
numerous dependencies. A similar issue already exists in Hudi because of 
shading of Hive, which we have reported here: 
https://issues.apache.org/jira/browse/HUDI-281 . So this PR will help fix that 
as well.
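
   The signature mismatch behind the note above can be illustrated with a 
self-contained sketch. The nested classes `HiveConf`, `RelocatedHiveConf` and 
`Hive` below are hypothetical stand-ins, not the real Hive classes, and the 
lookup uses reflection only to make the mismatch visible; in a real shaded 
bundle the failure would surface when the call is linked rather than through a 
reflective lookup.

```java
import java.lang.reflect.Method;

// Toy model only: the nested classes are stand-ins named after the classes in the note.
final class ShadingMismatchSketch {

    /** Stand-in for the original, unrelocated HiveConf. */
    static class HiveConf {}

    /** Stand-in for HiveConf after relocation to a shaded namespace (a distinct type). */
    static class RelocatedHiveConf {}

    /** Stand-in for the unshaded Hive class; get() only accepts the original type. */
    static class Hive {
        public static String get(HiveConf conf) {
            return "metastore client";
        }
    }

    /** Returns true if Hive declares get(...) with exactly the given parameter type. */
    static boolean hasGetMethodFor(Class<?> paramType) {
        try {
            Method m = Hive.class.getDeclaredMethod("get", paramType);
            return m != null;
        } catch (NoSuchMethodException e) {
            // No method with this exact signature exists on Hive.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("get(HiveConf) found: " + hasGetMethodFor(HiveConf.class));
        System.out.println("get(RelocatedHiveConf) found: "
            + hasGetMethodFor(RelocatedHiveConf.class));
    }
}
```

   In this sketch the lookup succeeds for the original type and fails for the 
relocated one, because the two are unrelated classes as far as the JVM is 
concerned.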
   
   
   

