Gatsby-Lee commented on issue #5484:
URL: https://github.com/apache/hudi/issues/5484#issuecomment-1137575389

   @xushiyan hi.
   After our chat yesterday, I asked myself some questions and answered them, 
and I also did some investigation in the Glue, Hudi, and Hive code bases.
   
   Here are my questions and thoughts. ( FYI, I don’t have much detailed 
knowledge of the Hive and Hudi code bases. )
   
   My conclusion so far: I think the issue is in Hudi, not really in AWS 
Glue.
   
   **Q1. What is the issue?**
   A. Metadata sync failed because an incorrect MetaStoreClient is used
   ( org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient ).
   
   **Q2. What is the expected MetaStoreClient?**
   A. AWSCatalogMetastoreClient ( code )
   
   **Q3. Which Hudi versions work and which do not?**
   A.
   Hudi 0.10.1 works
   Hudi 0.11 doesn’t work ( Spark2, Spark3 )
   
   **Q4. How is AWSCatalogMetastoreClient picked when using AWS Glue Catalog?**
   A. By overriding the config spark.hadoop.hive.metastore.client.factory.class:
   spark.hadoop.hive.metastore.client.factory.class=com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory
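
To illustrate the override in Q4, here is a minimal Python sketch. It relies on the standard Spark behavior of copying `spark.hadoop.*` properties into the Hadoop/Hive configuration with the prefix stripped. The helper names (`hadoop_conf_from_spark`, `uses_glue_factory`) and the sample conf dict are hypothetical and only for illustration; they are not part of Hudi, Glue, or Spark.

```python
from typing import Dict, Mapping

# Prefix Spark strips when copying properties into the Hadoop configuration.
SPARK_HADOOP_PREFIX = "spark.hadoop."
# Hive-side key that selects the metastore client factory.
FACTORY_KEY = "hive.metastore.client.factory.class"
# Factory class named in this comment for the AWS Glue Data Catalog.
GLUE_FACTORY = (
    "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
)


def hadoop_conf_from_spark(spark_conf: Mapping[str, str]) -> Dict[str, str]:
    """Return the Hadoop-side view of the spark.hadoop.* entries (hypothetical helper)."""
    return {
        key[len(SPARK_HADOOP_PREFIX):]: value
        for key, value in spark_conf.items()
        if key.startswith(SPARK_HADOOP_PREFIX)
    }


def uses_glue_factory(spark_conf: Mapping[str, str]) -> bool:
    """True if the effective Hive conf points at the Glue client factory."""
    return hadoop_conf_from_spark(spark_conf).get(FACTORY_KEY) == GLUE_FACTORY


# Assumed example of what a Glue job's Spark conf might contain.
glue_job_conf = {
    SPARK_HADOOP_PREFIX + FACTORY_KEY: GLUE_FACTORY,
    "spark.app.name": "hudi-sync-check",
}

print(uses_glue_factory(glue_job_conf))
```

A check like this only confirms that the setting is present on the Spark side. It says nothing about whether the code doing the metadata sync actually reads that configuration when it creates its MetaStoreClient, which is the question raised in Q1.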
   
   **Q5. Does Glue set spark.hadoop.hive.metastore.client.factory.class?**
   A. Yes, it does, regardless of the Hudi version.
   
   **Q6. Does AWS Glue care if I am using Hudi 0.10.1 or 0.11?**
   A. It shouldn’t. That might not be their concern.
   
   **Q7. What datasource am I using?**
   A. Spark
   
   **Q8. When using Spark, which module creates the SyncClient?**
   A. It starts from HoodieSparkSqlWriter.scala ( metaSync method ). There is 
also now a module SyncUtilHelpers.java.
   
   **Q9. Has anything changed around MetaStoreClient in Hudi?**
   A. Yes. A new feature was added.
   https://hudi.apache.org/docs/syncing_aws_glue_data_catalog
    

