Gatsby-Lee commented on issue #5484: URL: https://github.com/apache/hudi/issues/5484#issuecomment-1137575389
@xushiyan Hi. After our chat yesterday, I did some self-question-and-answer and some investigation in the Glue, Hudi, and Hive code bases. Here are my questions and thoughts. (FYI, I don't have detailed knowledge of the Hive and Hudi code bases.)

My conclusion is that I think the issue is in Hudi, not really in AWS Glue.

**Q1. What is the issue?**
A. Metadata sync failed because the incorrect MetaStoreClient is used (org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient).

**Q2. What is the expected MetaStoreClient?**
A. AWSCatalogMetastoreClient (code)

**Q3. Which Hudi version works and which doesn't?**
A. Hudi 0.10.1 works. Hudi 0.11 doesn't work (Spark2, Spark3).

**Q4. How is AWSCatalogMetastoreClient picked when using AWS Glue Catalog?**
A. By overriding the config spark.hadoop.hive.metastore.client.factory.class (spark.hadoop.hive.metastore.client.factory.class=com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory).

**Q5. Does Glue set spark.hadoop.hive.metastore.client.factory.class?**
A. Yes, it does, regardless of the Hudi version.

**Q6. Does AWS Glue care whether I am using Hudi 0.10.1 or 0.11?**
A. It shouldn't. It might not be their concern.

**Q7. What datasource am I using?**
A. Spark

**Q8. When using Spark, which module creates the SyncClient?**
A. It starts from HoodieSparkSqlWriter.scala (the metaSync method). There is now also a module, SyncUtilHelpers.java.

**Q9. Any change in MetaStoreClient in Hudi?**
A. Yes. A new feature was added. https://hudi.apache.org/docs/syncing_aws_glue_data_catalog
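
To make Q4 and Q5 concrete, here is a minimal Python sketch of how a `spark.hadoop.*` setting ends up as a plain Hadoop/Hive property. It is not Hudi, Glue, or Spark code: the helper `to_hadoop_conf` and the app name are hypothetical, and it only mimics Spark's documented behavior of forwarding `spark.hadoop.`-prefixed keys with the prefix removed. The assumption is that whatever creates the MetaStoreClient has to read this forwarded configuration for the Glue client factory to be picked.

```python
# Sketch only: mimics how Spark forwards `spark.hadoop.*` settings into the
# Hadoop/Hive configuration. `to_hadoop_conf` is a hypothetical helper,
# not a Spark, Hudi, or Glue API.
SPARK_HADOOP_PREFIX = "spark.hadoop."
FACTORY_KEY = "hive.metastore.client.factory.class"
GLUE_FACTORY = (
    "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
)


def to_hadoop_conf(spark_conf):
    """Keep only keys starting with `spark.hadoop.` and strip that prefix."""
    return {
        key[len(SPARK_HADOOP_PREFIX):]: value
        for key, value in spark_conf.items()
        if key.startswith(SPARK_HADOOP_PREFIX)
    }


spark_conf = {
    "spark.app.name": "hudi-glue-sync-check",  # hypothetical app name
    SPARK_HADOOP_PREFIX + FACTORY_KEY: GLUE_FACTORY,  # the setting from Q4
}

hadoop_conf = to_hadoop_conf(spark_conf)

# If the sync code builds its client from a configuration that lacks this
# key, the Glue factory is never consulted (assumed failure mode, per Q1).
uses_glue_factory = hadoop_conf.get(FACTORY_KEY) == GLUE_FACTORY

print(FACTORY_KEY, "=", hadoop_conf.get(FACTORY_KEY))
print("Glue client factory configured:", uses_glue_factory)
```

A check like this only shows what the configuration contains. Whether Hudi 0.11 actually passes that configuration to the code that creates the sync client is the open question raised in Q8 and Q9.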
