xmubeta opened a new issue, #9134: URL: https://github.com/apache/hudi/issues/9134
**Describe the problem you faced**

The combination of AWS Glue 4.0 and Hudi 0.12.2 works. After upgrading Hudi to 0.12.3, the same job fails with a Hive metastore sync error. I noticed there were some changes around Hive sync in the release notes, but I could not figure out the cause.

**To Reproduce**

Steps to reproduce the behavior:

1. Create a Glue 4.0 job with the following script:

```
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.sql import DataFrame, Row
import datetime
from awsglue import DynamicFrame
from pyspark.sql.session import SparkSession
from pyspark.sql.types import *
from pyspark.sql.functions import from_json, col

# Jars required:
# hudi-spark3.3-bundle_2.12-0.12.3.jar, spark-avro_2.12-3.3.0.jar, calcite-core-1.10.0.jar

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

spark = SparkSession.builder \
    .config('spark.serializer', 'org.apache.spark.serializer.KryoSerializer') \
    .config('spark.sql.extensions', 'org.apache.spark.sql.hudi.HoodieSparkSessionExtension') \
    .config('spark.sql.hive.convertMetastoreParquet', 'false') \
    .config('spark.default.parallelism', 20) \
    .getOrCreate()

sc = spark.sparkContext
glueContext = GlueContext(sc)
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

topicSchema = StructType() \
    .add("message_id", LongType()) \
    .add("client_id", StringType()) \
    .add("timestamp", TimestampType()) \
    .add("humidity", IntegerType()) \
    .add("temperature", IntegerType()) \
    .add("pressure", IntegerType()) \
    .add("pitch", StringType()) \
    .add("roll", StringType()) \
    .add("yaw", StringType()) \
    .add("count", LongType())

raw_data = [
    {"message_id": 924085105263850, "client_id": "raspberrypi4", "timestamp": "2022-09-24 08:51:05", "humidity": 112, "temperature": 4, "pressure": 462, "pitch": "sample", "roll": "demo", "yaw": "test", "count": 36058},
    {"message_id": 924085106380188, "client_id": "raspberrypi18", "timestamp": "2022-09-24 08:51:06", "humidity": 32, "temperature": 18, "pressure": 362, "pitch": "sample", "roll": "demo", "yaw": "test", "count": 36059},
    {"message_id": 924085107593821, "client_id": "raspberrypi6", "timestamp": "2022-09-24 08:51:07", "humidity": 91, "temperature": 6, "pressure": 1138, "pitch": "sample", "roll": "demo", "yaw": "test", "count": 36060},
    {"message_id": 924085108811805, "client_id": "raspberrypi3", "timestamp": "2022-09-24 08:51:08", "humidity": 102, "temperature": 3, "pressure": 1355, "pitch": "sample", "roll": "demo", "yaw": "test", "count": 36062},
]
df2 = spark.read.schema(topicSchema).json(sc.parallelize(raw_data))

targetDBName = 'kafka_test'
targetTableName = 'huditable_iot_fake_mor7'
hudiStorageType = 'MERGE_ON_READ'
targetPath = 's3://xxx/glue/streaming/' + targetTableName + '/'
primaryKey = 'message_id'
partitionKey = 'client_id'
timestamp = 'timestamp'

hudiConfigWithSync = {
    'className': 'org.apache.hudi',
    'hoodie.datasource.hive_sync.use_jdbc': 'false',
    'hoodie.datasource.write.partitionpath.field': partitionKey,
    'hoodie.datasource.write.recordkey.field': primaryKey,
    'hoodie.datasource.write.precombine.field': timestamp,
    'hoodie.datasource.write.operation': 'upsert',
    'hoodie.table.name': targetTableName,
    'hoodie.consistency.check.enabled': 'true',
    'hoodie.datasource.write.hive_style_partitioning': 'true',
    'hoodie.datasource.write.table.type': hudiStorageType,
    'hoodie.datasource.hive_sync.database': targetDBName,
    'hoodie.datasource.hive_sync.table': targetTableName,
    'hoodie.datasource.hive_sync.enable': 'true',
    'hoodie.datasource.hive_sync.partition_fields': partitionKey,
    'hoodie.datasource.hive_sync.partition_extractor_class': 'org.apache.hudi.hive.MultiPartKeysValueExtractor',
    'hoodie.datasource.hive_sync.support_timestamp': 'true',
    'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.SimpleKeyGenerator',
    'hoodie.upsert.shuffle.parallelism': 2,
    'hoodie.insert.shuffle.parallelism': 2,
    'hoodie.datasource.hive_sync.mode': 'hms',
    'hoodie.compact.inline': 'true',
    'hoodie.compact.inline.max.delta.commits': 2,
}

def processBatch(data_frame, batchId):
    if data_frame.count() > 0:
        spark_df = data_frame
        spark_df.write.format('org.apache.hudi').options(**hudiConfigWithSync).mode('append').save(targetPath)

processBatch(df2, 1)
job.commit()
```

2. Run the job. It fails with the following error:

```
2023-07-06 06:23:28,744 ERROR [main] glue.ProcessLauncher (Logging.scala:logError(77)): Error from Python:Traceback (most recent call last):
  File "/tmp/kafka-iot-hudi-fake.py", line 154, in <module>
    processBatch(df2, 1)
  File "/tmp/kafka-iot-hudi-fake.py", line 138, in processBatch
    spark_df.write.format('org.apache.hudi').options(**hudiConfigWithSync).mode('append').save(targetPath)
  File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 968, in save
    self._jwrite.save(path)
  File "/opt/amazon/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
    return_value = get_return_value(
  File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 190, in deco
    return f(*a, **kw)
  File "/opt/amazon/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o155.save.
: org.apache.hudi.exception.HoodieException: Could not sync using the meta sync class org.apache.hudi.hive.HiveSyncTool
    at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:60)
    at org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$2(HoodieSparkSqlWriter.scala:673)
    at org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$2$adapted(HoodieSparkSqlWriter.scala:672)
    at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
    at org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:672)
    at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:759)
    at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:350)
    at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:145)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:103)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
    at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:224)
    at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:114)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$7(SQLExecution.scala:139)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
    at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:224)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:139)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:245)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:138)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:100)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:96)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:615)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:177)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:615)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:591)
    at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:96)
    at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:83)
    at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:81)
    at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:124)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:860)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:390)
    at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:363)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:239)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.hudi.exception.HoodieException: Unable to instantiate class org.apache.hudi.hive.HiveSyncTool
    at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:91)
    at org.apache.hudi.sync.common.util.SyncUtilHelpers.instantiateMetaSyncTool(SyncUtilHelpers.java:84)
    at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:58)
    ... 54 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:89)
    ... 56 more
Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Got runtime exception when hive syncing
    at org.apache.hudi.hive.HiveSyncTool.initSyncClient(HiveSyncTool.java:118)
    at org.apache.hudi.hive.HiveSyncTool.<init>(HiveSyncTool.java:107)
    ... 61 more
Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed to create HiveMetaStoreClient
    at org.apache.hudi.hive.HoodieHiveSyncClient.<init>(HoodieHiveSyncClient.java:103)
    at org.apache.hudi.hive.HiveSyncTool.initSyncClient(HiveSyncTool.java:113)
    ... 62 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:243)
    at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:413)
    at org.apache.hadoop.hive.ql.metadata.Hive.create(Hive.java:346)
    at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:326)
    at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:295)
    at org.apache.hudi.hive.ddl.HMSDDLExecutor.<init>(HMSDDLExecutor.java:81)
    at org.apache.hudi.hive.HoodieHiveSyncClient.<init>(HoodieHiveSyncClient.java:87)
    ... 63 more
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1742)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:87)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:137)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:108)
    at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClientFactory.createMetaStoreClient(SessionHiveMetaStoreClientFactory.java:50)
    at org.apache.hadoop.hive.ql.metadata.HiveUtils.createMetaStoreClient(HiveUtils.java:507)
    at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3856)
    at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3836)
    at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:4098)
    at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:255)
    at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:238)
    ... 69 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1740)
    ... 79 more
Caused by: MetaException(message:Error getting metastore password: null)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:83)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:92)
    at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6950)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:162)
    at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:70)
    ... 84 more
Caused by: java.lang.RuntimeException: Error getting metastore password: null
    at org.apache.hadoop.hive.metastore.ObjectStore.getDataSourceProps(ObjectStore.java:492)
    at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:286)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:79)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:139)
    at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:58)
    at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStoreForConf(HiveMetaStore.java:627)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:593)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:587)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:654)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:430)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:79)
    ... 88 more
Caused by: java.io.IOException
    at org.apache.hadoop.hive.shims.Hadoop23Shims.getPassword(Hadoop23Shims.java:968)
    at org.apache.hadoop.hive.metastore.ObjectStore.getDataSourceProps(ObjectStore.java:487)
    ... 105 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hive.shims.Hadoop23Shims.getPassword(Hadoop23Shims.java:962)
    ... 106 more
Caused by: java.io.IOException: Configuration problem with provider path.
    at org.apache.hadoop.conf.Configuration.getPasswordFromCredentialProviders(Configuration.java:2455)
    at org.apache.hadoop.conf.Configuration.getPassword(Configuration.java:2374)
    ... 111 more
Caused by: java.io.IOException: No CredentialProviderFactory for testingforemptydefaultvalue in hadoop.security.credential.provider.path
    at org.apache.hadoop.security.alias.CredentialProviderFactory.getProviders(CredentialProviderFactory.java:103)
    at org.apache.hadoop.conf.Configuration.getPasswordFromCredentialProviders(Configuration.java:2436)
    ... 112 more
```

**Expected behavior**

The same script works with Hudi 0.12.2.

**Environment Description**

* Hudi version : 0.12.3
* Spark version : 3.3.0
* Hive version : n/a
* Hadoop version : n/a
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : no

**Additional context**

The error appears to come from Hive failing to retrieve the metastore password from the configured Hadoop credential provider.
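For context, the innermost cause (`No CredentialProviderFactory for testingforemptydefaultvalue in hadoop.security.credential.provider.path`) suggests the Hive sync path is trying to resolve a metastore JDBC password via a Hadoop credential provider instead of going through the Glue Data Catalog client. One thing worth trying (an untested sketch, not a confirmed fix) is to rely on Glue 4.0's built-in Hudi support and Glue Data Catalog metastore integration rather than a hand-supplied Hudi bundle jar, using job parameters like:

```
# Hypothetical Glue job parameters (names taken from AWS Glue documentation;
# whether this resolves the 0.12.3 regression is an assumption, not verified):
--datalake-formats hudi
--enable-glue-datacatalog true
```

With `--enable-glue-datacatalog`, the session uses the Glue Data Catalog as its Hive metastore, which may bypass the credential-provider lookup that fails here.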
