xmubeta opened a new issue, #9134:
URL: https://github.com/apache/hudi/issues/9134

   **_Tips before filing an issue_**
   
   - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
   
   - Join the mailing list to engage in conversations and get faster support at 
[email protected].
   
   - If you have triaged this as a bug, then file an 
[issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   
   The combination of AWS Glue 4.0 with Hudi 0.12.2 works. After upgrading Hudi to 0.12.3, the same job fails while syncing to the Hive metastore. The release notes mention some changes around Hive sync, but I could not pinpoint the cause.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1.  Create a Glue 4.0 job with the following script:
    ```
    import sys
    from awsglue.transforms import *
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.sql import DataFrame, Row
    import datetime
    from awsglue import DynamicFrame
    from pyspark.sql.session import SparkSession
    from pyspark.sql.types import *
    from pyspark.sql.functions import from_json, col

    # Jars required:
    # hudi-spark3.3-bundle_2.12-0.12.3.jar, spark-avro_2.12-3.3.0.jar, calcite-core-1.10.0.jar

    ## @params: [JOB_NAME]
    args = getResolvedOptions(sys.argv, ['JOB_NAME'])
    spark = SparkSession.builder \
        .config('spark.serializer', 'org.apache.spark.serializer.KryoSerializer') \
        .config('spark.sql.extensions', 'org.apache.spark.sql.hudi.HoodieSparkSessionExtension') \
        .config('spark.sql.hive.convertMetastoreParquet', 'false') \
        .config('spark.default.parallelism', 20) \
        .getOrCreate()
    sc = spark.sparkContext
    glueContext = GlueContext(sc)
    job = Job(glueContext)
    job.init(args['JOB_NAME'], args)

    topicSchema = StructType() \
        .add("message_id", LongType()) \
        .add("client_id", StringType()) \
        .add("timestamp", TimestampType()) \
        .add("humidity", IntegerType()) \
        .add("temperature", IntegerType()) \
        .add("pressure", IntegerType()) \
        .add("pitch", StringType()) \
        .add("roll", StringType()) \
        .add("yaw", StringType()) \
        .add("count", LongType())

    raw_data = [
        {"message_id": 924085105263850, "client_id": "raspberrypi4", "timestamp": "2022-09-24 08:51:05", "humidity": 112, "temperature": 4, "pressure": 462, "pitch": "sample", "roll": "demo", "yaw": "test", "count": 36058},
        {"message_id": 924085106380188, "client_id": "raspberrypi18", "timestamp": "2022-09-24 08:51:06", "humidity": 32, "temperature": 18, "pressure": 362, "pitch": "sample", "roll": "demo", "yaw": "test", "count": 36059},
        {"message_id": 924085107593821, "client_id": "raspberrypi6", "timestamp": "2022-09-24 08:51:07", "humidity": 91, "temperature": 6, "pressure": 1138, "pitch": "sample", "roll": "demo", "yaw": "test", "count": 36060},
        {"message_id": 924085108811805, "client_id": "raspberrypi3", "timestamp": "2022-09-24 08:51:08", "humidity": 102, "temperature": 3, "pressure": 1355, "pitch": "sample", "roll": "demo", "yaw": "test", "count": 36062},
    ]
    df2 = spark.read.schema(topicSchema).json(sc.parallelize(raw_data))

    targetDBName = 'kafka_test'
    targetTableName = 'huditable_iot_fake_mor7'

    hudiStorageType = 'MERGE_ON_READ'

    targetPath = 's3://xxx/glue/streaming/' + targetTableName + '/'
    primaryKey = 'message_id'
    partitionKey = 'client_id'
    timestamp = 'timestamp'

    hudiConfigWithSync = {
        'className': 'org.apache.hudi',
        'hoodie.datasource.hive_sync.use_jdbc': 'false',
        'hoodie.datasource.write.partitionpath.field': partitionKey,
        'hoodie.datasource.write.recordkey.field': primaryKey,
        'hoodie.datasource.write.precombine.field': timestamp,
        'hoodie.datasource.write.operation': 'upsert',
        'hoodie.table.name': targetTableName,
        'hoodie.consistency.check.enabled': 'true',
        'hoodie.datasource.write.hive_style_partitioning': 'true',
        'hoodie.datasource.write.table.type': hudiStorageType,
        'hoodie.datasource.hive_sync.database': targetDBName,
        'hoodie.datasource.hive_sync.table': targetTableName,
        'hoodie.datasource.hive_sync.enable': 'true',
        'hoodie.datasource.hive_sync.partition_fields': partitionKey,
        'hoodie.datasource.hive_sync.partition_extractor_class': 'org.apache.hudi.hive.MultiPartKeysValueExtractor',
        'hoodie.datasource.hive_sync.support_timestamp': 'true',
        'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.SimpleKeyGenerator',
        'hoodie.upsert.shuffle.parallelism': 2,
        'hoodie.insert.shuffle.parallelism': 2,
        'hoodie.datasource.hive_sync.mode': 'hms',
        'hoodie.compact.inline': 'true',
        'hoodie.compact.inline.max.delta.commits': 2,
    }

    def processBatch(data_frame, batchId):
        if data_frame.count() > 0:
            spark_df = data_frame
            spark_df.write.format('org.apache.hudi') \
                .options(**hudiConfigWithSync) \
                .mode('append') \
                .save(targetPath)

    processBatch(df2, 1)

    job.commit()
    ```
   
   
   2. The job fails with the following error:
   ```
   
    2023-07-06 06:23:28,744 ERROR [main] glue.ProcessLauncher (Logging.scala:logError(77)): Error from Python:Traceback (most recent call last):
      File "/tmp/kafka-iot-hudi-fake.py", line 154, in <module>
        processBatch(df2, 1)
      File "/tmp/kafka-iot-hudi-fake.py", line 138, in processBatch
        spark_df.write.format('org.apache.hudi').options(**hudiConfigWithSync).mode('append').save(targetPath)
      File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 968, in save
        self._jwrite.save(path)
      File "/opt/amazon/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
        return_value = get_return_value(
      File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 190, in deco
        return f(*a, **kw)
      File "/opt/amazon/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py", line 326, in get_return_value
        raise Py4JJavaError(
    py4j.protocol.Py4JJavaError: An error occurred while calling o155.save.
    : org.apache.hudi.exception.HoodieException: Could not sync using the meta sync class org.apache.hudi.hive.HiveSyncTool
         at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:60)
         at org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$2(HoodieSparkSqlWriter.scala:673)
         at org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$2$adapted(HoodieSparkSqlWriter.scala:672)
         at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
         at org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:672)
         at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:759)
         at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:350)
         at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:145)
         at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
         at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
         at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
         at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
         at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:103)
         at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
         at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:224)
         at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:114)
         at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$7(SQLExecution.scala:139)
         at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
         at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:224)
         at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:139)
         at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:245)
         at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:138)
         at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
         at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
         at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:100)
         at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:96)
         at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:615)
         at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:177)
         at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:615)
         at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
         at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
         at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
         at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
         at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
         at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:591)
         at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:96)
         at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:83)
         at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:81)
         at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:124)
         at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:860)
         at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:390)
         at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:363)
         at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:239)
         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
         at java.lang.reflect.Method.invoke(Method.java:498)
         at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
         at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
         at py4j.Gateway.invoke(Gateway.java:282)
         at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
         at py4j.commands.CallCommand.execute(CallCommand.java:79)
         at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
         at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
         at java.lang.Thread.run(Thread.java:750)
    Caused by: org.apache.hudi.exception.HoodieException: Unable to instantiate class org.apache.hudi.hive.HiveSyncTool
         at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:91)
         at org.apache.hudi.sync.common.util.SyncUtilHelpers.instantiateMetaSyncTool(SyncUtilHelpers.java:84)
         at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:58)
         ... 54 more
    Caused by: java.lang.reflect.InvocationTargetException
         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
         at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
         at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:89)
         ... 56 more
    Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Got runtime exception when hive syncing
         at org.apache.hudi.hive.HiveSyncTool.initSyncClient(HiveSyncTool.java:118)
         at org.apache.hudi.hive.HiveSyncTool.<init>(HiveSyncTool.java:107)
         ... 61 more
    Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed to create HiveMetaStoreClient
         at org.apache.hudi.hive.HoodieHiveSyncClient.<init>(HoodieHiveSyncClient.java:103)
         at org.apache.hudi.hive.HiveSyncTool.initSyncClient(HiveSyncTool.java:113)
         ... 62 more
    Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
         at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:243)
         at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:413)
         at org.apache.hadoop.hive.ql.metadata.Hive.create(Hive.java:346)
         at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:326)
         at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:295)
         at org.apache.hudi.hive.ddl.HMSDDLExecutor.<init>(HMSDDLExecutor.java:81)
         at org.apache.hudi.hive.HoodieHiveSyncClient.<init>(HoodieHiveSyncClient.java:87)
         ... 63 more
    Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
         at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1742)
         at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:87)
         at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:137)
         at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:108)
         at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClientFactory.createMetaStoreClient(SessionHiveMetaStoreClientFactory.java:50)
         at org.apache.hadoop.hive.ql.metadata.HiveUtils.createMetaStoreClient(HiveUtils.java:507)
         at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3856)
         at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3836)
         at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:4098)
         at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:255)
         at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:238)
         ... 69 more
    Caused by: java.lang.reflect.InvocationTargetException
         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
         at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
         at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1740)
         ... 79 more
    Caused by: MetaException(message:Error getting metastore password: null)
         at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:83)
         at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:92)
         at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6950)
         at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:162)
         at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:70)
         ... 84 more
    Caused by: java.lang.RuntimeException: Error getting metastore password: null
         at org.apache.hadoop.hive.metastore.ObjectStore.getDataSourceProps(ObjectStore.java:492)
         at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:286)
         at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:79)
         at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:139)
         at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:58)
         at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67)
         at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStoreForConf(HiveMetaStore.java:627)
         at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:593)
         at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:587)
         at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:654)
         at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:430)
         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
         at java.lang.reflect.Method.invoke(Method.java:498)
         at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
         at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
         at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:79)
         ... 88 more
    Caused by: java.io.IOException
         at org.apache.hadoop.hive.shims.Hadoop23Shims.getPassword(Hadoop23Shims.java:968)
         at org.apache.hadoop.hive.metastore.ObjectStore.getDataSourceProps(ObjectStore.java:487)
         ... 105 more
    Caused by: java.lang.reflect.InvocationTargetException
         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
         at java.lang.reflect.Method.invoke(Method.java:498)
         at org.apache.hadoop.hive.shims.Hadoop23Shims.getPassword(Hadoop23Shims.java:962)
         ... 106 more
    Caused by: java.io.IOException: Configuration problem with provider path.
         at org.apache.hadoop.conf.Configuration.getPasswordFromCredentialProviders(Configuration.java:2455)
         at org.apache.hadoop.conf.Configuration.getPassword(Configuration.java:2374)
         ... 111 more
    Caused by: java.io.IOException: No CredentialProviderFactory for testingforemptydefaultvalue in hadoop.security.credential.provider.path
         at org.apache.hadoop.security.alias.CredentialProviderFactory.getProviders(CredentialProviderFactory.java:103)
         at org.apache.hadoop.conf.Configuration.getPasswordFromCredentialProviders(Configuration.java:2436)
         ... 112 more
   
   ```
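
   To narrow this down, it can help to confirm that the S3 write itself still succeeds on 0.12.3 and only the metastore sync step breaks. A minimal sketch (the `without_hive_sync` helper is hypothetical, not part of the job above; the dict is a trimmed sample of the writer options):

   ```
   # Hypothetical helper: strip all Hive-sync options from a Hudi writer config
   # so the S3 write can be tested in isolation from the metastore sync step.
   def without_hive_sync(cfg):
       out = {k: v for k, v in cfg.items()
              if not k.startswith('hoodie.datasource.hive_sync')}
       out['hoodie.datasource.hive_sync.enable'] = 'false'  # be explicit
       return out

   # Trimmed sample of the job's writer options.
   hudiConfigWithSync = {
       'hoodie.table.name': 'huditable_iot_fake_mor7',
       'hoodie.datasource.hive_sync.enable': 'true',
       'hoodie.datasource.hive_sync.mode': 'hms',
       'hoodie.datasource.write.operation': 'upsert',
   }

   syncless = without_hive_sync(hudiConfigWithSync)
   print(syncless['hoodie.datasource.hive_sync.enable'])  # prints: false
   ```

   If the write succeeds with these options, the regression is confined to `HiveSyncTool`, which matches the stacktrace.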
   
   **Expected behavior**
   
   The same script works with Hudi 0.12.2, so the write and the Hive sync are expected to succeed on 0.12.3 as well.
   
   **Environment Description**
   
   * Hudi version : 0.12.3
   
   * Spark version : 3.3.0
   
   * Hive version : n/a
   
   * Hadoop version :n/a
   
   * Storage (HDFS/S3/GCS..) :S3
   
   * Running on Docker? (yes/no) :no
   
   
   **Additional context**
   
   The error suggests the Hive client cannot retrieve the metastore password: the innermost cause is `No CredentialProviderFactory for testingforemptydefaultvalue in hadoop.security.credential.provider.path`.
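
   One workaround worth trying (an assumption on my part, not verified on Glue 4.0): since the failure happens while resolving a password from `hadoop.security.credential.provider.path`, clearing that Hadoop property for the job may let `Configuration.getPassword` fall back to the plain configuration value. As a Glue job parameter it would look roughly like:

   ```
   # Hypothetical Glue job parameter (key: --conf)
   --conf spark.hadoop.hadoop.security.credential.provider.path=
   ```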
   
   **Stacktrace**
   
   The full stacktrace is included under **To Reproduce** above.
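
   For traces this deep, the quickest read is the `Caused by:` chain, innermost last. A small throwaway helper (hypothetical, shown against a trimmed sample of the trace above):

   ```
   def root_causes(trace):
       """Return the 'Caused by:' lines of a Java stacktrace, innermost last."""
       return [line.strip() for line in trace.splitlines()
               if line.strip().startswith('Caused by:')]

   # Trimmed sample of the trace above.
   sample = """\
   py4j.protocol.Py4JJavaError: An error occurred while calling o155.save.
   Caused by: org.apache.hudi.exception.HoodieException: Unable to instantiate class org.apache.hudi.hive.HiveSyncTool
   Caused by: MetaException(message:Error getting metastore password: null)
   Caused by: java.io.IOException: No CredentialProviderFactory for testingforemptydefaultvalue in hadoop.security.credential.provider.path
   """

   print(root_causes(sample)[-1])  # prints the innermost 'Caused by:' line
   ```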
   
   

