sayedabdallah commented on issue #11850: URL: https://github.com/apache/hudi/issues/11850#issuecomment-2961470661
Hi @alberttwong , I am getting a similar exception on EMR Serverless 7.5 which uses Apache Spark 3.5.2 which in turn uses hadoop 3.3.4, https://github.com/apache/spark/blob/v3.5.2/pom.xml#L125, I am also trying to access data from different AWS Account using IAM Role by setting `.config("spark.hadoop.fs.s3a.aws.credentials.provider","org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider")` in my spark code. the exception is ``` py4j.protocol.Py4JJavaError: An error occurred while calling o151.sql. : org.sparkproject.guava.util.concurrent.UncheckedExecutionException: org.apache.hudi.exception.HoodieException: Unable to create org.apache.hudi.storage.hadoop.HoodieHadoopStorage at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2263) at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000) at org.sparkproject.guava.cache.LocalCache$LocalManualCache.get(LocalCache.java:4789) at org.apache.spark.sql.catalyst.catalog.SessionCatalog.getCachedPlan(SessionCatalog.scala:212) at org.apache.spark.sql.execution.datasources.FindDataSourceTable.org$apache$spark$sql$execution$datasources$FindDataSourceTable$$readDataSourceTable(DataSourceStrategy.scala:277) at org.apache.spark.sql.execution.datasources.FindDataSourceTable$$anonfun$apply$2.applyOrElse(DataSourceStrategy.scala:345) at org.apache.spark.sql.execution.datasources.FindDataSourceTable$$anonfun$apply$2.applyOrElse(DataSourceStrategy.scala:329) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$2(AnalysisHelper.scala:199) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:77) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$1(AnalysisHelper.scala:199) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:353) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning(AnalysisHelper.scala:197) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning$(AnalysisHelper.scala:193) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDownWithPruning(LogicalPlan.scala:34) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$4(AnalysisHelper.scala:204) at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1388) at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1387) at org.apache.spark.sql.catalyst.plans.logical.SubqueryAlias.mapChildren(basicLogicalOperators.scala:2051) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$1(AnalysisHelper.scala:204) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:353) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning(AnalysisHelper.scala:197) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning$(AnalysisHelper.scala:193) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDownWithPruning(LogicalPlan.scala:34) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$4(AnalysisHelper.scala:204) at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1388) at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1387) at org.apache.spark.sql.catalyst.plans.logical.SubqueryAlias.mapChildren(basicLogicalOperators.scala:2051) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$1(AnalysisHelper.scala:204) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:353) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning(AnalysisHelper.scala:197) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning$(AnalysisHelper.scala:193) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDownWithPruning(LogicalPlan.scala:34) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$4(AnalysisHelper.scala:204) at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1388) at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1387) at org.apache.spark.sql.catalyst.plans.logical.Project.mapChildren(basicLogicalOperators.scala:99) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$1(AnalysisHelper.scala:204) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:353) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning(AnalysisHelper.scala:197) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning$(AnalysisHelper.scala:193) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDownWithPruning(LogicalPlan.scala:34) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsWithPruning(AnalysisHelper.scala:128) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsWithPruning$(AnalysisHelper.scala:125) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsWithPruning(LogicalPlan.scala:34) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators(AnalysisHelper.scala:85) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators$(AnalysisHelper.scala:84) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:34) at org.apache.spark.sql.execution.datasources.FindDataSourceTable.apply(DataSourceStrategy.scala:329) at org.apache.spark.sql.execution.datasources.FindDataSourceTable.apply(DataSourceStrategy.scala:249) at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:239) at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126) at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122) at scala.collection.immutable.List.foldLeft(List.scala:91) at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeBatch$1(RuleExecutor.scala:236) at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$6(RuleExecutor.scala:319) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.sql.catalyst.rules.RuleExecutor$RuleExecutionContext$.withContext(RuleExecutor.scala:368) at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$5(RuleExecutor.scala:319) at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$5$adapted(RuleExecutor.scala:309) at scala.collection.immutable.List.foreach(List.scala:431) at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:309) at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:195) at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:191) at org.apache.spark.sql.catalyst.analysis.Analyzer.executeSameContext(Analyzer.scala:303) at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$execute$2(Analyzer.scala:299) at org.apache.spark.sql.catalyst.analysis.AnalysisContext$.withNewAnalysisContext(Analyzer.scala:216) at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:299) at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:245) at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:182) at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:108) at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:182) at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:270) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:360) at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:269) at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:93) at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:219) at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:277) at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:711) at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:277) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:901) at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:276) at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:93) at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:90) at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:82) at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:102) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:901) at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:99) at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:639) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:901) at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:630) at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:660) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:569) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374) at py4j.Gateway.invoke(Gateway.java:282) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182) at py4j.ClientServerConnection.run(ClientServerConnection.java:106) at java.base/java.lang.Thread.run(Thread.java:840) Caused by: org.apache.hudi.exception.HoodieException: Unable to create org.apache.hudi.storage.hadoop.HoodieHadoopStorage at org.apache.hudi.storage.HoodieStorageUtils.getStorage(HoodieStorageUtils.java:44) at org.apache.hudi.storage.HoodieStorageUtils.getStorage(HoodieStorageUtils.java:34) at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:107) at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:357) at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:345) at org.apache.spark.sql.execution.datasources.FindDataSourceTable$$anon$1.call(DataSourceStrategy.scala:268) at org.apache.spark.sql.execution.datasources.FindDataSourceTable$$anon$1.call(DataSourceStrategy.scala:255) at org.sparkproject.guava.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4792) at org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599) at org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379) at org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342) at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) ... 102 more Caused by: org.apache.hudi.exception.HoodieException: Unable to instantiate class org.apache.hudi.storage.hadoop.HoodieHadoopStorage at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:75) at org.apache.hudi.storage.HoodieStorageUtils.getStorage(HoodieStorageUtils.java:41) ... 113 more Caused by: java.lang.reflect.InvocationTargetException at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77) at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500) at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:481) at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:73) ... 114 more Caused by: org.apache.hudi.exception.HoodieIOException: Failed to get instance of org.apache.hadoop.fs.FileSystem at org.apache.hudi.hadoop.fs.HadoopFSUtils.getFs(HadoopFSUtils.java:118) at org.apache.hudi.hadoop.fs.HadoopFSUtils.getFs(HadoopFSUtils.java:109) at org.apache.hudi.storage.hadoop.HoodieHadoopStorage.<init>(HoodieHadoopStorage.java:63) ... 120 more Caused by: org.apache.hadoop.fs.s3a.impl.InstantiationIOException: `s3://MY_DATA_PATH_HERE': Class org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider instantiation exception java.lang.NoClassDefFoundError: software/amazon/awssdk/thirdparty/org/apache/http/client/utils/URIBuilder (configuration key fs.s3a.aws.credentials.provider): software/amazon/awssdk/thirdparty/org/apache/http/client/utils/URIBuilder at org.apache.hadoop.fs.s3a.impl.InstantiationIOException.instantiationException(InstantiationIOException.java:178) at org.apache.hadoop.fs.s3a.S3AUtils.getInstanceFromReflection(S3AUtils.java:695) at org.apache.hadoop.fs.s3a.auth.CredentialProviderListFactory.createAWSV2CredentialProvider(CredentialProviderListFactory.java:307) at org.apache.hadoop.fs.s3a.auth.CredentialProviderListFactory.buildAWSProviderList(CredentialProviderListFactory.java:253) at org.apache.hadoop.fs.s3a.auth.CredentialProviderListFactory.createAWSCredentialProviderList(CredentialProviderListFactory.java:145) at org.apache.hadoop.fs.s3a.S3AFileSystem.createClientManager(S3AFileSystem.java:1186) at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:751) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3662) at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:171) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3763) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3714) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:564) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:374) at org.apache.hudi.hadoop.fs.HadoopFSUtils.getFs(HadoopFSUtils.java:116) ... 122 more Caused by: java.lang.NoClassDefFoundError: software/amazon/awssdk/thirdparty/org/apache/http/client/utils/URIBuilder at org.apache.hadoop.fs.s3a.auth.STSClientFactory.getSTSEndpoint(STSClientFactory.java:170) at org.apache.hadoop.fs.s3a.auth.STSClientFactory.builder(STSClientFactory.java:154) at org.apache.hadoop.fs.s3a.auth.STSClientFactory.builder(STSClientFactory.java:110) at org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider.<init>(AssumedRoleCredentialProvider.java:144) at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77) at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500) at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:481) at org.apache.hadoop.fs.s3a.S3AUtils.getInstanceFromReflection(S3AUtils.java:661) ... 134 more Caused by: java.lang.ClassNotFoundException: software.amazon.awssdk.thirdparty.org.apache.http.client.utils.URIBuilder at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641) at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188) at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525) ... 144 more ``` I am not sure how to solve this issue seeking any support Thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
