sayedabdallah commented on issue #11850:
URL: https://github.com/apache/hudi/issues/11850#issuecomment-2961470661

   Hi @alberttwong,
   I am getting a similar exception on EMR Serverless 7.5, which uses Apache Spark 3.5.2, which in turn uses Hadoop 3.3.4 (https://github.com/apache/spark/blob/v3.5.2/pom.xml#L125).
   
   I am also trying to access data in a different AWS account using an IAM role, by setting
   
   `.config("spark.hadoop.fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider")`
   
   in my Spark code.
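   For context, here is a minimal sketch of how I understand the assumed-role S3A keys fit together. The role ARN is a placeholder, and the `fs.s3a.assumed.role.arn` key comes from the Hadoop S3A assumed-role documentation rather than from my actual job:

   ```python
   # Hedged sketch: S3A keys that typically accompany AssumedRoleCredentialProvider.
   # The ARN is a placeholder, not a real role.
   S3A_CONF = {
       "spark.hadoop.fs.s3a.aws.credentials.provider":
           "org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider",
       # AssumedRoleCredentialProvider needs the role it should assume:
       "spark.hadoop.fs.s3a.assumed.role.arn":
           "arn:aws:iam::123456789012:role/EXAMPLE_ROLE",
   }

   def apply_s3a_conf(builder, conf=S3A_CONF):
       """Apply each key/value pair to a SparkSession.Builder-like object
       via its .config() method, returning the (chained) builder."""
       for key, value in conf.items():
           builder = builder.config(key, value)
       return builder
   ```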
   
   The exception is:
   
   ```
   py4j.protocol.Py4JJavaError: An error occurred while calling o151.sql.
   : org.sparkproject.guava.util.concurrent.UncheckedExecutionException: org.apache.hudi.exception.HoodieException: Unable to create org.apache.hudi.storage.hadoop.HoodieHadoopStorage
        at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2263)
        at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000)
        at org.sparkproject.guava.cache.LocalCache$LocalManualCache.get(LocalCache.java:4789)
        at org.apache.spark.sql.catalyst.catalog.SessionCatalog.getCachedPlan(SessionCatalog.scala:212)
        at org.apache.spark.sql.execution.datasources.FindDataSourceTable.org$apache$spark$sql$execution$datasources$FindDataSourceTable$$readDataSourceTable(DataSourceStrategy.scala:277)
        at org.apache.spark.sql.execution.datasources.FindDataSourceTable$$anonfun$apply$2.applyOrElse(DataSourceStrategy.scala:345)
        at org.apache.spark.sql.execution.datasources.FindDataSourceTable$$anonfun$apply$2.applyOrElse(DataSourceStrategy.scala:329)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$2(AnalysisHelper.scala:199)
        at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:77)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$1(AnalysisHelper.scala:199)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:353)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning(AnalysisHelper.scala:197)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning$(AnalysisHelper.scala:193)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDownWithPruning(LogicalPlan.scala:34)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$4(AnalysisHelper.scala:204)
        at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1388)
        at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1387)
        at org.apache.spark.sql.catalyst.plans.logical.SubqueryAlias.mapChildren(basicLogicalOperators.scala:2051)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$1(AnalysisHelper.scala:204)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:353)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning(AnalysisHelper.scala:197)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning$(AnalysisHelper.scala:193)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDownWithPruning(LogicalPlan.scala:34)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$4(AnalysisHelper.scala:204)
        at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1388)
        at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1387)
        at org.apache.spark.sql.catalyst.plans.logical.SubqueryAlias.mapChildren(basicLogicalOperators.scala:2051)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$1(AnalysisHelper.scala:204)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:353)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning(AnalysisHelper.scala:197)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning$(AnalysisHelper.scala:193)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDownWithPruning(LogicalPlan.scala:34)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$4(AnalysisHelper.scala:204)
        at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1388)
        at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1387)
        at org.apache.spark.sql.catalyst.plans.logical.Project.mapChildren(basicLogicalOperators.scala:99)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$1(AnalysisHelper.scala:204)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:353)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning(AnalysisHelper.scala:197)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning$(AnalysisHelper.scala:193)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDownWithPruning(LogicalPlan.scala:34)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsWithPruning(AnalysisHelper.scala:128)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsWithPruning$(AnalysisHelper.scala:125)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsWithPruning(LogicalPlan.scala:34)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators(AnalysisHelper.scala:85)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators$(AnalysisHelper.scala:84)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:34)
        at org.apache.spark.sql.execution.datasources.FindDataSourceTable.apply(DataSourceStrategy.scala:329)
        at org.apache.spark.sql.execution.datasources.FindDataSourceTable.apply(DataSourceStrategy.scala:249)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:239)
        at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
        at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
        at scala.collection.immutable.List.foldLeft(List.scala:91)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeBatch$1(RuleExecutor.scala:236)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$6(RuleExecutor.scala:319)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$RuleExecutionContext$.withContext(RuleExecutor.scala:368)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$5(RuleExecutor.scala:319)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$5$adapted(RuleExecutor.scala:309)
        at scala.collection.immutable.List.foreach(List.scala:431)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:309)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:195)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:191)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.executeSameContext(Analyzer.scala:303)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$execute$2(Analyzer.scala:299)
        at org.apache.spark.sql.catalyst.analysis.AnalysisContext$.withNewAnalysisContext(Analyzer.scala:216)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:299)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:245)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:182)
        at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:108)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:182)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:270)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:360)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:269)
        at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:93)
        at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:219)
        at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:277)
        at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:711)
        at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:277)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:901)
        at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:276)
        at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:93)
        at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:90)
        at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:82)
        at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:102)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:901)
        at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:99)
        at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:639)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:901)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:630)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:660)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:569)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
        at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
        at java.base/java.lang.Thread.run(Thread.java:840)
   Caused by: org.apache.hudi.exception.HoodieException: Unable to create org.apache.hudi.storage.hadoop.HoodieHadoopStorage
        at org.apache.hudi.storage.HoodieStorageUtils.getStorage(HoodieStorageUtils.java:44)
        at org.apache.hudi.storage.HoodieStorageUtils.getStorage(HoodieStorageUtils.java:34)
        at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:107)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:357)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:345)
        at org.apache.spark.sql.execution.datasources.FindDataSourceTable$$anon$1.call(DataSourceStrategy.scala:268)
        at org.apache.spark.sql.execution.datasources.FindDataSourceTable$$anon$1.call(DataSourceStrategy.scala:255)
        at org.sparkproject.guava.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4792)
        at org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
        at org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
        at org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
        at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
        ... 102 more
   Caused by: org.apache.hudi.exception.HoodieException: Unable to instantiate class org.apache.hudi.storage.hadoop.HoodieHadoopStorage
        at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:75)
        at org.apache.hudi.storage.HoodieStorageUtils.getStorage(HoodieStorageUtils.java:41)
        ... 113 more
   Caused by: java.lang.reflect.InvocationTargetException
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500)
        at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:481)
        at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:73)
        ... 114 more
   Caused by: org.apache.hudi.exception.HoodieIOException: Failed to get instance of org.apache.hadoop.fs.FileSystem
        at org.apache.hudi.hadoop.fs.HadoopFSUtils.getFs(HadoopFSUtils.java:118)
        at org.apache.hudi.hadoop.fs.HadoopFSUtils.getFs(HadoopFSUtils.java:109)
        at org.apache.hudi.storage.hadoop.HoodieHadoopStorage.<init>(HoodieHadoopStorage.java:63)
        ... 120 more
   Caused by: org.apache.hadoop.fs.s3a.impl.InstantiationIOException: `s3://MY_DATA_PATH_HERE': Class org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider instantiation exception java.lang.NoClassDefFoundError: software/amazon/awssdk/thirdparty/org/apache/http/client/utils/URIBuilder (configuration key fs.s3a.aws.credentials.provider): software/amazon/awssdk/thirdparty/org/apache/http/client/utils/URIBuilder
        at org.apache.hadoop.fs.s3a.impl.InstantiationIOException.instantiationException(InstantiationIOException.java:178)
        at org.apache.hadoop.fs.s3a.S3AUtils.getInstanceFromReflection(S3AUtils.java:695)
        at org.apache.hadoop.fs.s3a.auth.CredentialProviderListFactory.createAWSV2CredentialProvider(CredentialProviderListFactory.java:307)
        at org.apache.hadoop.fs.s3a.auth.CredentialProviderListFactory.buildAWSProviderList(CredentialProviderListFactory.java:253)
        at org.apache.hadoop.fs.s3a.auth.CredentialProviderListFactory.createAWSCredentialProviderList(CredentialProviderListFactory.java:145)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.createClientManager(S3AFileSystem.java:1186)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:751)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3662)
        at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:171)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3763)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3714)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:564)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:374)
        at org.apache.hudi.hadoop.fs.HadoopFSUtils.getFs(HadoopFSUtils.java:116)
        ... 122 more
   Caused by: java.lang.NoClassDefFoundError: software/amazon/awssdk/thirdparty/org/apache/http/client/utils/URIBuilder
        at org.apache.hadoop.fs.s3a.auth.STSClientFactory.getSTSEndpoint(STSClientFactory.java:170)
        at org.apache.hadoop.fs.s3a.auth.STSClientFactory.builder(STSClientFactory.java:154)
        at org.apache.hadoop.fs.s3a.auth.STSClientFactory.builder(STSClientFactory.java:110)
        at org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider.<init>(AssumedRoleCredentialProvider.java:144)
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500)
        at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:481)
        at org.apache.hadoop.fs.s3a.S3AUtils.getInstanceFromReflection(S3AUtils.java:661)
        ... 134 more
   Caused by: java.lang.ClassNotFoundException: software.amazon.awssdk.thirdparty.org.apache.http.client.utils.URIBuilder
        at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
        at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
        ... 144 more
   ```
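   The final `ClassNotFoundException` is the root cause: `software.amazon.awssdk.thirdparty.org.apache.http.client.utils.URIBuilder` is the package path the AWS SDK v2 bundle jar uses for its shaded Apache HttpClient classes, so it looks like whatever SDK v2 jars are on the classpath do not include those shaded classes. As a hedged sanity check (the jar paths to scan are an assumption about the EMR image, not something I have confirmed), one can scan the jars for that entry:

   ```python
   import zipfile

   # The class the JVM could not load, written as a jar entry path:
   MISSING = ("software/amazon/awssdk/thirdparty/org/apache/http/"
              "client/utils/URIBuilder.class")

   def jars_containing(class_entry, jar_files):
       """Return the jars (zip archives) whose entry list contains class_entry."""
       hits = []
       for jar in jar_files:
           try:
               with zipfile.ZipFile(jar) as zf:
                   if class_entry in zf.namelist():
                       hits.append(jar)
           except (OSError, zipfile.BadZipFile):
               pass  # unreadable or non-jar file: skip it
       return hits

   # Hypothetical usage on the driver (paths are guesses about the image):
   #   import glob
   #   print(jars_containing(MISSING, glob.glob("/usr/lib/spark/jars/*.jar")))
   ```

   If no jar reports the class, that would match the `NoClassDefFoundError` and suggest the full SDK v2 bundle (or whichever artifact carries the shaded HttpClient) needs to be added to the job's classpath.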
   
   I am not sure how to solve this issue; any support would be appreciated.
   
   Thanks
   

