akshayar opened a new issue, #4804: URL: https://github.com/apache/iceberg/issues/4804
EMR version: emr-6.5.0-latest
Iceberg version: 0.13.1
- https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-spark3-runtime/0.13.1/iceberg-spark3-runtime-0.13.1.jar
- https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-spark3-extensions/0.13.1/iceberg-spark3-extensions-0.13.1.jar

I am trying to run an Iceberg streaming ingestion application that consumes from a Kinesis Data Stream and ingests data to S3. When I run it on EMR on EKS with EC2 nodes it works. However, when I run it on EMR on EKS with Fargate it fails with this error:

```
Exception in thread "main" software.amazon.awssdk.core.exception.SdkClientException: Unable to load credentials from any of the providers in the chain AwsCredentialsProviderChain(credentialsProviders=[SystemPropertyCredentialsProvider(), EnvironmentVariableCredentialsProvider(), WebIdentityTokenCredentialsProvider(), ProfileCredentialsProvider(), ContainerCredentialsProvider(), InstanceProfileCredentialsProvider()]) : [
  SystemPropertyCredentialsProvider(): Unable to load credentials from system settings. Access key must be specified either via environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId).,
  EnvironmentVariableCredentialsProvider(): Unable to load credentials from system settings. Access key must be specified either via environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId).,
  WebIdentityTokenCredentialsProvider(): Multiple HTTP implementations were found on the classpath. To avoid non-deterministic loading implementations, please explicitly provide an HTTP client via the client builders, set the software.amazon.awssdk.http.service.impl system property with the FQCN of the HTTP service to use as the default, or remove all but one HTTP implementation from the classpath,
  ProfileCredentialsProvider(): Profile file contained no credentials for profile 'default': ProfileFile(profiles=[]),
  ContainerCredentialsProvider(): Cannot fetch credentials from container - neither AWS_CONTAINER_CREDENTIALS_FULL_URI or AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variables are set.,
  InstanceProfileCredentialsProvider(): Unable to load credentials from service endpoint.]
	at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:98)
	at software.amazon.awssdk.auth.credentials.AwsCredentialsProviderChain.resolveCredentials(AwsCredentialsProviderChain.java:112)
	at software.amazon.awssdk.auth.credentials.internal.LazyAwsCredentialsProvider.resolveCredentials(LazyAwsCredentialsProvider.java:45)
	at software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider.resolveCredentials(DefaultCredentialsProvider.java:104)
	at software.amazon.awssdk.awscore.client.handler.AwsClientHandlerUtils.createExecutionContext(AwsClientHandlerUtils.java:76)
	at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.createExecutionContext(AwsSyncClientHandler.java:68)
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:97)
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:167)
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:94)
	at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45)
	at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:55)
	at software.amazon.awssdk.services.glue.DefaultGlueClient.getTable(DefaultGlueClient.java:7220)
	at org.apache.iceberg.aws.glue.GlueTableOperations.getGlueTable(GlueTableOperations.java:162)
	at org.apache.iceberg.aws.glue.GlueTableOperations.doRefresh(GlueTableOperations.java:91)
	at org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:95)
	at org.apache.iceberg.BaseMetastoreTableOperations.current(BaseMetastoreTableOperations.java:78)
	at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:42)
	at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.lambda$doComputeIfAbsent$14(BoundedLocalCache.java:2344)
	at java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1853)
	at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.doComputeIfAbsent(BoundedLocalCache.java:2342)
	at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.computeIfAbsent(BoundedLocalCache.java:2325)
	at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:108)
	at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.LocalManualCache.get(LocalManualCache.java:62)
	at org.apache.iceberg.CachingCatalog.loadTable(CachingCatalog.java:161)
	at org.apache.iceberg.spark.SparkCatalog.load(SparkCatalog.java:488)
	at org.apache.iceberg.spark.SparkCatalog.loadTable(SparkCatalog.java:135)
	at org.apache.iceberg.spark.SparkCatalog.loadTable(SparkCatalog.java:92)
	at org.apache.spark.sql.connector.catalog.TableCatalog.tableExists(TableCatalog.java:119)
```

The job run details are:

```json
{
  "jobRun": {
    "id": "0000000306mnu3nsmd3",
    "name": "iceberg-job",
    "virtualClusterId": "bf8egc23bcgkw0ac9hitobvu0",
    "arn": "arn:aws:emr-containers:ap-south-1:ACCOUNT_ID:/virtualclusters/bf8egc23bcgkw0ac9hitobvu0/jobruns/0000000306mnu3nsmd3",
    "state": "FAILED",
    "clientToken": "73aa6347-cf3b-4a20-9898-97f108639b85",
    "executionRoleArn": "arn:aws:iam::ACCOUNT_ID:role/emr-on-eks-job-role",
    "releaseLabel": "emr-6.5.0-latest",
    "configurationOverrides": {
      "applicationConfiguration": [
        {
          "classification": "spark-defaults",
          "properties": {
            "spark.kubernetes.driver.label.type": "etl",
            "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
            "spark.kubernetes.executor.label.type": "etl",
            "spark.sql.catalog.my_catalog.warehouse": "s3://s3-data-bucket/iceberg",
            "spark.sql.catalog.my_catalog.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
            "spark.sql.catalog.my_catalog": "org.apache.iceberg.spark.SparkCatalog",
            "spark.sql.catalog.my_catalog.io-impl": "org.apache.iceberg.aws.s3.S3FileIO"
          }
        }
      ],
      "monitoringConfiguration": {
        "persistentAppUI": "ENABLED",
        "cloudWatchMonitoringConfiguration": {
          "logGroupName": "/emr-on-eks/eksworkshop-eksctl",
          "logStreamNamePrefix": "iceberg-job"
        },
        "s3MonitoringConfiguration": {
          "logUri": "s3://s3-data-bucket/hudi/logs/"
        }
      }
    },
    "jobDriver": {
      "sparkSubmitJobDriver": {
        "entryPoint": "s3://s3-data-bucket/spark-structured-streaming-kinesis-iceberg_2.12-1.0.jar",
        "entryPointArguments": [
          "s3-data-bucket",
          "data-stream-ingest-json",
          "ap-south-1",
          "my_catalog.demoiceberg.eks_fargate_iceberg_kinesis",
          "LATEST"
        ],
        "sparkSubmitParameters": "--class kinesis.iceberg.latefile.SparkKinesisConsumerIcebergProcessor --jars https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-spark3-runtime/0.13.1/iceberg-spark3-runtime-0.13.1.jar,https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-spark3-extensions/0.13.1/iceberg-spark3-extensions-0.13.1.jar,https://repo1.maven.org/maven2/org/apache/spark/spark-streaming-kinesis-asl_2.12/3.1.1/spark-streaming-kinesis-asl_2.12-3.1.1.jar,s3://'s3-data-bucket'/spark-sql-kinesis_2.12-1.2.1_spark-3.0-SNAPSHOT.jar,https://repo1.maven.org/maven2/software/amazon/awssdk/bundle/2.15.40/bundle-2.15.40.jar,https://repo1.maven.org/maven2/software/amazon/awssdk/url-connection-client/2.15.40/url-connection-client-2.15.40.jar"
      }
    },
    "createdAt": "2022-05-18T10:38:37+00:00",
    "createdBy": "arn:aws:iam::ACCOUNT_ID:user/username",
    "finishedAt": "2022-05-18T10:42:15+00:00",
    "stateDetails": "Jobrun failed. Main Spark container terminated with errors. Please refer logs uploaded to S3/CloudWatch based on your monitoring configuration.",
    "failureReason": "USER_ERROR",
    "tags": {}
  }
}
```

The Scala code writes using the following lines:

```scala
val query = jsonDF.writeStream
  .format("iceberg")
  // .format("console")
  .outputMode("append")
  .trigger(Trigger.ProcessingTime(1, TimeUnit.MINUTES))
  .option("path", tableName)
  .option("fanout-enabled", "true")
  .option("checkpointLocation", checkpoint_path)
  .start()
```

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
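The `WebIdentityTokenCredentialsProvider()` entry in the chain is the telling one: on Fargate the pod would normally authenticate through a web-identity token, but that provider aborts because multiple SDK HTTP implementations are on the classpath (the `bundle` jar and `url-connection-client` are both passed via `--jars`). Following the remedy the exception message itself suggests, one possible (unverified) workaround is to pin the SDK's default HTTP implementation via the `software.amazon.awssdk.http.service.impl` system property in `spark-defaults`. The FQCN below is the URL-connection implementation shipped in the `url-connection-client` jar already on this job's classpath; treat this fragment as a sketch, not a confirmed fix:

```json
{
  "classification": "spark-defaults",
  "properties": {
    "spark.driver.extraJavaOptions": "-Dsoftware.amazon.awssdk.http.service.impl=software.amazon.awssdk.http.urlconnection.UrlConnectionSdkHttpService",
    "spark.executor.extraJavaOptions": "-Dsoftware.amazon.awssdk.http.service.impl=software.amazon.awssdk.http.urlconnection.UrlConnectionSdkHttpService"
  }
}
```

Alternatively, per the same message, removing all but one HTTP implementation from `--jars` should have the same effect.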
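To separate a classpath problem from a genuine credentials problem, a minimal diagnostic (my sketch, not part of the reported application) is to resolve credentials through the same default chain the Glue client uses, inside the driver pod and independent of Iceberg. This assumes the AWS SDK v2 bundle jar is on the classpath; it cannot run outside an AWS environment:

```scala
import software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider

object CredentialCheck {
  def main(args: Array[String]): Unit = {
    // Same chain as the stack trace: system properties -> env vars ->
    // web identity -> profile -> container -> instance profile.
    val provider = DefaultCredentialsProvider.create()
    // Throws SdkClientException, as in the report, if no provider succeeds.
    val creds = provider.resolveCredentials()
    println(s"Resolved credentials, access key prefix: ${creds.accessKeyId().take(4)}***")
  }
}
```

If this small check fails with the same "Multiple HTTP implementations" complaint under `WebIdentityTokenCredentialsProvider`, the issue is jar conflicts rather than the Fargate execution role.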
