inkerinmaa opened a new issue, #14817:
URL: https://github.com/apache/iceberg/issues/14817

   ### Apache Iceberg version
   
   1.10.0 (latest release)
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   Hello.
   My setup: Spark -> Iceberg REST catalog (in Docker) -> S3.
   When I use MinIO as the S3 backend, everything works fine. After switching to a cloud (S3-compatible) provider, I can't get it to work. The cloud S3 requires specific checksum settings; the AWS CLI works with these parameters:
   ```
   request_checksum_calculation = WHEN_REQUIRED
   response_checksum_validation = WHEN_REQUIRED
   ```
   For the Iceberg REST catalog, I added an environment variable to the Docker container running apache/iceberg-rest-fixture:
   ```yaml
   environment:
     - AWS_REQUEST_CHECKSUM_CALCULATION=when_required
   ```
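   For comparison with the two AWS CLI parameters, here is a Docker Compose sketch that sets both of their environment-variable counterparts on the fixture. The service name `iceberg-rest` is hypothetical. `AWS_RESPONSE_CHECKSUM_VALIDATION` is the standard AWS SDK environment variable for `response_checksum_validation`, but whether this provider also needs the response-side setting is untested here. Other fixture settings (S3 endpoint, credentials) are omitted.

   ```yaml
   services:
     iceberg-rest:                 # hypothetical service name
       image: apache/iceberg-rest-fixture
       environment:
         # Compute request checksums only when an S3 operation requires them
         - AWS_REQUEST_CHECKSUM_CALCULATION=when_required
         # Validate response checksums only when an S3 operation requires them
         - AWS_RESPONSE_CHECKSUM_VALIDATION=when_required
   ```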
   With that variable set, I can now create namespaces and tables from Spark, i.e. the actions that go through the Iceberg REST catalog.
   However, I cannot insert new data, even though I added this to the Spark config:
   `spark.sql.catalog.rest.s3.checksum-enabled: "true"`
   I get this error:
   ```
   SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6) (10.42.0.130 executor 1): java.io.UncheckedIOException: Failed to close current writer
   ```
   
   My Spark config:
   ```yaml
   apiVersion: spark.apache.org/v1
   kind: SparkApplication
   metadata:
     name: spark-connect-server
   spec:
     mainClass: "org.apache.spark.sql.connect.service.SparkConnectServer"
     sparkConf:
       spark.jars.packages: org.apache.iceberg:iceberg-spark-runtime-4.0_2.13:1.10.0,org.apache.iceberg:iceberg-aws-bundle:1.10.0

       # ===== IVY CONFIGURATION =====
       spark.jars.ivy: /tmp/.ivy2
       spark.jars.repositories: https://repo1.maven.org/maven2/

       # ===== ICEBERG CONFIG =====
       spark.sql.extensions: org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
       spark.sql.defaultCatalog: rest

       spark.sql.catalog.rest: org.apache.iceberg.spark.SparkCatalog
       spark.sql.catalog.rest.type: rest
       spark.sql.catalog.rest.uri: http://testsrv01:8181
       spark.sql.catalog.rest.warehouse: s3://warehouse/
       spark.sql.catalog.rest.io-impl: org.apache.iceberg.aws.s3.S3FileIO
       spark.sql.catalog.rest.s3.region: "us-east-1"
       spark.sql.catalog.rest.s3.endpoint: https://s3.yyyyy.cloud:443
       spark.sql.catalog.rest.s3.access-key-id: test:[email protected]
       spark.sql.catalog.rest.s3.secret-access-key: test
       spark.sql.catalog.rest.s3.path-style-access: "true"
       spark.sql.catalogImplementation: in-memory

       spark.dynamicAllocation.enabled: "true"
       spark.dynamicAllocation.shuffleTracking.enabled: "true"
       spark.dynamicAllocation.minExecutors: "1"
       spark.dynamicAllocation.maxExecutors: "1"
       spark.kubernetes.authenticate.driver.serviceAccountName: "spark"
       spark.kubernetes.container.image: "apache/spark:4.0.1"
       spark.kubernetes.driver.pod.excludedFeatureSteps: "org.apache.spark.deploy.k8s.features.KerberosConfDriverFeatureStep"
       spark.sql.catalog.rest.s3.checksum-enabled: "true"

       spark.driver.extraJavaOptions: "-Daws.region=us-east-1"
       spark.executor.extraJavaOptions: "-Daws.region=us-east-1"

     applicationTolerations:
       resourceRetainPolicy: OnFailure
     runtimeVersions:
       scalaVersion: "2.13"
       sparkVersion: "4.0.1"
   ```
   
   The error appeared only after switching to the cloud S3 with its checksum requirements, so I conclude that checksum handling is the root cause. The available setting `spark.sql.catalog.rest.s3.checksum-enabled: "true"` has no effect on it. Is there another way to make it work?
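   One untested direction, sketched below: the failing task runs on a Spark executor, which writes data files itself through `S3FileIO`, so the environment variable set on the REST fixture container does not reach it. Assuming the failure is checksum-related and that the AWS SDK for Java v2 bundled in `iceberg-aws-bundle` honors the same settings as the fixture did, they could be passed to the driver and executors with Spark's `spark.kubernetes.driverEnv.*` and `spark.executorEnv.*` properties. The commented-out lines show the alternative of using the Java SDK system properties `aws.requestChecksumCalculation` / `aws.responseChecksumValidation`; whether these fix this particular provider is an assumption.

   ```yaml
   spec:
     sparkConf:
       # Environment variables for the driver pod (Spark on Kubernetes)
       spark.kubernetes.driverEnv.AWS_REQUEST_CHECKSUM_CALCULATION: "when_required"
       spark.kubernetes.driverEnv.AWS_RESPONSE_CHECKSUM_VALIDATION: "when_required"
       # Environment variables for every executor, where data files are written
       spark.executorEnv.AWS_REQUEST_CHECKSUM_CALCULATION: "when_required"
       spark.executorEnv.AWS_RESPONSE_CHECKSUM_VALIDATION: "when_required"
       # Alternative: JVM system properties read by the AWS SDK for Java v2.
       # These replace the existing extraJavaOptions, so -Daws.region is kept.
       # spark.driver.extraJavaOptions: "-Daws.region=us-east-1 -Daws.requestChecksumCalculation=when_required -Daws.responseChecksumValidation=when_required"
       # spark.executor.extraJavaOptions: "-Daws.region=us-east-1 -Daws.requestChecksumCalculation=when_required -Daws.responseChecksumValidation=when_required"
   ```

   The environment-variable form mirrors what already worked for the REST fixture, which is why it is listed first.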
   
   ### Willingness to contribute
   
   - [ ] I can contribute a fix for this bug independently
   - [ ] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
   - [ ] I cannot contribute a fix for this bug at this time

