impothnis opened a new issue, #14625:
URL: https://github.com/apache/iceberg/issues/14625

   ### Apache Iceberg version
   
   1.6.1
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   When calling `DataFrame.writeTo(...).using("iceberg").createOrReplace()` 
against object storage (Google Cloud Storage, ADLS Gen2, Fabric OneLake), the 
write fails with a `FileNotFoundException` while trying to read 
`metadata/version-hint.text`. Observed behavior: the operation raises the error 
and the job fails, even though data and metadata files (including a 
`version-hint.text` file) are created at the table location. `create()` and 
`append()` succeed; the problem appears specific to `createOrReplace()` / 
`createIfNotExists` semantics on these object stores.
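For context (background, not from the report itself): a HadoopCatalog table records its current metadata version as a single integer in `metadata/version-hint.text`. A minimal sketch of a tolerant read of that file, assuming a local filesystem path rather than GCS/ADLS, which treats a missing hint file as "no current version" instead of failing:

```scala
import java.nio.file.{Files, NoSuchFileException, Paths}

// Sketch only: read the version number stored in version-hint.text.
// HadoopCatalog keeps a single integer in this file; when the table does not
// exist yet the file is absent, which surfaces as a FileNotFoundException
// (a 404 HEAD response on object stores).
def readVersionHint(metadataDir: String): Int = {
  val hint = Paths.get(metadataDir, "version-hint.text")
  try {
    new String(Files.readAllBytes(hint)).trim.toInt
  } catch {
    case _: NoSuchFileException => 0 // treat "missing" as version 0, not a fatal error
  }
}
```

The stack traces below suggest this probe (or one like it) is where the failure originates.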
   
   > Environment
   
   Repository: apache/iceberg
   Spark: 3.5.1
   Scala: 2.12
   Catalog: HadoopCatalog (spark.sql.catalog.spark_catalog.type = "hadoop")
   Iceberg version: <please fill: e.g. 1.3.0>
   Hadoop version: <please fill>
   GCS connector version: <please fill>
   Azure ADLS Gen2 / OneLake client versions: <please fill>
   OS / JVM: <please fill>
   Language composition of repo: Java heavy (not strictly needed here)
   
   > How to reproduce
   
   ```scala
   // Storage auth/config (GCS / ADLS Gen2 / OneLake)
   spark.conf.set("fs.gs.impl", "<...>")
   spark.conf.set("fs.AbstractFileSystem.gs.impl", "<...>")
   spark.conf.set("fs.gs.project.id", "<...>")
   spark.conf.set("fs.gs.auth.type", "<...>")
   spark.conf.set("google.cloud.auth.service.account.enable", "<...>")
   spark.conf.set("google.cloud.auth.service.account.json.keyfile", "<...>")
   spark.conf.set("fs.gs.path.encoding", "<...>")
   
   // Iceberg catalog
   spark.conf.set("spark.sql.catalog.spark_catalog", "org.apache.iceberg.spark.SparkCatalog")
   spark.conf.set("spark.sql.catalog.spark_catalog.type", "hadoop")
   spark.conf.set("spark.sql.catalog.spark_catalog.warehouse", "<warehouse-path>")
   
   import spark.implicits._
   val data = Seq((4, "Liam"), (5, "Noel"))
   val df = data.toDF("id", "name")
   
   // Intended: create or replace the table
   df.writeTo("iceberg_standalone").using("iceberg").createOrReplace()
   ```
   
   
   > Observed errors (examples)
   
   **Google Cloud:**
   ```
   Error reading version hint file <redacted>/.../metadata/version-hint.text
   java.io.FileNotFoundException: Item not found: '<redacted>/.../metadata/version-hint.text'.
   Note, it is possible that the live version is still available but the requested generation is deleted.
       at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageExceptions.createFileNotFoundException(...)
   ```
   
   **OneLake:**
   ```
   Caused by: Operation failed: "The specified path does not exist.", 404, HEAD,
   https://onelake.dfs.fabric.microsoft.com/<redacted>/.../metadata/version-hint.text?...
   ```
   
   **ADLS Gen2:**
   ```
   WARN HadoopTableOperations: Error reading version hint file abfss://<redacted>/.../metadata/version-hint.text
   java.io.FileNotFoundException: Operation failed: "The specified path does not exist.", 404, HEAD, ...
   ```
   
   **Important note:** despite the exception and the failed write, the storage 
path ends up containing data and metadata files, and a `version-hint.text` file 
with a valid value. `create()` and `append()` work as expected. The issue only 
appears for `createOrReplace()` when the table does not already exist.
   
   > Expected behavior
   
   `createOrReplace()` should:
   
   - create the table if it does not exist, then write the data and return success; or
   - if the table exists, atomically replace it as documented.
   
   It should not fail with a `FileNotFoundException` when the table does not 
already exist on object stores.
   > Possibly related
   
   https://github.com/apache/iceberg/issues/1496
   > Additional details / guesses
   
   It appears `createOrReplace()` attempts to read `version-hint.text` (or 
otherwise probes existing metadata), and that probe returns a 
FileNotFound/404 for object store HEAD calls. That error seems to either be 
treated as a fatal I/O exception or be propagated up the call stack, causing 
the operation to fail even though the metadata is later created successfully.
   
   The behavior may be caused by object store connector semantics for HEAD/GET 
on non-existent paths (a 404 rather than a "not found" indicator) and by how 
Iceberg's TableOperations handles those exceptions during `createOrReplace()`.
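The guess above can be made concrete with an illustrative classification of the probe failure (this is not Iceberg's actual code, just a sketch of the distinction being argued for): a `FileNotFoundException` from the connector should mean "table absent", which is expected for `createOrReplace()` on a new table, while other I/O errors stay fatal.

```scala
import java.io.FileNotFoundException

// Illustrative only: how a metadata probe could classify connector failures.
sealed trait ProbeResult
case object TableAbsent extends ProbeResult                        // 404 / file missing: fine for createOrReplace
final case class ProbeFailed(cause: Throwable) extends ProbeResult // genuine I/O problem: still fatal

def classifyProbeFailure(e: Throwable): ProbeResult = e match {
  case _: FileNotFoundException => TableAbsent
  case other                    => ProbeFailed(other)
}
```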
   > Workarounds tried
   
   - Using `create()` and `append()`: both succeed.
   - Manually checking for table existence in the integration layer and calling 
`create()` only when the table is absent: this works, but the integration layer 
currently assumes `createOrReplace()` will handle that atomically.
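The second workaround could be centralized along these lines (a sketch only, assuming access to the active `SparkSession`; the helper name is illustrative, and unlike `createOrReplace()` this check-then-write sequence is not atomic):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Sketch of the manual workaround: probe existence first, then pick the
// explicit create() or replace() path instead of createOrReplace().
def createOrReplaceSafe(spark: SparkSession, df: DataFrame, table: String): Unit =
  if (spark.catalog.tableExists(table))
    df.writeTo(table).using("iceberg").replace() // table present: replace it
  else
    df.writeTo(table).using("iceberg").create()  // table absent: plain create(), which works
```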
   > Request
   
   Can the maintainers investigate the behavior of `createOrReplace()` against 
these object stores?
   
   
   
   
   ### Willingness to contribute
   
   - [ ] I can contribute a fix for this bug independently
   - [x] I would be willing to contribute a fix for this bug with guidance from 
the Iceberg community
   - [x] I cannot contribute a fix for this bug at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

