atovk opened a new issue, #16299:
URL: https://github.com/apache/iceberg/issues/16299

   ### Problem
   
   Iceberg metadata refresh stops retrying when metadata loading fails with 
Iceberg `NotFoundException`, but some object-storage read paths propagate 
provider-specific missing-object exceptions directly. For stale catalog entries 
whose metadata file has been removed, this can cause the generic metadata 
refresh retry loop to keep retrying a permanent not-found condition.
   
   This was observed through Apache Gravitino with Aliyun OSS metadata 
locations: https://github.com/apache/gravitino/issues/11039
   
   ### Current behavior
   
   `BaseMetastoreTableOperations.refreshFromMetadataLocation` retries metadata 
reads and stops on `NotFoundException`.
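
A minimal sketch of the retry behavior described above, not the actual Iceberg implementation. `RefreshRetrySketch`, `loadWithRetry`, and `attemptsUntilFailure` are hypothetical names, and the nested `NotFoundException` is a local stand-in for Iceberg's exception so the example runs without dependencies. A real loop would also back off between attempts.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class RefreshRetrySketch {
  // Local stand-in for Iceberg's NotFoundException (assumption: used only for this sketch).
  static class NotFoundException extends RuntimeException {
    NotFoundException(String message) {
      super(message);
    }
  }

  // Retries `load` up to maxAttempts times, but stops immediately on NotFoundException.
  static <T> T loadWithRetry(Supplier<T> load, int maxAttempts) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    RuntimeException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return load.get();
      } catch (NotFoundException e) {
        throw e; // permanent condition: retrying cannot help
      } catch (RuntimeException e) {
        last = e; // treated as transient; a real loop would sleep before the next attempt
      }
    }
    throw last;
  }

  // Helper for illustration: counts how many times a always-failing load is attempted.
  static int attemptsUntilFailure(RuntimeException failure, int maxAttempts) {
    AtomicInteger attempts = new AtomicInteger();
    try {
      loadWithRetry(() -> {
        attempts.incrementAndGet();
        throw failure;
      }, maxAttempts);
    } catch (RuntimeException expected) {
      // the load always fails in this helper
    }
    return attempts.get();
  }
}
```

In this sketch a `NotFoundException` ends the loop after one attempt, while any other runtime exception (for example an untranslated provider exception) consumes every attempt.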
   
   AWS S3, Azure ADLS, GCS, and Hadoop input streams already translate missing 
files/objects to `NotFoundException`, so stale metadata files fail fast.
   
   Aliyun OSS and Dell ECS have `exists()` paths that return `false` for 
missing objects, but their stream-read paths do not translate missing-object 
errors:
   
   - Aliyun OSS: `OSSInputStream.openStream` calls `client.getObject(...)` 
directly, so `NoSuchKey` / `NoSuchBucket` can remain an Aliyun SDK 
`OSSException`.
   - Dell ECS: `EcsSeekableInputStream` calls `client.readObjectStream(...)` 
directly, so HTTP 404 `S3Exception` can remain a Dell SDK exception.
   
   ### Expected behavior
   
   All Iceberg FileIO implementations should classify missing-object reads 
consistently as `NotFoundException`, allowing metadata refresh to stop retrying 
permanent not-found failures.
   
   ### Proposed fix
   
   Translate Aliyun OSS `NoSuchKey` / `NoSuchBucket` errors (SDK constants 
`NO_SUCH_KEY` / `NO_SUCH_BUCKET`) and Dell ECS HTTP 404 read failures to Iceberg 
`NotFoundException` in the read paths, with regression tests for both input 
streams.
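
One possible shape for the translation, as a hedged sketch rather than the proposed patch. To stay runnable without the SDKs, `FakeOssException` and `FakeEcsException` are hypothetical stand-ins for the Aliyun `OSSException` and Dell `S3Exception`; the sketch assumes the real classes expose an error code and an HTTP status, respectively. `openOss` and `openEcs` are illustrative names, not existing Iceberg methods.

```java
import java.util.function.Supplier;

class MissingObjectTranslationSketch {
  // Local stand-in for Iceberg's NotFoundException (assumption for this sketch).
  static class NotFoundException extends RuntimeException {
    NotFoundException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  // Hypothetical stand-in for the Aliyun SDK exception; assumed to carry an error code.
  static class FakeOssException extends RuntimeException {
    private final String errorCode;

    FakeOssException(String errorCode) {
      super(errorCode);
      this.errorCode = errorCode;
    }

    String getErrorCode() {
      return errorCode;
    }
  }

  // Hypothetical stand-in for the Dell ECS SDK exception; assumed to carry an HTTP status.
  static class FakeEcsException extends RuntimeException {
    private final int httpCode;

    FakeEcsException(int httpCode) {
      super("HTTP " + httpCode);
      this.httpCode = httpCode;
    }

    int getHttpCode() {
      return httpCode;
    }
  }

  // Wraps an OSS read: missing key or bucket becomes NotFoundException, other errors pass through.
  static <T> T openOss(String location, Supplier<T> open) {
    try {
      return open.get();
    } catch (FakeOssException e) {
      String code = e.getErrorCode();
      if ("NoSuchKey".equals(code) || "NoSuchBucket".equals(code)) {
        throw new NotFoundException("Location does not exist: " + location, e);
      }
      throw e;
    }
  }

  // Wraps an ECS read: HTTP 404 becomes NotFoundException, other errors pass through.
  static <T> T openEcs(String location, Supplier<T> open) {
    try {
      return open.get();
    } catch (FakeEcsException e) {
      if (e.getHttpCode() == 404) {
        throw new NotFoundException("Location does not exist: " + location, e);
      }
      throw e;
    }
  }
}
```

Keeping the original SDK exception as the cause preserves provider detail for debugging, and only the missing-object cases are reclassified, so errors such as access denial remain eligible for whatever handling they had before.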


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

