hhhizzz opened a new issue, #8261:
URL: https://github.com/apache/paimon/issues/8261

   ### Search before asking
   
   - [x] I searched in the issues and found nothing similar.
   
   ### Paimon version
   
   Current master / 1.5-SNAPSHOT.
   
   The relevant code paths on the current master branch are:
   
   - `RenamingSnapshotCommit.commit` checks `fileIO.exists(newSnapshotPath)` 
before committing a snapshot.
   - `HintFileUtils.findLatest` checks `fileIO.exists(snapshot-(latest + 1))` 
after reading `snapshot/LATEST`.
   - `RenamingSnapshotCommit` writes snapshots through 
`FileIO.tryToWriteAtomic`, which creates a temporary object with 
`overwrite=false` before rename.
   
   ### Compute Engine
   
   Flink sink / Paimon FileStore commit using the S3 FileIO backed by Hadoop 
S3A.
   
   This was observed with an S3-compatible object store, but the behavior is 
consistent with AWS S3 permission semantics as well: if a principal does not 
have `s3:ListBucket`, probing a missing object with HEAD/getFileStatus can 
return 403 instead of 404.
   
   ### Minimal reproduce step
   
   1. Create an S3 bucket and warehouse path, for example:
   
   ```text
   s3://paimon-test/warehouse
   ```
   
   2. Create credentials for a writer which can operate on known table objects, 
but does not have bucket listing permission. For AWS IAM this is equivalent to 
allowing object actions under the table/warehouse path and omitting 
`s3:ListBucket` on the bucket:
   
   ```json
   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Effect": "Allow",
         "Action": [
           "s3:GetObject",
           "s3:PutObject",
           "s3:DeleteObject"
         ],
         "Resource": "arn:aws:s3:::paimon-test/warehouse/*"
       }
     ]
   }
   ```
   
   For MinIO/RGW/other S3-compatible storage, use the equivalent policy: 
object-level read/write/delete is allowed for the warehouse/table prefix, while 
bucket/prefix list is denied or omitted.
   
   3. Configure a Paimon catalog on this warehouse with those credentials, for 
example in Flink SQL:
   
   ```sql
   CREATE CATALOG paimon_s3 WITH (
     'type' = 'paimon',
     'warehouse' = 's3://paimon-test/warehouse',
     's3.endpoint' = '<s3-compatible-endpoint>',
     's3.access-key' = '<access-key>',
     's3.secret-key' = '<secret-key>',
     's3.path.style.access' = 'true'
   );
   
   USE CATALOG paimon_s3;
   -- DEFAULT is a reserved keyword in Flink SQL, so the name is backtick-quoted.
   CREATE DATABASE IF NOT EXISTS `default`;
   CREATE TABLE `default`.t (
     id INT,
     v STRING
   );
   
   INSERT INTO `default`.t VALUES (1, 'a');
   ```
   
   The same issue can also be reproduced by appending to an existing Paimon 
table with a Flink sink using the same S3 credentials.
   
   4. The commit path probes snapshot metadata paths that the commit protocol 
already knows: either the target snapshot object or the next snapshot object. 
Without `ListBucket`, S3A translates the status probe of a missing object into 
an access-denied error, for example:
   
   ```text
   java.nio.file.AccessDeniedException: s3://paimon-test/warehouse/default.db/t/snapshot/snapshot-1:
   getFileStatus on s3://paimon-test/warehouse/default.db/t/snapshot/snapshot-1:
   software.amazon.awssdk.services.s3.model.S3Exception: Forbidden (Service: S3, Status Code: 403)
       at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:270)
       at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:4114)
   ```
   
   Depending on where the first probe happens, the failing path may also be the 
next snapshot file, a temporary snapshot file created by `tryToWriteAtomic`, or 
the table's `snapshot/` metadata directory.
   
   ### What doesn't meet your expectations?
   
   For a normal sink commit on object stores, Paimon should not require bucket 
listing permission just to commit known snapshot metadata objects when an 
external lock is used.
   
   Object-level `GetObject`/`PutObject`/`DeleteObject` under the table path 
should be enough for the normal snapshot commit path. Listing the snapshot 
directory is still expected for operations that need to enumerate snapshots, 
but the append/commit fast path should avoid missing-object 
`exists`/`getFileStatus` probes on S3 because those probes require `ListBucket` 
to distinguish missing objects from forbidden objects.
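
   To make the 403-versus-404 behavior concrete, here is a toy Java model of how a HEAD-style probe resolves for a caller that can read objects under the prefix. It is only a sketch: `HeadProbeModel`, `headStatus`, and `exists` are hypothetical names for illustration and are not Paimon or S3A code.

   ```java
   import java.util.Set;

   /** Toy model of S3 HEAD semantics; hypothetical, not Paimon or S3A code. */
   final class HeadProbeModel {

       /** Status a HEAD on {@code key} returns for a caller with object-level read access. */
       static int headStatus(Set<String> objects, String key, boolean hasListBucket) {
           if (objects.contains(key)) {
               return 200;
           }
           // Without ListBucket the store does not reveal that the key is absent.
           return hasListBucket ? 404 : 403;
       }

       /** exists() in the style of a filesystem wrapper: 404 maps to false, 403 is an error. */
       static boolean exists(Set<String> objects, String key, boolean hasListBucket) {
           int status = headStatus(objects, key, hasListBucket);
           if (status == 200) {
               return true;
           }
           if (status == 404) {
               return false;
           }
           throw new SecurityException("403 Forbidden on HEAD " + key);
       }
   }
   ```

   In this model, probing an existing `snapshot-1` succeeds with or without `ListBucket`, while probing the not-yet-written `snapshot-2` only returns `false` when `ListBucket` is granted. That is why the failure shows up exactly on the missing-object probes in the commit path.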
   
   This matters for production deployments where writers are intentionally 
scoped to a table or warehouse prefix and are not allowed to list the whole 
bucket.
   
   ### Anything else?
   
   The problematic object-store interactions appear to be:
   
   1. `RenamingSnapshotCommit.commit` checks `fileIO.exists(newSnapshotPath)` 
before entering the commit callable. On S3, checking a missing `snapshot-N` can 
require `ListBucket`.
   2. `HintFileUtils.findLatest` reads `snapshot/LATEST`, then checks whether 
`snapshot-(latest + 1)` exists. If that next snapshot does not exist and the 
principal lacks `ListBucket`, this can fail with 403.
   3. `FileIO.tryToWriteAtomic` writes a temp file with `overwrite=false`. On 
S3A this can still trigger status checks for a missing object before create.
   
   A possible fix is to avoid these missing-object status checks for object 
stores in the snapshot commit path, trust committed hint files where 
appropriate, and write snapshot metadata files through a path that does not 
call Hadoop S3A `create(..., overwrite=false)` for known snapshot metadata 
objects.
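
   A rough sketch of what a probe-free commit could look like, under stated assumptions: the table already has a `snapshot/LATEST` hint, the hint is only ever updated while holding the external lock, and the lock alone (not an `exists` check) guards against overwriting a concurrent commit. `ToyObjectStore` and `ProbeFreeCommit` are hypothetical stand-ins, not Paimon's `FileIO` or `RenamingSnapshotCommit` API.

   ```java
   import java.nio.charset.StandardCharsets;
   import java.util.HashMap;
   import java.util.Map;
   import java.util.concurrent.locks.Lock;

   /** Hypothetical in-memory object store that counts HEAD-style probes. */
   final class ToyObjectStore {
       private final Map<String, byte[]> objects = new HashMap<>();
       int headProbes = 0;

       boolean exists(String key) {
           headProbes++;
           return objects.containsKey(key);
       }

       byte[] get(String key) {
           byte[] value = objects.get(key);
           if (value == null) {
               throw new IllegalStateException("missing object: " + key);
           }
           return value;
       }

       void put(String key, byte[] value) {
           objects.put(key, value);
       }
   }

   /** Hypothetical commit path that uses only GET and PUT on known keys. */
   final class ProbeFreeCommit {
       private final ToyObjectStore store;
       private final Lock externalLock; // stands in for a catalog-level lock

       ProbeFreeCommit(ToyObjectStore store, Lock externalLock) {
           this.store = store;
           this.externalLock = externalLock;
       }

       long commit(String snapshotJson) {
           externalLock.lock();
           try {
               // Trust the hint: no exists(snapshot-(latest + 1)) probe.
               String hint = new String(store.get("snapshot/LATEST"), StandardCharsets.UTF_8);
               long next = Long.parseLong(hint.trim()) + 1;
               // Unconditional PUT: the lock, not a status check, prevents clobbering.
               store.put("snapshot/snapshot-" + next, snapshotJson.getBytes(StandardCharsets.UTF_8));
               store.put("snapshot/LATEST", Long.toString(next).getBytes(StandardCharsets.UTF_8));
               return next;
           } finally {
               externalLock.unlock();
           }
       }
   }
   ```

   The sketch deliberately never calls `exists`, so it needs only object-level read and write access. The trade-off is that correctness rests entirely on the external lock and on the hint being accurate; a real change would still need a fallback for tables without a `LATEST` file or with a stale hint.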
   
   ### Are you willing to submit a PR?
   
   - [x] I'm willing to submit a PR!
   

