felixhzhu commented on issue #4237:
URL: https://github.com/apache/amoro/issues/4237#issuecomment-4609859240

   > Thanks for the detail. Before treating this as established, I'd like to 
pin down the actual trigger, because line 95 is only reached when a HEAD on a 
**slash-less** key returns 200.
   > 
   > `directoryPath` comes from `TableFileUtil.getParent()` = `new 
Path(path).getParent().toString()`, which has **no trailing slash** (e.g. 
`.../op_time_hour=12`). For `S3FileIO`, `exists()` is `BaseS3File.exists()` → 
`headObject(key = uri().key())` on that exact key, returning `false` on 404. If 
it's 404 we return at `TableFileUtil:90` and never reach the throw.
   > 
   > That undercuts the three mechanisms on flat S3 (AWS/MinIO/Ceph):
   > 
   > * Directory markers are slash-_suffixed_ (`.../op_time_hour=12/`), so a 
HEAD on the slash-less key is a different key → 404. These stores also don't 
auto-create markers on PUT.
   > * `HeadObject` is an exact-key op — a non-empty prefix or a LIST-style 
rewrite doesn't make it 200. Do you have a concrete gateway impl that does?
   > 
   > I also ran the AWS "Create folder" recipe and couldn't reproduce: the 
console writes `.../<partition>/` (with slash) but the cleaner HEADs 
`.../<partition>` (no slash) → 404. So "100% on every object-store table" 
doesn't match what the code does on flat S3.
   > 
   > A HEAD returning 200 on a slash-less directory key would imply a 
hierarchical-namespace bucket rather than plain flat S3 — could you confirm the 
bucket type/mode? To make it reproducible, please share, for one failing 
partition dir, the raw `aws s3api head-object --key <slash-less-key>` response 
(200 vs 404) and the exact `io-impl` + storage config.
   
   You're right, and I want to retract the over-reach before anything else.
   
   ### Where I was wrong
   
   "100% on every object-store table" was sloppy. What I can actually claim is 
narrower:
   
   > **100% reproducible on our production environment**, with the 
`IllegalArgumentException: null` stack trace from 
`AuthenticatedFileIOAdapter:98` via `TableFileUtil.deleteEmptyDirectory:95` — 
i.e. a black-box symptom observation. **Not** "HEAD on a slash-less key returns 
200 on every S3-compatible store."
   
   Your code reading on strict S3 (`BaseS3File.exists()` → 
`headObject(uri().key())` on the slash-less key → 404 → early return at line 
90) is correct. On flat AWS S3 / MinIO / Ceph with no placeholder marker and no 
hierarchical-namespace mode, line 95 is unreachable. I should not have asserted 
otherwise without raw HEAD evidence.
   
   ### What I don't yet have
   
   I haven't captured the raw `aws s3api head-object` response on a failing 
partition dir on our side. Until I do, I'm not going to defend any specific 
mechanism. The plausible candidates I'd want to rule in/out — none of which I'm 
asserting as fact right now — are:
   
   - **(H1)** A 0-byte placeholder at the slash-less key (left over by some 
historical write tool / SDK / `aws s3api put-object --key path/to/dir/`).
   - **(H2)** Iceberg `S3InputFile.exists()` 1.6.1 falling through to a 
non-pure-HEAD path under some retry/transient condition.
   - **(H3)** **Bucket mode**: this is the one I think is most likely given 
your line of reasoning. Our prod runs against Tencent Cloud COS via the 
S3-compatible API, and COS does offer a hierarchical-namespace 
("directory-flavoured") bucket mode. I need to verify whether the affected 
bucket has it enabled — I don't want to guess.
   
   ### What I'll attach on the next reproduction
   
   Exactly your asks, plus a few I think are useful:
   
   - one specific failing partition path (anonymised);
   - raw response of `aws s3api head-object --key <slash-less-key>` (status 
code + body);
   - raw response of `aws s3api head-object --key <slash-suffixed-key>` 
(control);
   - `aws s3api list-objects-v2 --prefix <slash-less-key>` output (to see 
whether a placeholder marker actually exists);
   - the table's `io-impl` and full `s3.*` / `write.object-storage.*` 
properties;
   - AMS DEBUG-level log lines containing `directoryPath` from the failing run;
   - explicit confirmation of the bucket mode (regular S3-compatible vs. COS 
directory-mode / hierarchical-namespace).
   
   ### A repro that doesn't depend on the S3 mechanism
   
   While I get those artefacts, I'd like to separate the question "does this PR 
fix a real contract violation" from "what exactly makes `exists()` return true 
on our COS bucket". The first one I can answer deterministically without any 
object store at all, because the bug is a contract violation between 
`RollingFileCleaner` / `TableFileUtil.deleteEmptyDirectory` and 
`AuthenticatedFileIO`, not an S3 quirk:
   
   ```java
   @Test
   void deleteEmptyDirectory_throwsOnPureObjectStoreFileIO() {
     AuthenticatedFileIO io = mock(AuthenticatedFileIO.class);
     when(io.exists(anyString())).thenReturn(true);              // any path 
"exists"
     when(io.supportFileSystemOperations()).thenReturn(false);   // pure 
object-store contract
     when(io.asFileSystemIO()).thenCallRealMethod();             // hits 
Preconditions.checkArgument
   
     assertThatThrownBy(() ->
         TableFileUtil.deleteEmptyDirectory(io, "s3://bucket/db/t/data/p=1", 
Set.of()))
         .isInstanceOf(IllegalArgumentException.class)
         .hasMessage(null);
   }
   
   @Test
   void deleteEmptyDirectory_isNoOpAfterFix() {
     AuthenticatedFileIO io = mock(AuthenticatedFileIO.class);
     when(io.exists(anyString())).thenReturn(true);
     when(io.supportFileSystemOperations()).thenReturn(false);
   
     TableFileUtil.deleteEmptyDirectory(io, "s3://bucket/db/t/data/p=1", 
Set.of());
     verify(io, never()).asFileSystemIO();
   }
   ```
   
   The two preconditions of the repro (`exists() == true` + 
`supportFileSystemOperations() == false`) are **exactly the contract a pure 
object-store `FileIO` advertises**. Anything that can satisfy the first half — 
by any mechanism, HEAD/LIST/marker/hierarchical bucket — trips line 95. Happy 
to push these as part of this PR; they run in milliseconds, no Testcontainers, 
no S3.
   
   ### Why I still think the guard belongs at the entry point
   
   Even granting your reading on strict flat S3, the same 
`deleteEmptyDirectory` body calls `asFileSystemIO()` three more times after the 
line-90 `exists()` guard:
   
   ```java
   if (!io.exists(directoryPath)) return;                              // line 
90
   String parent = new Path(directoryPath).getParent().toString();
   if (!io.asFileSystemIO().isDirectory(directoryPath)                 // line 
95
       || exclude.contains(directoryPath) || exclude.contains(parent)) return;
   ...
   if (io.asFileSystemIO().isEmptyDirectory(directoryPath)) {           // line 
100
     io.asFileSystemIO().deletePrefix(directoryPath);                   // line 
101
     deleteEmptyDirectory(io, parent, exclude);
   }
   ```
   
   For the pre-fix code to be safe, every caller in every deployment in every 
bucket configuration would have to guarantee `exists()` returns false for every 
`directoryPath` the cleaner walks. The same author has already used the 
`supportFileSystemOperations()` guard at `BasicTableTrashManager.java:104` and 
`IcebergTableMaintainer.java:378`; the new `RollingFileCleaner` path is the 
only place that broke the convention. Fixing the entry point re-aligns it.
   
   ### Proposal
   
   1. I'll push the two unit tests above to this PR shortly so you can 
reproduce the contract violation deterministically without any S3 dependency.
   2. I'll follow up with the production HEAD artefacts (your full list above) 
once we re-reproduce on the COS bucket, and post them as a comment here so we 
can settle whether (H1)/(H2)/(H3) is the actual mechanism.
   3. Until that lands, please treat the mechanism question as **open on my 
side**, and the fix's justification as resting on (a) the deterministic 
contract repro and (b) the three downstream `asFileSystemIO()` calls at lines 
95/100/101 — not on my earlier "100% on every object-store table" wording, 
which I'm withdrawing.
   
   Does that work as a path forward?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to