felixhzhu commented on issue #4237:
URL: https://github.com/apache/amoro/issues/4237#issuecomment-4609859240
> Thanks for the detail. Before treating this as established, I'd like to
pin down the actual trigger, because line 95 is only reached when a HEAD on a
**slash-less** key returns 200.
>
> `directoryPath` comes from `TableFileUtil.getParent()` = `new
Path(path).getParent().toString()`, which has **no trailing slash** (e.g.
`.../op_time_hour=12`). For `S3FileIO`, `exists()` is `BaseS3File.exists()` →
`headObject(key = uri().key())` on that exact key, returning `false` on 404. If
it's 404 we return at `TableFileUtil:90` and never reach the throw.
>
> That undercuts the three mechanisms on flat S3 (AWS/MinIO/Ceph):
>
> * Directory markers are slash-_suffixed_ (`.../op_time_hour=12/`), so a
HEAD on the slash-less key is a different key → 404. These stores also don't
auto-create markers on PUT.
> * `HeadObject` is an exact-key op — a non-empty prefix or a LIST-style
rewrite doesn't make it 200. Do you have a concrete gateway impl that does?
>
> I also ran the AWS "Create folder" recipe and couldn't reproduce: the
console writes `.../<partition>/` (with slash) but the cleaner HEADs
`.../<partition>` (no slash) → 404. So "100% on every object-store table"
doesn't match what the code does on flat S3.
>
> A HEAD returning 200 on a slash-less directory key would imply a
hierarchical-namespace bucket rather than plain flat S3 — could you confirm the
bucket type/mode? To make it reproducible, please share, for one failing
partition dir, the raw `aws s3api head-object --key <slash-less-key>` response
(200 vs 404) and the exact `io-impl` + storage config.
You're right, and I want to retract the over-reach before anything else.
### Where I was wrong
"100% on every object-store table" was sloppy. What I can actually claim is
narrower:
> **100% reproducible on our production environment**, with the
`IllegalArgumentException: null` stack trace from
`AuthenticatedFileIOAdapter:98` via `TableFileUtil.deleteEmptyDirectory:95` —
i.e. a black-box symptom observation. **Not** "HEAD on a slash-less key returns
200 on every S3-compatible store."
Your code reading on strict S3 (`BaseS3File.exists()` →
`headObject(uri().key())` on the slash-less key → 404 → early return at line
90) is correct. On flat AWS S3 / MinIO / Ceph with no placeholder marker and no
hierarchical-namespace mode, line 95 is unreachable. I should not have asserted
otherwise without raw HEAD evidence.
### What I don't yet have
I haven't captured the raw `aws s3api head-object` response on a failing
partition dir on our side. Until I do, I'm not going to defend any specific
mechanism. The plausible candidates I'd want to rule in/out — none of which I'm
asserting as fact right now — are:
- **(H1)** A 0-byte placeholder at the slash-less key (left over by some
historical write tool / SDK / `aws s3api put-object --key path/to/dir/`).
- **(H2)** Iceberg `S3InputFile.exists()` 1.6.1 falling through to a
non-pure-HEAD path under some retry/transient condition.
- **(H3)** **Bucket mode**: this is the one I think is most likely given
your line of reasoning. Our prod runs against Tencent Cloud COS via the
S3-compatible API, and COS does offer a hierarchical-namespace
("directory-flavoured") bucket mode. I need to verify whether the affected
bucket has it enabled — I don't want to guess.
### What I'll attach on the next reproduction
Exactly your asks, plus a few I think are useful:
- one specific failing partition path (anonymised);
- raw response of `aws s3api head-object --key <slash-less-key>` (status
code + body);
- raw response of `aws s3api head-object --key <slash-suffixed-key>`
(control);
- `aws s3api list-objects-v2 --prefix <slash-less-key>` output (to see
whether a placeholder marker actually exists);
- the table's `io-impl` and full `s3.*` / `write.object-storage.*`
properties;
- AMS DEBUG-level log lines containing `directoryPath` from the failing run;
- explicit confirmation of the bucket mode (regular S3-compatible vs. COS
directory-mode / hierarchical-namespace).
### A repro that doesn't depend on the S3 mechanism
While I get those artefacts, I'd like to separate the question "does this PR
fix a real contract violation" from "what exactly makes `exists()` return true
on our COS bucket". The first one I can answer deterministically without any
object store at all, because the bug is a contract violation between
`RollingFileCleaner` / `TableFileUtil.deleteEmptyDirectory` and
`AuthenticatedFileIO`, not an S3 quirk:
```java
@Test
void deleteEmptyDirectory_throwsOnPureObjectStoreFileIO() {
AuthenticatedFileIO io = mock(AuthenticatedFileIO.class);
when(io.exists(anyString())).thenReturn(true); // any path
"exists"
when(io.supportFileSystemOperations()).thenReturn(false); // pure
object-store contract
when(io.asFileSystemIO()).thenCallRealMethod(); // hits
Preconditions.checkArgument
assertThatThrownBy(() ->
TableFileUtil.deleteEmptyDirectory(io, "s3://bucket/db/t/data/p=1",
Set.of()))
.isInstanceOf(IllegalArgumentException.class)
.hasMessage(null);
}
@Test
void deleteEmptyDirectory_isNoOpAfterFix() {
AuthenticatedFileIO io = mock(AuthenticatedFileIO.class);
when(io.exists(anyString())).thenReturn(true);
when(io.supportFileSystemOperations()).thenReturn(false);
TableFileUtil.deleteEmptyDirectory(io, "s3://bucket/db/t/data/p=1",
Set.of());
verify(io, never()).asFileSystemIO();
}
```
The two preconditions of the repro (`exists() == true` +
`supportFileSystemOperations() == false`) are **exactly the contract a pure
object-store `FileIO` advertises**. Anything that can satisfy the first half —
by any mechanism, HEAD/LIST/marker/hierarchical bucket — trips line 95. Happy
to push these as part of this PR; they run in milliseconds, no Testcontainers,
no S3.
### Why I still think the guard belongs at the entry point
Even granting your reading on strict flat S3, the same
`deleteEmptyDirectory` body calls `asFileSystemIO()` three more times after the
line-90 `exists()` guard:
```java
if (!io.exists(directoryPath)) return; // line
90
String parent = new Path(directoryPath).getParent().toString();
if (!io.asFileSystemIO().isDirectory(directoryPath) // line
95
|| exclude.contains(directoryPath) || exclude.contains(parent)) return;
...
if (io.asFileSystemIO().isEmptyDirectory(directoryPath)) { // line
100
io.asFileSystemIO().deletePrefix(directoryPath); // line
101
deleteEmptyDirectory(io, parent, exclude);
}
```
For the pre-fix code to be safe, every caller in every deployment in every
bucket configuration would have to guarantee `exists()` returns false for every
`directoryPath` the cleaner walks. The same author has already used the
`supportFileSystemOperations()` guard at `BasicTableTrashManager.java:104` and
`IcebergTableMaintainer.java:378`; the new `RollingFileCleaner` path is the
only place that broke the convention. Fixing the entry point re-aligns it.
### Proposal
1. I'll push the two unit tests above to this PR shortly so you can
reproduce the contract violation deterministically without any S3 dependency.
2. I'll follow up with the production HEAD artefacts (your full list above)
once we re-reproduce on the COS bucket, and post them as a comment here so we
can settle whether (H1)/(H2)/(H3) is the actual mechanism.
3. Until that lands, please treat the mechanism question as **open on my
side**, and the fix's justification as resting on (a) the deterministic
contract repro and (b) the three downstream `asFileSystemIO()` calls at lines
95/100/101 — not on my earlier "100% on every object-store table" wording,
which I'm withdrawing.
Does that work as a path forward?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]