LuciferYang opened a new pull request, #64768:
URL: https://github.com/apache/doris/pull/64768
### What problem does this PR solve?
Issue Number: close #64767
Problem Summary:
`PathUtils.equalsIgnoreSchemeIfOneIsS3(p1, p2)` (used by `HMSTransaction` to
decide whether a Hive commit needs a rename) compared paths inconsistently
across its two branches:
- same scheme → full-string `equalsIgnoreCase` (trailing slash significant,
case-insensitive);
- cross-scheme with `s3` → normalized authority+path via `Objects.equals`
(trailing slash stripped, case-sensitive).
So the result for one URI depended on the *other* URI's scheme, and
same-scheme comparisons wrongly ignored case (S3 keys are case-sensitive).
This PR unifies both branches into one rule: when the schemes are equal
(case-insensitively, per RFC 3986 §3.1) **or** one side is `s3`, compare only
the authority (bucket/host) and path: scheme ignored, **trailing slashes
insignificant**, **case-sensitive** on the raw (percent-encoded) components;
otherwise the locations are not equal. This matches the original `normalize()`
intent and the caller's "no rename when the location is identical" comment, and
applies the slash/case handling consistently regardless of whether the two
schemes match.
Inputs that are malformed for object storage fall back to exact string
comparison so they cannot spuriously match: opaque URIs (`s3:bucket/key`),
scheme-with-null-authority triple-slash forms (`s3:///path`),
authority-with-null-scheme network-path references (`//bucket/path`), and parse
failures. Percent-encoded slashes (`%2F`) stay distinct from real path
separators.
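
The unified rule and its fallbacks can be sketched roughly as follows. This is a minimal illustration built on `java.net.URI`, not the code in this PR: the class name `PathEqualitySketch`, the method names `sameLocation`, `isComparable`, and `stripTrailingSlashes`, the null handling, and the case-insensitive match on the literal `s3` are all assumptions made for the example.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Objects;

// Hypothetical sketch of the comparison rule described above; not the actual PathUtils code.
public class PathEqualitySketch {

    static boolean sameLocation(String p1, String p2) {
        if (p1 == null || p2 == null) {
            return Objects.equals(p1, p2); // assumption: two nulls are equal, one null is not
        }
        URI u1;
        URI u2;
        try {
            u1 = new URI(p1);
            u2 = new URI(p2);
        } catch (URISyntaxException e) {
            return p1.equals(p2); // parse failure: exact string comparison
        }
        if (!isComparable(u1) || !isComparable(u2)) {
            return p1.equals(p2); // malformed for object storage: exact string comparison
        }
        boolean sameScheme = u1.getScheme().equalsIgnoreCase(u2.getScheme());
        boolean oneIsS3 = "s3".equalsIgnoreCase(u1.getScheme())
                || "s3".equalsIgnoreCase(u2.getScheme());
        if (!sameScheme && !oneIsS3) {
            return false;
        }
        // Raw (percent-encoded) components, compared case-sensitively, so %2F stays distinct from '/'.
        return u1.getRawAuthority().equals(u2.getRawAuthority())
                && stripTrailingSlashes(u1.getRawPath())
                        .equals(stripTrailingSlashes(u2.getRawPath()));
    }

    // Rejects opaque URIs (s3:bucket/key), null-authority forms (s3:///path),
    // and null-scheme references (//bucket/path).
    static boolean isComparable(URI u) {
        return !u.isOpaque() && u.getScheme() != null && u.getRawAuthority() != null;
    }

    static String stripTrailingSlashes(String path) {
        int end = path.length();
        while (end > 0 && path.charAt(end - 1) == '/') {
            end--;
        }
        return path.substring(0, end);
    }

    public static void main(String[] args) {
        System.out.println(sameLocation("s3://bucket/a/b/", "s3://bucket/a/b"));
        System.out.println(sameLocation("s3://bucket/a", "oss://bucket/a"));
        System.out.println(sameLocation("s3://bucket/A", "s3://bucket/a"));
        System.out.println(sameLocation("s3://bucket/a%2Fb", "s3://bucket/a/b"));
    }
}
```

Under these assumptions the first two calls in `main` report equal locations (trailing-slash-only difference, cross-scheme with `s3`) and the last two report unequal ones (case-only difference, encoded slash versus real separator). Note that the sketch also sends scheme-less plain paths to the exact-string fallback, which is a simplification the PR description does not specify.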
The change was hardened via a multi-persona adversarial review loop run to
convergence (5 consecutive clean rounds); the extra rounds mainly added test
coverage.
### Release note
None
### Check List (For Author)
- Test
    - [x] Unit Test (`PathUtilsTest`, 23 cases covering the consistency
      contract plus opaque/encoded/triple-slash/network-path/null/scheme-family edge
      cases)
- Behavior changed:
    - [x] Yes. `equalsIgnoreSchemeIfOneIsS3` now treats trailing slashes as
      insignificant and the authority+path comparison as case-sensitive
      **consistently** for same-scheme and cross-scheme inputs. For realistic
      fully-qualified Hive S3/OSS locations the rename decision is unchanged; the
      difference only appears for trailing-slash-only or case-only differences and
      for malformed inputs.
- Does this need documentation?
    - [x] No.