zclllyybb commented on issue #64767: URL: https://github.com/apache/doris/issues/64767#issuecomment-4785729065
Breakwater-GitHub-Analysis-Slot: slot_bc1fd4b9373a

This content is generated by AI for reference only.

Initial triage: this looks like a real consistency bug in current `master`. I refreshed `apache/doris` `upstream/master` to `e5f3badd0109e312167f242df5aa53adb86806d8` and checked the referenced code path.

In `fe/fe-foundation/src/main/java/org/apache/doris/foundation/util/PathUtils.java`, `equalsIgnoreSchemeIfOneIsS3` currently has two different equality contracts:

- Same-scheme URIs go through a full-string `equalsIgnoreCase` comparison, so object key case is ignored and trailing slashes remain significant.
- Cross-scheme URIs where either side is `s3` compare normalized authority and path with `Objects.equals`, so the comparison is case-sensitive and strips trailing slashes.

That matches the issue's reproduction. It also matters to production behavior: the production use I found is `HMSTransaction.prepareInsertExistingTable` in `fe/fe-core/src/main/java/org/apache/doris/datasource/hive/HMSTransaction.java`, where the result directly drives `needRename = !PathUtils.equalsIgnoreSchemeIfOneIsS3(targetPath, writePath)`.

Therefore:

- Same-scheme trailing-slash-only differences can trigger an unnecessary rename even though the cross-scheme `s3` path treats the same location as equal.
- Same-scheme case-only differences such as `s3://bucket/A` versus `s3://bucket/a` can be incorrectly treated as equal, which is unsafe for object-storage keys.

I also found the linked public fix PR: https://github.com/apache/doris/pull/64768. It is currently open as a draft, based on the same `master` head, and its patch changes the utility to use one structural comparison rule for same-scheme and `s3` cross-scheme cases. The added `PathUtilsTest` cases cover the reported trailing-slash consistency issue, case-sensitive path/authority comparisons, malformed object-storage URI forms, encoded slashes, query/fragment distinctions, and `s3a`/`s3n` behavior.

Suggested next steps:

1. Review PR #64768 as the likely direct fix for this issue.
2. Keep the utility-level regression tests for both reported examples: same-scheme trailing slash and same-scheme case-only path differences.
3. If maintainers want stronger caller coverage, add a narrow FE test around the Hive insert path or rename-decision input pair to ensure `targetPath` and `writePath` follow the unified location-equality rule.

Missing information is not blocking the code-level conclusion here because the logic bug is visible from the current source and the issue includes direct method-level reproductions. A real-world incident assessment would still need the exact table location, write path, storage scheme pair, Doris build SHA, and FE commit/rename logs.
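
For illustration only, the two equality contracts described in this triage, and a single structural rule of the kind the fix PR is said to adopt, can be modeled in a small standalone Java sketch. This is not the Doris source and not the patch in PR #64768: the class `PathEqualitySketch` and the methods `describedBehavior`, `unifiedRule`, `structurallyEqual`, and `stripTrailingSlashes` are hypothetical names, and the handling of cross-scheme pairs where neither side is `s3` (returning `false`) is an assumption made only so the sketch is complete.

```java
import java.net.URI;
import java.util.Objects;

public class PathEqualitySketch {

    // Hypothetical helper: removes trailing slashes so "/a/" and "/a" normalize alike.
    static String stripTrailingSlashes(String s) {
        if (s == null) {
            return "";
        }
        int end = s.length();
        while (end > 0 && s.charAt(end - 1) == '/') {
            end--;
        }
        return s.substring(0, end);
    }

    // Case-sensitive comparison of authority and slash-normalized path.
    static boolean structurallyEqual(URI a, URI b) {
        return Objects.equals(a.getAuthority(), b.getAuthority())
                && Objects.equals(stripTrailingSlashes(a.getPath()),
                                  stripTrailingSlashes(b.getPath()));
    }

    // Models the two contracts as described in the triage text (not the real method).
    static boolean describedBehavior(String p1, String p2) {
        URI u1 = URI.create(p1);
        URI u2 = URI.create(p2);
        String s1 = u1.getScheme();
        String s2 = u2.getScheme();
        if (s1 != null && s1.equalsIgnoreCase(s2)) {
            // Same scheme: full-string, case-insensitive; trailing slash is significant.
            return p1.equalsIgnoreCase(p2);
        }
        if ("s3".equalsIgnoreCase(s1) || "s3".equalsIgnoreCase(s2)) {
            // Cross scheme with one side s3: case-sensitive, trailing slash stripped.
            return structurallyEqual(u1, u2);
        }
        return false; // Assumption: other cross-scheme pairs are treated as unequal.
    }

    // One structural rule for both the same-scheme and the s3 cross-scheme case.
    static boolean unifiedRule(String p1, String p2) {
        URI u1 = URI.create(p1);
        URI u2 = URI.create(p2);
        String s1 = u1.getScheme();
        String s2 = u2.getScheme();
        boolean sameScheme = s1 != null && s1.equalsIgnoreCase(s2);
        boolean oneIsS3 = "s3".equalsIgnoreCase(s1) || "s3".equalsIgnoreCase(s2);
        return (sameScheme || oneIsS3) && structurallyEqual(u1, u2);
    }

    public static void main(String[] args) {
        String[][] pairs = {
            {"s3://bucket/a/", "s3://bucket/a"},   // trailing-slash-only difference
            {"s3://bucket/A", "s3://bucket/a"},    // case-only difference
            {"s3a://bucket/a/", "s3://bucket/a"},  // cross-scheme, one side s3
        };
        for (String[] p : pairs) {
            // Mirrors the caller's decision: needRename = !equal(targetPath, writePath)
            System.out.println(p[0] + " vs " + p[1]
                    + " needRename(described)=" + !describedBehavior(p[0], p[1])
                    + " needRename(unified)=" + !unifiedRule(p[0], p[1]));
        }
    }
}
```

Under the described behavior, the first pair would request a rename and the second would not. Under the unified sketch both results flip, which is the consistency the issue asks for. Encoded slashes, query/fragment handling, and malformed URIs are deliberately left out of this sketch.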
