Davis-Zhang-Onehouse opened a new pull request, #18817: URL: https://github.com/apache/hudi/pull/18817
### Describe the issue this Pull Request addresses Follow-up to #18814, which introduced `FSUtils.normalizeBasePathForLocking` for the implicit-key lock providers. `StorageBasedLockProvider` derives `lockFilePath` directly from the raw \`hoodie.base.path\` config value: \`\`\`java this.basePath = config.getHudiTableBasePath(); String lockFolderPath = StorageLockClient.getLockFolderPath(basePath); // new StoragePath(basePath, ".locks") this.lockFilePath = new StoragePath(lockFolderPath, DEFAULT_TABLE_LOCK_FILE_NAME).toString(); \`\`\` \`StoragePath\` absorbs trailing-slash drift, so \`s3://b/t\` and \`s3://b/t/\` already produce the same lock file. But two other classes of benign basePath drift do **not** get absorbed and route the writers to different lock files: | Drift | Same lock file today? | Reason | |---|---|---| | Trailing slash (\`s3://b/t\` vs \`s3://b/t/\`) | ✅ yes | \`StoragePath\` normalizes | | Scheme case (\`s3a://b/t\` vs \`s3://b/t\`) | ❌ **no** | Scheme is preserved verbatim through \`StoragePath\` | | Surrounding whitespace | ❌ **no** | Whitespace becomes part of the URI | Two writers that disagree on those benign formatting details acquire **different** lock files and lose mutual exclusion. Same class of bug as #18814, lower severity because the divergence is a visible second lock file rather than a silent hash split, but still a correctness issue. ### Summary and Changelog Route the raw basePath through \`FSUtils.normalizeBasePathForLocking\` (the helper introduced in #18814) before building \`lockFilePath\`. Using the same canonicalization API as the implicit-key LPs keeps the contract for "what counts as the same Hudi table" consistent across all lock providers. Also rename the stored field \`basePath\` → \`normalizedHudiTableBasePath\` so the name reflects what's actually stored. ### Impact - **Behavior**: \`StorageBasedLockProvider\` will produce the same \`lockFilePath\` for a Hudi table regardless of trailing-slash, surrounding whitespace, or \`s3a\`/\`s3\` scheme variations. No public API signature change. - **Performance**: Negligible — one extra \`trim()\` and a short scheme rewrite per LP construction. - **Compatibility (rollout caveat)**: Lock files keyed under the previous \`s3a://...\` form effectively orphan at deploy time, since the new code looks at \`s3://...\` for the same logical table. Operators upgrading should coordinate the deploy across all writers that share a Hudi table that uses this LP, or briefly quiesce writers while rolling out. Stale lock files can be cleaned up manually after the cutover. No rollout impact for callers that already pass \`s3://...\`. ### Risk Level low The change only affects how raw config strings are canonicalized before being fed to existing path-derivation code. Existing trailing-slash callers will see no change in derived lockFilePath (\`StoragePath\` already absorbed that). The only callers whose lock file moves are those that supplied an \`s3a://...\` basePath or surrounding whitespace. Mitigation: - Existing TestStorageBasedLockProvider suite continues to pass. - New tests cover the invariant directly: scheme drift, whitespace, and trailing-slash variants all produce the same lockFilePath. ### Documentation Update none — no user-facing config or website change. ### Contributor's checklist - [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [x] Enough context is provided in the sections above - [x] Adequate tests were added if applicable **Note**: This PR is stacked on top of #18814 — its diff currently includes the parent PR's changes until #18814 merges. The StorageBasedLockProvider-specific commit is the last one in the branch. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
