danny0405 commented on code in PR #18035:
URL: https://github.com/apache/hudi/pull/18035#discussion_r2757202418
##########
.claude/settings.local.json:
##########
@@ -0,0 +1,18 @@
+{
+ "permissions": {
+ "allow": [
+ "Bash(mvn-package:*)",
+ "Bash(alias mvn-package)",
+ "Bash(java -version:*)",
+ "Bash(/usr/libexec/java_home:*)",
+ "Bash(export JAVA_HOME=/opt/homebrew/Cellar/openjdk@11/11.0.29/libexec/openjdk.jdk/Contents/Home)",
+ "Bash(gh pr create:*)",
+ "Bash(gh issue list:*)",
+ "Bash(gh search:*)",
+ "Bash(gh issue view:*)",
+ "Bash(gh api:*)",
+ "Bash(gh pr comment 9035 --repo apache/hudi --body \"$\\(cat
<<''EOF''\nHi @nbalajee @yihua,\n\nWe are interested in helping revive this PR.
This feature addresses a critical issue that continues to affect Hudi users in
production environments.\n\n### Why This Is Important\n\nThe Spark task/stage
retry problem causes real data quality issues:\n- **Duplicate data files** left
on the dataset when stray executors complete writes after the driver has
finalized the commit\n- **Duplicate records** visible to query engines, causing
data quality issues\n- **Partial/corrupt Parquet files** when speculative tasks
are killed mid-write\n\nRelated issues that would benefit from this fix:\n-
#9615 - Broken parquet files with speculation execution enabled\n- #697 - Spark
retry problem causing duplicate files \n- #1764 - Commits stay INFLIGHT due to
duplicate file cleanup failures\n- #8674 - Parquet file length too low \\(0
bytes\\)\n\n### Current State\n\nI noticed the PR currently has merge
conflicts \\(`mergeable_state: dirty`\\) and the last CI run had failures.
We would be happy to help:\n\n1. **Rebase the PR** onto the latest master to
resolve conflicts\n2. **Fix CI failures** and ensure tests pass\n3. **Address
any outstanding review feedback**\n\n### Questions\n\n1. Are there any blocking
concerns or design decisions that prevented this from moving forward?\n2. Would
it be acceptable for us to create a new PR based on this work with the
necessary updates?\n3. Is there any overlap or conflict with the RFC in #11593
\\(Robust Spark Writes\\)?\n\nWe have been running a similar implementation
internally and can validate that the approach works well in production. Looking
forward to helping get this merged.\n\ncc: @nsivabalan @codope\nEOF\n\\)\")",
Review Comment:
Should this change be excluded from the PR?