danny0405 commented on code in PR #18035:
URL: https://github.com/apache/hudi/pull/18035#discussion_r2757202418


##########
.claude/settings.local.json:
##########
@@ -0,0 +1,18 @@
+{
+  "permissions": {
+    "allow": [
+      "Bash(mvn-package:*)",
+      "Bash(alias mvn-package)",
+      "Bash(java -version:*)",
+      "Bash(/usr/libexec/java_home:*)",
+      "Bash(export JAVA_HOME=/opt/homebrew/Cellar/openjdk@11/11.0.29/libexec/openjdk.jdk/Contents/Home)",
+      "Bash(gh pr create:*)",
+      "Bash(gh issue list:*)",
+      "Bash(gh search:*)",
+      "Bash(gh issue view:*)",
+      "Bash(gh api:*)",
+      "Bash(gh pr comment 9035 --repo apache/hudi --body \"$\\(cat 
<<''EOF''\nHi @nbalajee @yihua,\n\nWe are interested in helping revive this PR. 
This feature addresses a critical issue that continues to affect Hudi users in 
production environments.\n\n### Why This Is Important\n\nThe Spark task/stage 
retry problem causes real data quality issues:\n- **Duplicate data files** left 
on the dataset when stray executors complete writes after the driver has 
finalized the commit\n- **Duplicate records** visible to query engines, causing 
data quality issues\n- **Partial/corrupt Parquet files** when speculative tasks 
are killed mid-write\n\nRelated issues that would benefit from this fix:\n- 
#9615 - Broken parquet files with speculation execution enabled\n- #697 - Spark 
retry problem causing duplicate files  \n- #1764 - Commits stay INFLIGHT due to 
duplicate file cleanup failures\n- #8674 - Parquet file length too low \\(0 
bytes\\)\n\n### Current State\n\nI noticed the PR currently has merge 
conflicts \\(`mergeable_state: dirty`\\) and the last CI run had failures. 
We would be happy to help:\n\n1. **Rebase the PR** onto the latest master to 
resolve conflicts\n2. **Fix CI failures** and ensure tests pass\n3. **Address 
any outstanding review feedback**\n\n### Questions\n\n1. Are there any blocking 
concerns or design decisions that prevented this from moving forward?\n2. Would 
it be acceptable for us to create a new PR based on this work with the 
necessary updates?\n3. Is there any overlap or conflict with the RFC in #11593 
\\(Robust Spark Writes\\)?\n\nWe have been running a similar implementation 
internally and can validate that the approach works well in production. Looking 
forward to helping get this merged.\n\ncc: @nsivabalan @codope\nEOF\n\\)\")",

Review Comment:
   Should this change be excluded?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
