adp2201 commented on PR #14797: URL: https://github.com/apache/iceberg/pull/14797#issuecomment-4049987731
Thanks for continuing this work; the implementation and discussion here are very helpful.

Given the mixed reports (works for some setups, occasional duplicates for others), could we tighten the merge criteria around a clear correctness contract before merge? Specifically, it would help to have:

1. A documented behavior matrix for CDC/upsert mode (DV path vs equality-delete fallback, MOR/COW expectations, partitioned vs unpartitioned).
2. A deterministic integration test (or test matrix) that reproduces rapid consecutive updates to the same key across commit boundaries and validates no duplicate live rows.
3. Explicit operational requirements in docs (required table props, compaction cadence, non-null identifier constraints, and known limitations).

That would make it much easier for users to adopt safely and for maintainers to evaluate long-term support risk.
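
To make the requested deterministic test concrete, the "no duplicate live rows" invariant could be sketched roughly as below. This is a toy in-memory model in Python, not Iceberg or Flink code: `ToyTable`, `commit`, `live_rows`, `duplicate_keys`, and `run_scenario` are hypothetical names, and both the in-commit deduplication and the rule that a delete masks only rows from strictly earlier commits are assumptions of the sketch. A real test would need to drive the actual writer and read path in this PR.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical in-memory stand-in for an upsert-mode table (not an Iceberg API)."""
    rows: list = field(default_factory=list)        # (commit_seq, key, value)
    eq_deletes: list = field(default_factory=list)  # (commit_seq, key)
    seq: int = 0

    def commit(self, upserts):
        """Apply a batch of (key, value) upserts as a single commit."""
        self.seq += 1
        # Assumption: within one commit, the last write per key wins.
        latest = dict(upserts)
        for key, value in latest.items():
            self.eq_deletes.append((self.seq, key))
            self.rows.append((self.seq, key, value))

    def live_rows(self):
        """Return (key, value) rows not masked by a delete from a later commit."""
        return [
            (key, value)
            for row_seq, key, value in self.rows
            if not any(del_seq > row_seq and del_key == key
                       for del_seq, del_key in self.eq_deletes)
        ]


def duplicate_keys(live_rows):
    """Return the sorted keys that appear more than once among live rows."""
    counts = Counter(key for key, _ in live_rows)
    return sorted(key for key, n in counts.items() if n > 1)


def run_scenario():
    """Rapid consecutive updates to the same key across three commits."""
    table = ToyTable()
    table.commit([("k1", 1), ("k1", 2)])   # two updates inside one commit
    table.commit([("k1", 3)])              # update right after the boundary
    table.commit([("k1", 4), ("k2", 10)])  # update plus an unrelated key
    return table.live_rows()


if __name__ == "__main__":
    live = run_scenario()
    assert duplicate_keys(live) == [], f"duplicate live keys: {duplicate_keys(live)}"
    print(live)
```

The scenario is deterministic because commit boundaries are explicit calls rather than timer-driven checkpoints. The same assertion could be repeated for each cell of the behavior matrix in item 1.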
