jordepic opened a new pull request, #16243: URL: https://github.com/apache/iceberg/pull/16243
DynamicSinkUtil.getEqualityFieldIds and DynamicWriter.getEqualityFields both fall back to the schema's identifierFieldIds when the user-supplied equality fields are empty, but two routing decisions in the dynamic sink ignored that fallback: 1. HashKeyGenerator distributed identifier-only records round-robin, so two rows sharing an identifier-derived key could land on different writer subtasks while the writer still emitted equality deletes keyed by those identifier fields - breaking equality-delete correctness. 2. DynamicRecordProcessor forwarded any record with a null distributionMode straight to the writer, even when the record resolved to a non-empty equality-field set. Forward-mode records sharing an equality key could likewise split across writers and leave duplicates behind. Centralize the resolution in DynamicSinkUtil.resolveEqualityFieldNames and use it in both call sites so distribution and write-side equality-field inference stay aligned. Document the carve-out in flink-writes.md and add unit tests covering both paths across v2.1, v2.0 and v1.20. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
