mbutrovich commented on PR #15470: URL: https://github.com/apache/iceberg/pull/15470#issuecomment-4443218660
Thanks @nastra and @huaxingao for the review. A few questions before I push fixes. **On API stability (@nastra)** I can add a deprecated 9-param `rewriteDeleteManifest` that delegates to main's existing behavior. The wrapper would still produce manifests with incorrect `file_size_in_bytes` for position delete files, since it has no `PositionDeleteReaderWriter`. I'd add a javadoc warning pointing to the new overload. Does that work, or would you prefer a different pattern? For `rewritePositionDeleteFile`, the return type change from `void` to `long` can't be deprecated cleanly because Java doesn't allow two methods that differ only in return type. Options: a. Keep the public method `void` and add a package-private helper that returns the size for internal use. b. Rename the new public method (e.g. `rewritePositionDeleteFileAndReturnSize`) and keep the old one as deprecated. c. Treat this as a justified break since the only in-tree caller is core itself. Any preference? **On the HDFS in-progress write race (@huaxingao)** You're right that the `getLength()` fallback on a sibling task's in-progress staging file has the same bug class. Approaches I'm considering: a. Per-task UUID staging path. Each task writes to `<staging>/<task-uuid>/<relative>`. No staging collision, size always from our own writer. Downside: the copy plan ends up with multiple `(source, target)` pairs that share a target, pushing the collision to the copy phase. b. Keep the deterministic shared staging path. On `FileAlreadyExistsException`, rewrite to a per-task scratch path purely to get `writer.length()`, then delete the scratch. Single canonical staging file, single copy plan entry. Cost is duplicate rewrite work only on collision. Either works, would appreciate your preference. **On the revapi.yml diff (@huaxingao)** If we land deprecated wrappers, the set of new entries should drop to zero or near-zero and the diff issue resolves itself. I'll restore the original 1.10.0 block exactly as in main once the API decisions are made. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
