tanmayrauth commented on issue #1096:
URL: https://github.com/apache/iceberg-go/issues/1096#issuecomment-4614313815
Thanks for offering, @C-Loftus. @laskoviymishka or @zeroshade can assign
this issue to you.
Before any code lands I'd want to nail down a few things, because the
issue title is broader than what's safe to ship:
1. Atomic rename / commit semantics. The current implementation relies on
os.Rename in three load-bearing spots: the version-hint write, the namespace
metadata write, and CommitTable (catalog/hadoop/hadoop.go). That rename is the
catalog's only mechanism for safe concurrent commits. S3 has no atomic rename
(it's copy + delete); GCS and ABFS have different consistency stories than
POSIX. The Java HadoopCatalog carries the same limitation and explicitly
discourages object-store usage. Wiring io.IO makes the call sites cloud-capable
but does not make commits safe — we either need a per-backend conditional-write
story (S3 If-Match, GCS generation preconditions, etc.) or an explicit "unsafe
for concurrent writers" caveat surfaced on CreateCatalog and in docs. A written
plan for this is the part I'd push hardest on.
2. Interface surface. io.IO today is read/write/delete-oriented. The
Hadoop catalog also walks metadata/ to discover vN.metadata.json and stats
files for existence checks, so listing and stat capabilities (ListableIO, plus
an equivalent for stat) need to be in scope. Worth deciding up front whether
you extend io.IO or compose narrower interfaces.
3. Relative paths are a separate concern. Switching to io.IO is necessary
but not sufficient for relative-path support — that's a contract decision about
how the IO resolves identifiers, not catalog-side code. I'd suggest keeping the
cloud-IO change pure and tracking relative-path support as a follow-up.
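To make the "compose interfaces" option in point 2 concrete, here is a rough sketch. It is built only on the standard library's io/fs interfaces. CatalogFS, latestVersion, and exampleFS are hypothetical names for illustration, not iceberg-go types, and the real io.IO / ListableIO signatures may differ.

```go
package main

import (
	"fmt"
	"io/fs"
	"regexp"
	"strconv"
	"testing/fstest"
)

// CatalogFS is a hypothetical composed capability set: the catalog asks only
// for stat + listing, instead of widening a base IO interface.
// Assumption: it uses standard-library interfaces, not iceberg-go's io.IO.
type CatalogFS interface {
	fs.StatFS
	fs.ReadDirFS
}

var versionFile = regexp.MustCompile(`^v(\d+)\.metadata\.json$`)

// latestVersion lists dir and returns the highest N among vN.metadata.json.
func latestVersion(fsys CatalogFS, dir string) (int, error) {
	if _, err := fsys.Stat(dir); err != nil {
		return 0, fmt.Errorf("stat %s: %w", dir, err)
	}
	entries, err := fsys.ReadDir(dir)
	if err != nil {
		return 0, fmt.Errorf("list %s: %w", dir, err)
	}
	latest := 0
	for _, e := range entries {
		m := versionFile.FindStringSubmatch(e.Name())
		if m == nil {
			continue // not a versioned metadata file
		}
		n, err := strconv.Atoi(m[1])
		if err != nil {
			continue // digits too large to parse; skip
		}
		if n > latest {
			latest = n
		}
	}
	if latest == 0 {
		return 0, fmt.Errorf("no versioned metadata in %s: %w", dir, fs.ErrNotExist)
	}
	return latest, nil
}

// exampleFS builds an in-memory layout for demonstration only.
func exampleFS() fstest.MapFS {
	return fstest.MapFS{
		"metadata/v1.metadata.json":  {Data: []byte("{}")},
		"metadata/v2.metadata.json":  {Data: []byte("{}")},
		"metadata/v10.metadata.json": {Data: []byte("{}")},
		"metadata/snap-1.avro":       {},
	}
}
```

Calling latestVersion(exampleFS(), "metadata") picks v10 by numeric rather than lexical order. The point of composing is that a backend without listing support fails at compile time when passed to the catalog, rather than at the first directory walk.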
Given the rename problem, I'd suggest splitting this into two PRs: first
the pure io.IO plumbing (read path, listing, stat) keeping local-fs as the only
supported writer, then a separate change that introduces backend-specific
commit primitives. That way the abstraction lands cleanly and the harder
correctness work doesn't block it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]