tanmayrauth commented on issue #1096:
URL: https://github.com/apache/iceberg-go/issues/1096#issuecomment-4614313815

   Thanks for offering, @C-Loftus. @laskoviymishka / @zeroshade can help get 
this issue assigned to you. 
     
     Before any code lands I'd want to nail down a few things, because the 
issue title is broader than what's safe to ship:
   
     1. Atomic rename / commit semantics. The current implementation relies on 
os.Rename in three load-bearing spots: the version-hint write, the namespace 
metadata write, and CommitTable (catalog/hadoop/hadoop.go). That rename is the 
catalog's only mechanism for safe concurrent commits. S3 has no atomic rename 
(it's copy + delete); GCS and ABFS have different consistency stories than 
POSIX. The Java HadoopCatalog carries the same limitation and explicitly 
discourages object-store usage. Wiring io.IO makes the call sites cloud-capable 
but does not make commits safe — we either need a per-backend conditional-write 
story (S3 If-Match, GCS generation preconditions, etc.) or an explicit "unsafe 
for concurrent writers" caveat surfaced on CreateCatalog and in docs. A written 
plan for this is the part I'd push hardest on.
     2. Interface surface. io.IO today is read/write/delete-oriented. The 
Hadoop catalog also walks metadata/ to discover vN.metadata.json and stats 
files for existence checks, so listing and stat capabilities (ListableIO, plus 
an equivalent for stat) need to be in scope. Worth deciding up front whether you 
extend io.IO or compose interfaces.
     3. Relative paths are a separate concern. Switching to io.IO is necessary 
but not sufficient for relative-path support — that's a contract decision about 
how the IO resolves identifiers, not catalog-side code. I'd suggest keeping the 
cloud-IO change pure and tracking relative-path support as a follow-up.
   
     Given the rename problem, I'd suggest splitting this into two PRs: first 
the pure io.IO plumbing (read path, listing, stat) keeping local-fs as the only 
supported writer, then a separate change that introduces backend-specific 
commit primitives. That way the abstraction lands cleanly and the harder 
correctness work doesn't block it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]