Hello everyone,

I have a question about relative-path resolution in the context of
multi-region replication.

*Context:* We have a use case where data files may reside in different
storage locations depending on the replication state. To resolve a relative
path, we'd need additional context (e.g., the commit's sequence-id) to
determine which region/scheme a given file should resolve to.

While we wait for relative-path support, we are considering swapping the
scheme of absolute paths instead.  We plan to do this at the FileIO layer,
when new input files are requested.

The problem with doing the swap in FileIO is that there are calls that pass
a raw string path, with no context on which to base a routing decision.  I
would expect the same problem to occur for relative paths, where there isn't
enough context to determine the scheme.  By the same argument, more
complicated use cases would require even more metadata, such as the
sequence-id (and/or data-sequence-id).
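
To make the gap concrete, here is a rough Python sketch of the kind of
swap we have in mind.  It is not Iceberg code: RoutingFileIO, REGION_ROOTS,
choose_region, the bucket names, and the "replicated up to" watermark rule
are all hypothetical, and the sequence_id argument is exactly the piece of
context that a plain path-string call does not carry today.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical mapping from a region name to (scheme, bucket).
REGION_ROOTS = {
    "primary": ("s3", "bucket-us-east"),
    "replica": ("s3", "bucket-eu-west"),
}


def swap_location(path, region):
    """Rewrite the scheme and bucket of an absolute path, keeping the rest."""
    parts = urlsplit(path)
    scheme, netloc = REGION_ROOTS[region]
    return urlunsplit((scheme, netloc, parts.path, parts.query, parts.fragment))


def choose_region(sequence_id, replicated_up_to):
    """Hypothetical rule: commits at or below the watermark live in the replica."""
    if sequence_id is not None and sequence_id <= replicated_up_to:
        return "replica"
    # Without a sequence id there is nothing to route on, so fall back.
    return "primary"


class RoutingFileIO:
    """Stand-in for a FileIO wrapper; returns the resolved path only."""

    def __init__(self, replicated_up_to):
        self.replicated_up_to = replicated_up_to

    def new_input_file(self, path, sequence_id=None):
        region = choose_region(sequence_id, self.replicated_up_to)
        return swap_location(path, region)
```

With a sequence id the wrapper can pick the replica; with only the path
string it has to fall back to a default, which is the limitation described
above.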


*Questions:*

   1. What is the intended use case for relative paths in Iceberg? Is it
   designed primarily for DR/replication scenarios?  What about real-time
   replication?
   2. At what point can a manifest or data file's relative path be resolved
   to an absolute path? Does the current design assume all referenced data is
   already available locally?
   3. In FileIO, newInputFile(String path) takes a raw path string. Is
   there a planned mechanism to provide additional metadata (like sequence
   context) to help resolve paths in more complex topologies?

We'd like to understand Iceberg's direction on relative-path resolution so
we can align our approach with the community rather than diverging.


Thanks,
Sam
