This sounds like a slightly different problem to me. Sharing more context, the rewrite_table_path procedure[1] already has machinery to reason about the delta between snapshots and prepare a file copy plan for table replication. That seems closer to the granularity Samuel is asking for than path resolution itself.
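To make the "delta between snapshots" idea concrete, here is a minimal sketch of a prefix-rewriting copy plan. It is not how rewrite_table_path is implemented: `CopyTask`, `plan_copy`, and the bucket names are hypothetical, and the sketch ignores metadata files entirely. It only shows that the set of files to copy can be derived from two snapshots' file lists, without touching path resolution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CopyTask:
    """One file to copy from the source location to the target location."""
    source: str
    target: str


def plan_copy(files_at_start, files_at_end, source_prefix, target_prefix):
    """Return copy tasks for files referenced at the end snapshot but not at the start."""
    added = sorted(set(files_at_end) - set(files_at_start))
    tasks = []
    for path in added:
        if not path.startswith(source_prefix):
            raise ValueError(f"{path} is not under {source_prefix}")
        # Swap the prefix and keep the rest of the path unchanged.
        tasks.append(CopyTask(path, target_prefix + path[len(source_prefix):]))
    return tasks


# Hypothetical file lists for two snapshots of the same table.
start_files = ["s3://us-east/wh/db/t/data/a.parquet"]
end_files = start_files + ["s3://us-east/wh/db/t/data/b.parquet"]
plan = plan_copy(start_files, end_files, "s3://us-east/wh", "s3://eu-west/wh")
```

Only `b.parquet` ends up in the plan, because `a.parquet` was already referenced by the starting snapshot.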
I wonder if the better solution is to have an additional replication tool that tracks which snapshots or sequence ranges have been copied to which locations, similar in spirit to rewrite_table_path. That tool could own the replication state and produce copy plans, or region-specific table locations. Then relative path resolution can remain simple, while more advanced replication logic stays outside the core FileIO path resolution layer.

1. https://iceberg.apache.org/docs/latest/spark-procedures/#rewrite_table_path

Yufei

On Tue, Jun 2, 2026 at 8:51 AM samuel pacheco cantu via dev <[email protected]> wrote:

> To give more details, yes, the idea is that at the moment of staging a
> commit, we can create a directory under warehouse_location/commit_id/ that
> contains all data and metadata for that commit. Once committed, we can
> start replicating that directory. Each commit_id would map to a
> sequence-id.
>
> With this design, it's possible to partially replicate commits across
> multiple regions by using a bitmap to determine if sequence-id N has
> already been replicated to region X.
>
> Regarding the writer path construct, yeah I see the LocationProvider is
> in charge of creating the path. My question is about the FileIO input
> files: does it make sense to keep the metadata as relative paths (or
> absolute paths), and simply swap the files we read when getting the input
> files? This might be a hacky and inelegant solution. I'm reaching out to
> understand which layer will be in charge of resolving the paths.
>
> The current relative path solution sounds like it would add checks within
> the parsing code to see if the URI is relative, and then join the
> table_location with the relative_path if applicable. I'm curious if, at the
> very least, it would make sense to make it extendable by considering more
> metadata. For our use case, we are exploring the sequence ID as part of
> the routing.
>
> On Mon, Jun 1, 2026 at 5:22 PM Steven Wu <[email protected]> wrote:
>
>> It seems that Sam wants a table to hold data files from multiple live
>> regions (prefixes). The current design only supports a single prefix.
>>
>> On Mon, Jun 1, 2026 at 3:19 PM Daniel Weeks <[email protected]> wrote:
>>
>>> Hey Sam,
>>>
>>> I'm not sure I fully understand the scenario you're describing, but
>>> with relative paths the basic concept is that you have a table location
>>> (provided by a catalog) and files are resolved relative to that table
>>> location.
>>>
>>> Some examples are provided in the spec
>>> <https://iceberg.apache.org/spec/#path-resolution>.
>>>
>>> 1. What is the intended use case for relative paths in Iceberg? Is it
>>> designed primarily for DR/replication scenarios? What about real-time
>>> replication?
>>>
>>> The design accommodates DR/replication with proper catalog
>>> implementations to route or provide the table location. The act of
>>> replicating the files is left out of the spec, but can be realtime
>>> depending on the implementation.
>>>
>>> 2. At what point can a manifest or data file's relative path be
>>> resolved to an absolute path? Does the current design assume all
>>> referenced data is already available locally?
>>>
>>> Paths are resolved when they're read out of manifests. If you have a
>>> reference in metadata to a file, it should exist or readers will fail when
>>> fetching the file. By the time you perform a commit operation, it must be
>>> referenceable.
>>>
>>> 3. In FileIO, newInputFile(String path) takes a raw path string.
>>> Is there a planned mechanism to provide additional metadata (like sequence
>>> context) to help resolve paths in more complex topologies?
>>>
>>> A writer can construct paths in any way they want. Reference
>>> implementation behaviors are described in the appendix section, but there's
>>> no requirement for how they're constructed. Relative path support is still
>>> being added to the reference implementation, but path construction is
>>> largely the responsibility of LocationProvider. The path logic focuses on
>>> resolving or relativizing paths, not constructing them.
>>>
>>> -Dan
>>>
>>>
>>> On Mon, Jun 1, 2026 at 12:28 PM samuel pacheco cantu via dev <
>>> [email protected]> wrote:
>>>
>>>> Hello everyone,
>>>>
>>>> I have a question about relative-path resolution in the context of
>>>> multi-region replication.
>>>>
>>>> *Context:* We have a use case where data files may reside in different
>>>> storage locations depending on the replication state. To resolve a relative
>>>> path, we'd need additional context (e.g., the commit's sequence-id) to
>>>> determine which region/scheme a given file should resolve to.
>>>>
>>>> We are actually thinking about swapping the absolute path scheme while
>>>> we wait for relative-path support. We plan to do this at the FileIO layer
>>>> when requesting new input files.
>>>>
>>>> The problem with doing the swap at the FileIO layer is that there are
>>>> raw string path calls without any context to make a routing decision. I
>>>> would expect the same problem to occur here for relative paths, where there
>>>> isn't enough context to determine the scheme. The same argument can be
>>>> made that we require even more metadata to support more complicated
>>>> use cases, such as sequence-id (and/or data-sequence-id).
>>>>
>>>>
>>>> *Questions:*
>>>>
>>>> 1. What is the intended use case for relative paths in Iceberg? Is it
>>>> designed primarily for DR/replication scenarios? What about real-time
>>>> replication?
>>>> 2. At what point can a manifest or data file's relative path be
>>>> resolved to an absolute path? Does the current design assume all
>>>> referenced data is already available locally?
>>>> 3. In FileIO, newInputFile(String path) takes a raw path string. Is
>>>> there a planned mechanism to provide additional metadata (like sequence
>>>> context) to help resolve paths in more complex topologies?
>>>>
>>>> We'd like to understand Iceberg's direction on relative-path resolution
>>>> so we can align our approach with the community rather than diverging.
>>>>
>>>>
>>>> Thanks,
>>>> Sam
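
For reference, the routing discussed in this thread (join the table location with a relative path, and pick the region from a per-region bitmap of replicated sequence numbers) could be sketched roughly as below. Everything here is hypothetical and outside Iceberg: `ReplicationState`, `resolve`, the region names, and the bucket URIs are illustrative only, and nothing like this exists in FileIO or the spec. The point is that the sequence number has to be passed in by the caller, which is exactly the context a raw newInputFile(String path) call does not carry.

```python
from urllib.parse import urlparse


class ReplicationState:
    """Tracks which sequence numbers have been copied to each region (hypothetical)."""

    def __init__(self):
        self._bitmaps = {}  # region name -> int used as a bitmap

    def mark_replicated(self, region, sequence_number):
        self._bitmaps[region] = self._bitmaps.get(region, 0) | (1 << sequence_number)

    def is_replicated(self, region, sequence_number):
        return bool((self._bitmaps.get(region, 0) >> sequence_number) & 1)


def resolve(path, table_locations, state, local_region, home_region, sequence_number):
    """Resolve a path from metadata to an absolute URI.

    Absolute paths are returned unchanged. Relative paths are joined with the
    local region's table location if that sequence number has been replicated
    there, and with the home region's table location otherwise.
    """
    if urlparse(path).scheme:
        return path
    replicated = state.is_replicated(local_region, sequence_number)
    region = local_region if replicated else home_region
    return table_locations[region].rstrip("/") + "/" + path.lstrip("/")


# Hypothetical setup: sequence number 5 has reached "eu", 6 has not.
locations = {"us": "s3://us-bucket/wh/db/t", "eu": "s3://eu-bucket/wh/db/t"}
state = ReplicationState()
state.mark_replicated("eu", 5)

local_hit = resolve("data/a.parquet", locations, state, "eu", "us", 5)
fallback = resolve("data/b.parquet", locations, state, "eu", "us", 6)
```

Here `local_hit` resolves under the "eu" location and `fallback` under the "us" location. Keeping this state in a separate tool, as suggested above, would leave the core resolution step as a plain join.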
