Re: Relative paths and location resolution

samuel pacheco cantu via dev Tue, 02 Jun 2026 08:51:24 -0700

To give more details: yes, the idea is that when staging a commit, we
create a directory under warehouse_location/commit_id/ that contains all
data and metadata for that commit. Once the commit succeeds, we can start
replicating that directory. Each commit_id would map to a sequence-id.


With this design, it's possible to partially replicate commits across
multiple regions by using a bitmap to determine if sequence-id N has
already been replicated to region X.
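
As a rough illustration of the bitmap idea (a sketch only; ReplicationBitmap
and its methods are made-up names for this email, not anything that exists in
Iceberg), one bitmap per region could be kept like this:

```python
class ReplicationBitmap:
    """Hypothetical helper: tracks which sequence-ids each region has received."""

    def __init__(self):
        # One arbitrary-precision int per region.
        # Bit N set means sequence-id N has been replicated to that region.
        self._bits = {}

    def mark_replicated(self, region, sequence_id):
        self._bits[region] = self._bits.get(region, 0) | (1 << sequence_id)

    def is_replicated(self, region, sequence_id):
        return ((self._bits.get(region, 0) >> sequence_id) & 1) == 1


bitmap = ReplicationBitmap()
bitmap.mark_replicated("us-west", 3)
```

A reader in region X would then check is_replicated(X, N) before deciding
whether to read commit N locally or from the source region.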

Regarding writer path construction: yes, I see that the LocationProvider is
in charge of creating the path. My question is about the FileIO input files:
does it make sense to keep the paths in metadata as they are (relative or
absolute) and simply swap the files we read when getting the input files?
This might be a hacky and inelegant solution. I'm reaching out to
understand which layer will be in charge of resolving the paths.
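
To make the "swap at the FileIO layer" idea concrete, here is a minimal
sketch. It is not the Iceberg FileIO interface: PrefixSwappingIO, open_file,
and the bucket names are all hypothetical, and the wrapper only sees the raw
path string, which is exactly the limitation I'm asking about.

```python
class PrefixSwappingIO:
    """Hypothetical wrapper: rewrites a stored path prefix, then delegates."""

    def __init__(self, open_file, stored_prefix, local_prefix):
        self._open_file = open_file          # callable that opens a path
        self._stored_prefix = stored_prefix  # prefix recorded in metadata
        self._local_prefix = local_prefix    # prefix of the local replica

    def rewrite(self, path):
        # The raw string is the only input, so there is no sequence-id
        # available here to decide whether the file was replicated yet.
        if path.startswith(self._stored_prefix):
            return self._local_prefix + path[len(self._stored_prefix):]
        return path

    def new_input_file(self, path):
        return self._open_file(self.rewrite(path))


# The identity "opener" just returns the rewritten path for demonstration.
io = PrefixSwappingIO(lambda p: p, "s3://bucket-us/wh/", "s3://bucket-eu/wh/")
```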

The current relative path solution sounds like it would add checks within
the parsing code to see if the URI is relative, and then join the
table_location with the relative_path if applicable. I'm curious whether,
at the very least, it would make sense to make this extensible so that
resolution can take more metadata into account. For our use case, we are
exploring the sequence ID as part of the routing.
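
Roughly what I have in mind, as a sketch under my own assumptions: the
relative check below (no URI scheme, no leading slash) is a guess and may not
match the spec's rules, and resolve, route, and the locations are
hypothetical names rather than existing Iceberg code.

```python
from urllib.parse import urlparse


def is_relative(path):
    # Assumption: a path with no URI scheme and no leading slash is relative.
    return urlparse(path).scheme == "" and not path.startswith("/")


def resolve(path, table_location, sequence_id=None, route=None):
    """Join a relative path with a base location; absolute paths pass through.

    If a route callable and a sequence_id are given, the base location is
    chosen from the sequence-id instead of the single table_location.
    """
    if not is_relative(path):
        return path
    if route is not None and sequence_id is not None:
        base = route(sequence_id)
    else:
        base = table_location
    return base.rstrip("/") + "/" + path


# Hypothetical routing: commits up to sequence-id 10 are already in the EU.
def route(sequence_id):
    return "s3://eu/wh/tbl" if sequence_id <= 10 else "s3://us/wh/tbl"
```

With only table_location this behaves like the single-prefix design; passing
sequence_id and route is the extension point I'm asking about.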

On Mon, Jun 1, 2026 at 5:22 PM Steven Wu <[email protected]> wrote:

> It seems that Sam wants a table to hold data files from multiple live
> regions (prefixes). The current design only supports a single prefix.
>
> On Mon, Jun 1, 2026 at 3:19 PM Daniel Weeks <[email protected]> wrote:
>
>> Hey Sam,
>>
>> I'm not sure I fully understand the scenario you're describing, but the
>> basic concept of relative paths is that you have a table location
>> (provided by a catalog) and files are resolved relative to that table
>> location.
>>
>> Some examples are provided in the spec
>> <https://urldefense.com/v3/__https://iceberg.apache.org/spec/*path-resolution__;Iw!!Bt8RZUm9aw!5VJhaXv3HA50KqjLyUDUV7PikhxSsRDBSK3vP3lq783pVjyin1j9qtjPCNkQBJSm1a0xe0agLSyqe7M$>
>> .
>>
>>
>>    1. What is the intended use case for relative paths in Iceberg? Is it
>>    designed primarily for DR/replication scenarios?  What about real-time
>>    replication?
>>
>> The design accommodates DR/replication with proper catalog
>> implementations to route or provide the table location.  The act of
>> replicating the files is left out of the spec, but can be realtime
>> depending on the implementation.
>>
>>    2. At what point can a manifest or data file's relative path be
>>    resolved to an absolute path? Does the current design assume all
>>    referenced data is already available locally?
>>
>> Paths are resolved when they're read out of manifests.  If you have a
>> reference in metadata to a file, it should exist or readers will fail when
>> fetching the file.  By the time you perform a commit operation, it must be
>> referenceable.
>>
>>    3. In FileIO, newInputFile(String path) takes a raw path string. Is
>>    there a planned mechanism to provide additional metadata (like sequence
>>    context) to help resolve paths in more complex topologies?
>>
>> A writer can construct paths in any way they want. Reference
>> implementation behaviors are described in the appendix section, but there's
>> no requirement for how they're constructed.  Relative path support is still
>> being added to the reference implementation, but path construction is
>> largely the responsibility of LocationProvider.  The path logic focuses on
>> resolving or relativizing paths, not constructing them.
>>
>> -Dan
>>
>>
>> On Mon, Jun 1, 2026 at 12:28 PM samuel pacheco cantu via dev <
>> [email protected]> wrote:
>>
>>> Hello everyone,
>>>
>>> I have a question about relative-path resolution in the context of
>>> multi-region replication.
>>>
>>> *Context:* We have a use case where data files may reside in different
>>> storage locations depending on the replication state. To resolve a relative
>>> path, we'd need additional context (e.g., the commit's sequence-id) to
>>> determine which region/scheme a given file should resolve to.
>>>
>>> We are actually thinking about swapping the absolute path scheme while
>>> we wait for relative-path support.  We plan to do this at the FileIO layer
>>> when requesting new input files.
>>>
>>> The problem with doing the swap at the FileIO layer is that there are
>>> calls that take a raw path string, without any context for making a
>>> routing decision.  I would expect the same problem to occur for
>>> relative paths, where there isn't enough context to determine the
>>> scheme.  The same argument can be made that we need even more metadata
>>> to support more complicated use cases, such as the sequence-id (and/or
>>> data-sequence-id).
>>>
>>>
>>> *Questions:*
>>>
>>>    1. What is the intended use case for relative paths in Iceberg? Is it
>>>    designed primarily for DR/replication scenarios?  What about real-time
>>>    replication?
>>>    2. At what point can a manifest or data file's relative path be
>>>    resolved to an absolute path? Does the current design assume all
>>>    referenced data is already available locally?
>>>    3. In FileIO, newInputFile(String path) takes a raw path string. Is
>>>    there a planned mechanism to provide additional metadata (like sequence
>>>    context) to help resolve paths in more complex topologies?
>>>
>>> We'd like to understand Iceberg's direction on relative-path resolution
>>> so we can align our approach with the community rather than diverging.
>>>
>>>
>>> Thanks,
>>> Sam
>>>
>>>
