Hey everyone,

I wanted to follow up before sending out a vote for the spec additions
regarding relative paths because the discussed proposal had a non-trivial
update.

Early in the proposal process we went back and forth on whether (and where)
to include separator characters.  After a few rounds of discussion we
settled on including the path separator in the relative portion of the path
and limiting references to that separator.

Anoop and Steven pointed out an issue when relativizing similar paths; for
example: 'table' vs 'table_v2' (more detail in this comment thread
<https://github.com/apache/iceberg/pull/16174#discussion_r3228742112>).
This also led us to reconsider other aspects of how we treat the
representation and separator character.  Ultimately, it's more clear in the
spec to explicitly call out handling of the separator character and join
the table location and relative path on the URI separator character, '/'.

This has a number of advantages: it makes relativization unambiguous, is
more consistent with how people and other formats like Delta, Lance,
Paimon, and Hudi reference relative paths, and makes the language in the
spec around separators clearer.
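
To illustrate the 'table' vs 'table_v2' ambiguity, here is a rough sketch.
It is not the spec text or any Iceberg API: the function names
(relativize_naive, relativize, resolve), the example locations, and the
assumption that the table location carries no trailing separator are all
hypothetical.

```python
SEP = "/"  # URI separator used to join the table location and relative path


def relativize_naive(location: str, path: str) -> str:
    """Plain prefix stripping (illustrative only): also matches siblings."""
    if not path.startswith(location):
        raise ValueError(f"{path} is not under {location}")
    return path[len(location):]


def relativize(location: str, path: str) -> str:
    """Strip location + separator, so only paths under the location match.

    Assumes (hypothetically) that location has no trailing separator.
    """
    prefix = location + SEP
    if not path.startswith(prefix):
        raise ValueError(f"{path} is not under {location}")
    return path[len(prefix):]


def resolve(location: str, relative: str) -> str:
    """Join the table location and the relative path on the separator."""
    return location + SEP + relative


location = "s3://bucket/warehouse/table"  # hypothetical table location
sibling = "s3://bucket/warehouse/table_v2/data/f.parquet"
child = "s3://bucket/warehouse/table/data/f.parquet"

# The naive version treats a file of table_v2 as if it belonged to table.
print(relativize_naive(location, sibling))  # _v2/data/f.parquet

# Anchoring the match on the separator round-trips real children only.
print(relativize(location, child))  # data/f.parquet
print(resolve(location, relativize(location, child)) == child)  # True
```

With the separator-anchored version, relativizing the table_v2 path against
the table location raises an error instead of silently producing
'_v2/data/f.parquet'.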

Please take a look at the updated PR
<https://github.com/apache/iceberg/pull/15630/changes> if you have
questions/comments.

-Dan

On Mon, Apr 20, 2026 at 2:20 PM Daniel Weeks <[email protected]> wrote:

> Thanks to everyone who provided feedback.
>
> I've incorporated feedback from the first round and updated the PR.
>
> Please take a second (or first) look.
>
> -Dan
>
> On Mon, Mar 23, 2026 at 1:05 PM Daniel Weeks <[email protected]> wrote:
>
>> Hey everyone,
>>
>> If you're interested in the first round of spec related updates for
>> relative paths, please take a look and add comments:
>> https://github.com/apache/iceberg/pull/15630
>>
>> -Dan
>>
>> On Mon, Mar 23, 2026 at 1:04 PM Daniel Weeks <[email protected]> wrote:
>>
>>> Hey Steve,
>>>
>>> I'm not sure if you were able to get an answer on this question in any of
>>> the follow-up discussions we had on relative paths, but the situation you
>>> describe is inherent to the difference between absolute and relative paths.
>>>
>>> The spec isn't responsible for how you relocate/duplicate/etc. data if
>>> the base component of the relative path is updated; that is explicitly not
>>> covered by the design.  It's the responsibility of the catalog or
>>> implementation.
>>>
>>> If you want data persistence across metadata moves, you always have the
>>> ability to produce absolute paths to retain the v1-3 behavior.  However, I
>>> believe what we've learned through production deployments, and in comparison
>>> to other formats, is that the primary use case is to either relocate the
>>> entire dataset or duplicate the entire dataset, which is the basis for the
>>> relative path model described in the proposal.
>>>
>>> As to the catalog handling, most (all?) implementations either do not
>>> natively support rename (like HadoopCatalog) or treat rename as a
>>> metadata-only operation that does not change the table location.  The closest
>>> thing is probably register table in the REST catalog, but that is very much
>>> left up to the catalog implementation.  I think we can draw from this that
>>> most table relocations are being performed outside of the catalog and then
>>> registered in the catalog.
>>>
>>> -Dan
>>>
>>>
>>>
>>>
>>> On Wed, Feb 4, 2026 at 2:27 PM Steve <[email protected]> wrote:
>>>
>>>> Thanks all,
>>>>
>>>>   Following the relative path discussion last week, I want to raise a
>>>> question about lifecycle clean up operations in the context of table
>>>> location mutability.
>>>> The current proposal established that "*the table location is the
>>>> basis for all path resolution against persisted relative paths*".
>>>> Since location remains mutable, this creates a behavioral difference
>>>> between v3 and v4 tables that increases operational complexity. Here's a
>>>> concrete example:
>>>>
>>>> *Scenario*
>>>>
>>>> CREATE TABLE prod.db.events (
>>>>   event_id BIGINT,
>>>>   event_time TIMESTAMP,
>>>>   payload STRING
>>>> ) USING iceberg
>>>> LOCATION 's3://bucket-a/warehouse/events';
>>>>
>>>> -- Insert some data
>>>> INSERT INTO prod.db.events VALUES (1, current_timestamp(), 'data1');
>>>> INSERT INTO prod.db.events VALUES (2, current_timestamp(), 'data2');
>>>>
>>>> -- User changes location (Spark)
>>>> ALTER TABLE prod.db.events SET LOCATION 's3://bucket-b/warehouse/events';
>>>>
>>>> -- Write new data
>>>> INSERT INTO prod.db.events VALUES (3, current_timestamp(), 'data3');
>>>>
>>>> *Result for v3 table on absolute path *
>>>> Manifest entries:
>>>>   - s3://bucket-a/warehouse/events/data/file1.parquet  (absolute - old
>>>> location)
>>>>   - s3://bucket-a/warehouse/events/data/file2.parquet  (absolute - old
>>>> location)
>>>>   - s3://bucket-b/warehouse/events/data/file3.parquet  (absolute - new
>>>> location)
>>>> Reads work out of the box as paths are absolute.
>>>> Snapshot expiration will cover both locations, before and after the
>>>> change, as Iceberg metadata tracks the path at the time of creation.
>>>> Orphan removal is limited as it will only respect the latest
>>>> location.
>>>>
>>>> *Result for v4 table on relative path*
>>>> Manifest entries:
>>>>   - file1.parquet  (relative - written when location was bucket-a)
>>>>   - file2.parquet  (relative - written when location was bucket-a)
>>>>   - file3.parquet  (relative - written when location is bucket-b)
>>>> Path resolution for file1.parquet:
>>>>   Resolved: s3://bucket-b/warehouse/events/data/file1.parquet  ❌
>>>>   Actual:   s3://bucket-a/warehouse/events/data/file1.parquet
>>>> Reads will fail after location change unless files are physically moved
>>>> (either by catalog or by background process)
>>>> Snapshot expiration and orphan removal will not cover locations before
>>>> the update.
>>>>
>>>> *Question*
>>>> In v1-3, updating location is a lightweight, metadata-only operation
>>>> which only impacts future writes, and existing absolute paths continue to
>>>> resolve correctly for read. In v4, this is no longer the case. A location
>>>> update becomes a breaking change that requires physical file movement to
>>>> maintain correctness. From what I can tell, a catalog can either validate
>>>> and handle the movement, rewrite paths to absolute, or reject the update to
>>>> make location effectively immutable. Understandably, the Iceberg spec does
>>>> not want to prescribe catalog guidance, but should we acknowledge this
>>>> behavior change and document the lifecycle cleanup implications? It would be
>>>> great if we can discuss further before the spec is finalized.
>>>>
>>>> Thanks,
>>>> Steve Zhang
>>>>
>>>>
>>>>
>>>> On Thu, Jan 29, 2026 at 5:48 PM Talat Uyarer via dev <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> We had a productive meeting today regarding the Relative Paths
>>>>> proposal.
>>>>>
>>>>> We've reached a general agreement on the approach. The changes will
>>>>> involve explicitly defining path terminology (such as "absolute location")
>>>>> and should be well-contained within a new section of the Table Spec.
>>>>>
>>>>> The next step is to open a PR with the proposed changes, which may
>>>>> include knock-on effects for the REST specification, such as updates to
>>>>> register table and load table requests.
>>>>>
>>>>> If you'd like to access the meeting notes:
>>>>> https://docs.google.com/document/d/1t0RxrK-nsCT83zXeD66kmGx_TMU2X8_xfN1A_k6dCV0/edit?usp=sharing
>>>>>
>>>>> You can find the recording here:
>>>>> https://drive.google.com/file/d/11q65achM_3vCfaEVYsxmfAdbKQJb2drA/view?usp=sharing
>>>>>
>>>>> Thanks, everyone
>>>>>
>>>>> Talat
>>>>>
>>>>> On Fri, Aug 1, 2025 at 10:50 AM Wing Yew Poon
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> Dan,
>>>>>> Thanks for the clarifications.
>>>>>> Looking forward to the sync.
>>>>>> - Wing Yew
>>>>>>
>>>>>>
>>>>>> On Fri, Aug 1, 2025 at 8:43 AM Daniel Weeks <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hey Wing Yew,
>>>>>>>
>>>>>>> I see that you have been updating the Google doc containing the
>>>>>>>> proposal.
>>>>>>>
>>>>>>>
>>>>>>> That's correct, I've been working with Talat to update the doc based
>>>>>>> on feedback from the comments and first round of discussion we had on
>>>>>>> this topic.
>>>>>>>
>>>>>>> Looking through it now, as far as I can tell, the basic idea (from
>>>>>>>> the original proposal) of inferring the table location from the path
>>>>>>>> to the current metadata.json has not changed. Is my reading correct?
>>>>>>>
>>>>>>>
>>>>>>> So far, nothing has changed about table location inference, but we
>>>>>>> will probably be revisiting this with respect to other
>>>>>>> updates/clarifications.  There are still a couple of open comments
>>>>>>> related to this point, but it is one of the main goals.
>>>>>>>
>>>>>>> You have added clarification around how the path to the metadata is
>>>>>>>> constructed from table location (from which the table location is thus
>>>>>>>> reverse engineered) and around path relativization, but the original
>>>>>>>> idea does not appear to have changed. In that case, the use case of
>>>>>>>> having a single copy of metadata but more than one copy of data (two or
>>>>>>>> more locations) is not supported by the proposal. This was the sticking
>>>>>>>> point in the last sync to discuss the proposal.
>>>>>>>
>>>>>>>
>>>>>>> I don't believe this was the sticking point from the original
>>>>>>> discussion.  Having multiple copies/locations of the same data files
>>>>>>> under a single table's management is explicitly a non-goal.  It was
>>>>>>> discussed in the comments of the doc for caching/fallback use cases, but
>>>>>>> I think that's better handled by specific engine/fileio implementations.
>>>>>>>
>>>>>>> The main sticking points were confusion around the complexity of how
>>>>>>> paths are constructed/persisted and the interplay between
>>>>>>> table/metadata/data locations depending on how those values are set in
>>>>>>> the table metadata.  Based on that feedback, we're suggesting some
>>>>>>> changes, which primarily consist of: 1) defining path construction,
>>>>>>> resolution, and relativization separately, 2) making all paths relative
>>>>>>> to the table location (which simplifies resolution/relativization), and
>>>>>>> 3) addressing confusing/complex issues like path separators and
>>>>>>> expectations around separators.
>>>>>>>
>>>>>>> We're still in the process of updating the document, but we will
>>>>>>> schedule another sync to discuss these updates in detail and address a
>>>>>>> few points that are still outstanding.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Dan
>>>>>>>
>>>>>>> On Thu, Jul 31, 2025 at 5:47 PM Wing Yew Poon
>>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Daniel Weeks,
>>>>>>>> I see that you have been updating the Google doc containing the
>>>>>>>> proposal.
>>>>>>>> Looking through it now, as far as I can tell, the basic idea (from
>>>>>>>> the original proposal) of inferring the table location from the path
>>>>>>>> to the current metadata.json has not changed. Is my reading correct?
>>>>>>>> You have added clarification around how the path to the metadata is
>>>>>>>> constructed from table location (from which the table location is thus
>>>>>>>> reverse engineered) and around path relativization, but the original
>>>>>>>> idea does not appear to have changed. In that case, the use case of
>>>>>>>> having a single copy of metadata but more than one copy of data (two or
>>>>>>>> more locations) is not supported by the proposal. This was the sticking
>>>>>>>> point in the last sync to discuss the proposal.
>>>>>>>> Do you intend to have another sync to continue the discussion?
>>>>>>>> Thanks,
>>>>>>>> Wing Yew
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Jul 10, 2025 at 4:41 PM Anurag Mantripragada
>>>>>>>> <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Thanks Kevin, yes, I see the recording link too but don’t have
>>>>>>>>> access. I have requested access.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ~ Anurag Mantripragada
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Jul 10, 2025, at 2:43 PM, Kevin Liu <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Yes, it was recorded. Dan or Talat should have the recording. I see
>>>>>>>>> there's already a link for the recording associated with the gcal
>>>>>>>>> event but I don't have access to it.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Kevin Liu
>>>>>>>>>
>>>>>>>>> On Thu, Jul 10, 2025 at 12:37 PM Anurag Mantripragada
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hey folks, was the sync recorded? I missed it due to calendar
>>>>>>>>>> sync issues :(
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ~ Anurag Mantripragada
>>>>>>>>>>
>>>>>>>>>> On Jul 7, 2025, at 6:27 PM, ally heev <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>> Thanks. I can see it now
>>>>>>>>>>
>>>>>>>>>> On Tue, Jul 8, 2025 at 12:37 AM Kevin Liu <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I can see the new event on the dev calendar.
>>>>>>>>>>>
>>>>>>>>>>> Subscribe to the "Iceberg Dev Events" calendar here:
>>>>>>>>>>> https://iceberg.apache.org/community/#iceberg-community-events
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Kevin Liu
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jul 7, 2025 at 11:38 AM Daniel Weeks <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hey Ally (and everyone else).
>>>>>>>>>>>>
>>>>>>>>>>>> We hadn't scheduled the discussion for relative paths, but I
>>>>>>>>>>>> just added an event to the dev calendar for Thursday at 9am (PT).
>>>>>>>>>>>>
>>>>>>>>>>>> Let me know if you still don't see it on the calendar.
>>>>>>>>>>>>
>>>>>>>>>>>> -Dan
>>>>>>>>>>>>
>>>>>>>>>>>> On Sat, Jul 5, 2025 at 9:37 PM Jean-Baptiste Onofré <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Talat
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for the update. I will do a new pass on the doc.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards
>>>>>>>>>>>>> JB
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, May 28, 2025 at 12:13 AM Talat Uyarer
>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Hi, Iceberg Community,
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > As mentioned at the last sync, Dan and I have been working
>>>>>>>>>>>>> > on a proposal to add support for relative paths, which has been a
>>>>>>>>>>>>> > long-requested feature. There have been a number of
>>>>>>>>>>>>> > discussions/proposals over the years, but we'd like to scope down
>>>>>>>>>>>>> > and refocus effort to make some meaningful progress on this issue.
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Please take a look at the linked doc and provide feedback.
>>>>>>>>>>>>> > We'd love to open up discussion on this topic at the next
>>>>>>>>>>>>> > community sync and we can hold one-off syncs on the topic if
>>>>>>>>>>>>> > there's a lot of interest.
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > You can access Iceberg's First V4 Spec change from here :)
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Proposal Issue: https://github.com/apache/iceberg/issues/13141
>>>>>>>>>>>>> > Doc: https://s.apache.org/iceberg-spec-relative-path
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Talat
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
