Thanks all,

Following the relative path discussion last week, I want to raise a
question about lifecycle cleanup operations in the context of table
location mutability.

The current proposal established that "*the table location is the basis for
all path resolution against persisted relative paths*". Since location
remains mutable, this creates a behavioral difference between v3 and v4
tables that increases operational complexity. Here's a concrete example:

*Scenario*

  CREATE TABLE prod.db.events (
    event_id BIGINT,
    event_time TIMESTAMP,
    payload STRING
  ) USING iceberg
  LOCATION 's3://bucket-a/warehouse/events';

  -- Insert some data
  INSERT INTO prod.db.events VALUES (1, current_timestamp(), 'data1');
  INSERT INTO prod.db.events VALUES (2, current_timestamp(), 'data2');

  -- User changes location (Spark)
  ALTER TABLE prod.db.events SET LOCATION 's3://bucket-b/warehouse/events';

  -- Write new data
  INSERT INTO prod.db.events VALUES (3, current_timestamp(), 'data3');

*Result for v3 table with absolute paths*
Manifest entries:
  - s3://bucket-a/warehouse/events/data/file1.parquet  (absolute - old location)
  - s3://bucket-a/warehouse/events/data/file2.parquet  (absolute - old location)
  - s3://bucket-b/warehouse/events/data/file3.parquet  (absolute - new location)
Reads work out of the box because the paths are absolute.
Snapshot expiration covers both the old and the new location, because
Iceberg metadata tracks each file's path as of the time it was written.
Orphan removal is limited, as it only considers the latest location.
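
To illustrate the orphan removal limitation, here is a minimal sketch. It assumes a simplified model in which orphan removal lists objects under the current table location and flags anything the manifests do not reference. It is not Iceberg's actual implementation; `orphan_candidates` and the `stray-*` file names are hypothetical.

```python
def orphan_candidates(listing_prefix, all_objects, referenced):
    """Return objects under listing_prefix that no manifest references."""
    prefix = listing_prefix.rstrip("/") + "/"
    return sorted(
        path for path in all_objects
        if path.startswith(prefix) and path not in referenced
    )


# Files referenced by the v3 manifests in the scenario (absolute paths).
referenced = {
    "s3://bucket-a/warehouse/events/data/file1.parquet",
    "s3://bucket-a/warehouse/events/data/file2.parquet",
    "s3://bucket-b/warehouse/events/data/file3.parquet",
}

# Hypothetical storage contents: one unreferenced file under each location.
all_objects = referenced | {
    "s3://bucket-a/warehouse/events/data/stray-a.parquet",
    "s3://bucket-b/warehouse/events/data/stray-b.parquet",
}

# Only the current location is listed, so stray-a.parquet is never examined.
current_location = "s3://bucket-b/warehouse/events"
found = orphan_candidates(current_location, all_objects, referenced)
print(found)
```

In this model the unreferenced file left under bucket-a is never a candidate, because the listing never reaches the old location.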

*Result for v4 table with relative paths*
Manifest entries:
  - file1.parquet  (relative - written when location was bucket-a)
  - file2.parquet  (relative - written when location was bucket-a)
  - file3.parquet  (relative - written when location is bucket-b)
Path resolution for file1.parquet:
  Resolved: s3://bucket-b/warehouse/events/data/file1.parquet  ❌
  Actual:   s3://bucket-a/warehouse/events/data/file1.parquet
Reads will fail after the location change unless the files are physically
moved (either by the catalog or by a background process).
Snapshot expiration and orphan removal will not cover the locations used
before the update.
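
The resolution difference can be sketched as follows. This is only an illustration under assumptions: `resolve` is a hypothetical helper, not an Iceberg API, and it assumes the persisted relative path is `data/file1.parquet` (the resolved paths above include the `data/` prefix).

```python
def resolve(table_location: str, persisted_path: str) -> str:
    """Hypothetical resolver: absolute paths pass through unchanged,
    relative paths are joined to the table location given."""
    if "://" in persisted_path:
        return persisted_path
    return table_location.rstrip("/") + "/" + persisted_path.lstrip("/")


old_location = "s3://bucket-a/warehouse/events"
new_location = "s3://bucket-b/warehouse/events"

v3_entry = "s3://bucket-a/warehouse/events/data/file1.parquet"  # absolute
v4_entry = "data/file1.parquet"  # relative (assumed form)

# After ALTER TABLE ... SET LOCATION, readers resolve against new_location.
v3_resolved = resolve(new_location, v3_entry)
v4_resolved = resolve(new_location, v4_entry)

# Where file1 was actually written, before the location change.
actual = resolve(old_location, v4_entry)

print(v3_resolved)
print(v4_resolved)
print(actual)
```

The v3 entry still points at bucket-a, while the v4 entry now resolves to bucket-b even though the file was written under bucket-a.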

*Question*
In v1-3, updating the location is a lightweight, metadata-only operation
that only impacts future writes, and existing absolute paths continue to
resolve correctly for reads. In v4, this is no longer the case. A location
update becomes a breaking change that requires physical file movement to
maintain correctness. From what I can tell, a catalog can either validate
and handle the movement, rewrite paths to absolute, or reject the update to
make location effectively immutable. Understandably, the Iceberg spec does
not want to prescribe catalog behavior, but should we acknowledge this
behavior change and document the lifecycle cleanup implications? It would be
great if we could discuss this further before the spec is finalized.
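
To make the three catalog options concrete, here is a rough sketch. Everything in it is hypothetical (the `LocationUpdatePolicy` enum, `plan_location_update`, and the shape of the returned plan); nothing here comes from the proposal or from an existing catalog.

```python
from enum import Enum


class LocationUpdatePolicy(Enum):
    MOVE_FILES = "move_files"              # catalog handles the file movement
    REWRITE_ABSOLUTE = "rewrite_absolute"  # pin existing files to old location
    REJECT = "reject"                      # location is effectively immutable


def plan_location_update(policy, old_location, new_location, relative_paths):
    """Return a plan: which files to copy and which paths to persist."""
    old = old_location.rstrip("/")
    new = new_location.rstrip("/")
    if policy is LocationUpdatePolicy.REJECT:
        raise ValueError("location update rejected for relative-path table")
    if policy is LocationUpdatePolicy.MOVE_FILES:
        copies = [(f"{old}/{p}", f"{new}/{p}") for p in relative_paths]
        return {"copy": copies, "paths": list(relative_paths)}
    # REWRITE_ABSOLUTE: no data movement, existing entries become absolute.
    return {"copy": [], "paths": [f"{old}/{p}" for p in relative_paths]}


plan = plan_location_update(
    LocationUpdatePolicy.REWRITE_ABSOLUTE,
    "s3://bucket-a/warehouse/events",
    "s3://bucket-b/warehouse/events",
    ["data/file1.parquet", "data/file2.parquet"],
)
print(plan["paths"])
```

Each branch has a different cost: moving files is I/O heavy, rewriting to absolute gives up relative paths for the existing files, and rejecting removes a capability that v1-3 users have today.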

Thanks,
Steve Zhang



On Thu, Jan 29, 2026 at 5:48 PM Talat Uyarer via dev <[email protected]>
wrote:

> Hi All,
>
> We had a productive meeting today regarding the Relative Paths proposal.
>
> We've reached a general agreement on the approach. The changes will
> involve explicitly defining path terminology (such as "absolute location")
> and should be well-contained within a new section on Table Spec.
>
> The next step is to open a PR with the proposed changes, which may include
> knock-on effects for the REST specification, such as updates to register
> table and load table requests.
>
> If you'd like to access the meeting notes:
> https://docs.google.com/document/d/1t0RxrK-nsCT83zXeD66kmGx_TMU2X8_xfN1A_k6dCV0/edit?usp=sharing
>
> You can find the recording here:
> https://drive.google.com/file/d/11q65achM_3vCfaEVYsxmfAdbKQJb2drA/view?usp=sharing
>
> Thanks, everyone
>
> Talat
>
> On Fri, Aug 1, 2025 at 10:50 AM Wing Yew Poon <[email protected]>
> wrote:
>
>> Dan,
>> Thanks for the clarifications.
>> Looking forward to the sync.
>> - Wing Yew
>>
>>
>> On Fri, Aug 1, 2025 at 8:43 AM Daniel Weeks <[email protected]> wrote:
>>
>>> Hey Wing Yew,
>>>
>>>> I see that you have been updating the Google doc containing the proposal.
>>>
>>>
>>> That's correct, I've been working with Talat to update the doc based on
>>> feedback from the comments and first round of discussion we had on this
>>> topic.
>>>
>>>> Looking through it now, as far as I can tell, the basic idea (from the
>>>> original proposal) of inferring the table location from the path to the
>>>> current metadata.json has not changed. Is my reading correct?
>>>
>>>
>>> So far, nothing has changed about table location inference, but we will
>>> probably be revisiting this with respect to other updates/clarifications.
>>> There are still a couple open comments related to this point, but it is one
>>> of the main goals.
>>>
>>>> You have added clarification around how the path to the metadata is
>>>> constructed from table location (from which the table location is thus
>>>> reverse engineered) and around path relativization, but the original idea
>>>> does not appear to have changed. In that case, the use case of having a
>>>> single copy of metadata but more than one copy of data (two or more
>>>> locations) is not supported by the proposal. This was the sticking point in
>>>> the last sync to discuss the proposal.
>>>
>>>
>>> I don't believe this was the sticking point from the original
>>> discussion.  Having multiple copies/locations of the same data files under
>>> a single table's management is explicitly a non-goal.  It was discussed in
>>> the comments of the doc for caching/fallback use cases, but I think that's
>>> better handled by specific engine/fileio implementations.
>>>
>>> The main sticking points were confusion around the complexity of how
>>> paths are constructed/persisted and the interplay between
>>> table/metadata/data locations depending on how those values are set in the
>>> table metadata.  Based on that feedback, we're suggesting some changes,
>>> which primarily consist of: 1) defining path construction, resolution,
>>> and relativization separately, 2) making all paths relative to the table
>>> location (which simplifies resolution/relativization), and 3) addressing
>>> confusing/complex issues like path separators and expectations around
>>> separators.
>>>
>>> We're still in the process of updating the document, but we will
>>> schedule another sync to discuss these updates in detail and address a few
>>> points that are still outstanding.
>>>
>>> Thanks,
>>> Dan
>>>
>>> On Thu, Jul 31, 2025 at 5:47 PM Wing Yew Poon
>>> <[email protected]> wrote:
>>>
>>>> Hi Daniel Weeks,
>>>> I see that you have been updating the Google doc containing the
>>>> proposal.
>>>> Looking through it now, as far as I can tell, the basic idea (from the
>>>> original proposal) of inferring the table location from the path to the
>>>> current metadata.json has not changed. Is my reading correct?
>>>> You have added clarification around how the path to the metadata is
>>>> constructed from table location (from which the table location is thus
>>>> reverse engineered) and around path relativization, but the original idea
>>>> does not appear to have changed. In that case, the use case of having a
>>>> single copy of metadata but more than one copy of data (two or more
>>>> locations) is not supported by the proposal. This was the sticking point in
>>>> the last sync to discuss the proposal.
>>>> Do you intend to have another sync to continue the discussion?
>>>> Thanks,
>>>> Wing Yew
>>>>
>>>>
>>>> On Thu, Jul 10, 2025 at 4:41 PM Anurag Mantripragada
>>>> <[email protected]> wrote:
>>>>
>>>>> Thanks Kevin, yes, I see the recording link too but don’t have access.
>>>>> I have requested access.
>>>>>
>>>>>
>>>>> ~ Anurag Mantripragada
>>>>>
>>>>>
>>>>> On Jul 10, 2025, at 2:43 PM, Kevin Liu <[email protected]> wrote:
>>>>>
>>>>> Yes, it was recorded. Dan or Talat should have the recording. I see
>>>>> there's already a link for the recording associated with the gcal
>>>>> event, but I don't have access to it.
>>>>>
>>>>> Best,
>>>>> Kevin Liu
>>>>>
>>>>> On Thu, Jul 10, 2025 at 12:37 PM Anurag Mantripragada
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> Hey folks, was the sync recorded? I missed it due to calendar sync
>>>>>> issues :(
>>>>>>
>>>>>>
>>>>>> ~ Anurag Mantripragada
>>>>>>
>>>>>> On Jul 7, 2025, at 6:27 PM, ally heev <[email protected]> wrote:
>>>>>>
>>>>>> Thanks. I can see it now
>>>>>>
>>>>>> On Tue, Jul 8, 2025 at 12:37 AM Kevin Liu <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>> I can see the new event on the dev calendar.
>>>>>>> [image: Screenshot 2025-07-07 at 12.04.08 PM.png]
>>>>>>>
>>>>>>> Subscribe to the "Iceberg Dev Events" calendar here:
>>>>>>> https://iceberg.apache.org/community/#iceberg-community-events
>>>>>>>
>>>>>>> Best,
>>>>>>> Kevin Liu
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Jul 7, 2025 at 11:38 AM Daniel Weeks <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hey Ally (and everyone else).
>>>>>>>>
>>>>>>>> We hadn't scheduled the discussion for relative paths, but I just
>>>>>>>> added an event to the dev calendar for Thursday at 9am (PT).
>>>>>>>>
>>>>>>>> Let me know if you still don't see it on the calendar.
>>>>>>>>
>>>>>>>> -Dan
>>>>>>>>
>>>>>>>> On Sat, Jul 5, 2025 at 9:37 PM Jean-Baptiste Onofré <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Talat
>>>>>>>>>
>>>>>>>>> Thanks for the update. I will do a new pass on the doc.
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> JB
>>>>>>>>>
>>>>>>>>> On Wed, May 28, 2025 at 12:13 AM Talat Uyarer
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>> >
>>>>>>>>> > Hi, Iceberg Community,
>>>>>>>>> >
>>>>>>>>> > As mentioned at the last sync, Dan and I have been working on a
>>>>>>>>> proposal to add support for relative paths, which has been a long
>>>>>>>>> requested feature. There have been a number of discussions/proposals
>>>>>>>>> over the years, but we'd like to scope down and refocus effort to
>>>>>>>>> make some meaningful progress on this issue.
>>>>>>>>> >
>>>>>>>>> > Please take a look at the linked doc and provide feedback. We'd
>>>>>>>>> love to open up discussion on this topic at the next community sync
>>>>>>>>> and we can hold one-off syncs on the topic if there's a lot of
>>>>>>>>> interest.
>>>>>>>>> >
>>>>>>>>> > You can access Iceberg's First V4 Spec change from here :)
>>>>>>>>> >
>>>>>>>>> > Proposal Issue: https://github.com/apache/iceberg/issues/13141
>>>>>>>>> > Doc: https://s.apache.org/iceberg-spec-relative-path
>>>>>>>>> >
>>>>>>>>> > Talat
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
