Thanks Eduard for working on this. +1 on the approach. I also agree with
Amogh that both presigned URLs and scoped credential vending should be
supported, though that isn't necessarily in the scope of this PR.

Yufei


On Thu, Nov 13, 2025 at 1:43 PM Steven Wu <[email protected]> wrote:

> AFAIK, there is no bulk API to generate pre-signed URLs; they need to be
> generated one by one. Even with parallelization, it can still be slow
> for larger server-side planning.
>
> Amogh has a valid concern on client integration. Is there a PoC on how this
> can be plumbed through at the client side in iceberg-core?
>
> On Thu, Nov 13, 2025 at 3:09 AM Amogh Jahagirdar <[email protected]> wrote:
>
>> I'm +1 on this, though I did want to bring up a point on also achieving
>> this via the server sending back presigned URLs for the file locations. To
>> be clear, I don't think these are mutually exclusive approaches and like I
>> mentioned I'm +1 on a path for leveraging catalog vended storage
>> credentials as done in this PR; I just wanted to think through the
>> tradeoffs.
>>
>> I think the clearest benefit for the proposed approach is that many
>> catalogs already have the mechanisms to vend credentials to clients, so
>> this and the other change for refreshing credentials for a given plan is
>> likely not a heavy lift for *servers* to achieve. I think the complexity
>> will largely be on the client implementation in this approach, where we're
>> going to have to work through some FileIO scoping challenges for a given
>> plan. In the end, it's all doable but it is some level of complexity
>> shifted to the client (handling the refreshing/scoping/any caching on top
>> of that).
>>
>> Presigned URLs are supported by all the major object storage providers as
>> far as I checked. Clients would have to change in order to distinguish
>> between expected object storage URI structures and presigned URLs, but I
>> think that overall the client side complexity for scoping is reduced
>> compared to the credential vending approach. I think in this approach
>> complexity is shifted to the server where the server needs to sign the
>> objects. One could imagine that with a large number of files, there's likely a lot of
>> additional load on the server (CPU bound signing). Also later on, if
>> there's desire to be able to extend the protocol to say "Hey read
>> everything in this directory", then a scoped credential for that is
>> desirable (required?).
>>
>> My TLDR analysis is that credential vending in scan planning is probably
>> net better for larger scale scans, and is also a lighter lift for server
>> implementations today, while presigned URLs are probably better in terms of
>> making it easy for a wide variety of clients to integrate. In the end, I
>> don't think the 2 approaches are incompatible with each other and I don't
>> see any one way doors so I think it's entirely reasonable to start with the
>> proposed approach. Wonder what others think!
>>
>> Thanks,
>> Amogh Jahagirdar
>>
>>
>>
>> On Wed, Nov 12, 2025 at 7:49 AM Eduard Tudenhöfner <
>> [email protected]> wrote:
>>
>>> Hey everyone,
>>>
>>> For server-side scan planning we missed adding storage credentials,
>>> hence I'm proposing to add them to the response of the */plan* endpoint.
>>>
>>> The OpenAPI changes can be seen in PR #14563
>>> <https://github.com/apache/iceberg/pull/14563>.
>>>
>>> Looking forward to your thoughts and feedback.
>>>
>>> Thanks,
>>> Eduard
>>>
>>
