Thanks Eduard for working on this. +1 on the approach. I also agree with Amogh that both presigned URLs and scoped credential vending should be supported, though that isn't necessarily in scope for this PR.
Yufei

On Thu, Nov 13, 2025 at 1:43 PM Steven Wu <[email protected]> wrote:

> AFAIK, there is no bulk API to generate pre-signed URLs; they have to be
> generated one by one. Even with parallelization, that can still be slow
> for larger server-side planning.
>
> Amogh has a valid concern about client integration. Is there a PoC showing
> how this can be plumbed through on the client side in iceberg-core?
>
> On Thu, Nov 13, 2025 at 3:09 AM Amogh Jahagirdar <[email protected]> wrote:
>
>> I'm +1 on this, though I did want to bring up the option of achieving
>> this by having the server send back presigned URLs for the file
>> locations. To be clear, I don't think these are mutually exclusive
>> approaches, and as I mentioned I'm +1 on a path that leverages
>> catalog-vended storage credentials as done in this PR; I just wanted to
>> think through the tradeoffs.
>>
>> I think the clearest benefit of the proposed approach is that many
>> catalogs already have the mechanisms to vend credentials to clients, so
>> this change, and the other one for refreshing credentials for a given
>> plan, is likely not a heavy lift for *servers* to achieve. I think the
>> complexity will largely be in the client implementation with this
>> approach, where we're going to have to work through some FileIO scoping
>> challenges for a given plan. In the end it's all doable, but some
>> complexity is shifted to the client (handling the refreshing, the
>> scoping, and any caching on top of that).
>>
>> Presigned URLs are supported by all the major object storage providers,
>> as far as I checked. Clients would have to change in order to
>> distinguish between expected object storage URI structures and presigned
>> URLs, but I think that overall the client-side complexity for scoping is
>> reduced compared to the credential vending approach. In this approach,
>> complexity is shifted to the server, which needs to sign the objects.
>> One could imagine that with a large number of files there's likely a lot
>> of additional load on the server (signing is CPU-bound). Also, if there's
>> later a desire to extend the protocol to say "Hey, read everything in
>> this directory", then a scoped credential for that is desirable
>> (required?).
>>
>> My TL;DR analysis is that credential vending in scan planning is probably
>> net better for larger-scale scans, and is also a lighter lift for server
>> implementations today, while presigned URLs are probably better at
>> making it easy for a wide variety of clients to integrate. In the end, I
>> don't think the two approaches are incompatible with each other, and I
>> don't see any one-way doors, so I think it's entirely reasonable to start
>> with the proposed approach. Wonder what others think!
>>
>> Thanks,
>> Amogh Jahagirdar
>>
>> On Wed, Nov 12, 2025 at 7:49 AM Eduard Tudenhöfner <[email protected]> wrote:
>>
>>> Hey everyone,
>>>
>>> For server-side scan planning we missed adding storage credentials,
>>> hence I'm proposing to add them to the response of the */plan* endpoint.
>>>
>>> The OpenAPI changes can be seen in PR #14563
>>> <https://github.com/apache/iceberg/pull/14563>.
>>>
>>> Looking forward to your thoughts and feedback.
>>>
>>> Thanks,
>>> Eduard
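To make the client-side scoping concern concrete, here is a minimal sketch of how a client might match each planned data file to a vended credential. It assumes the */plan* response reuses the prefix/config shape of the `storage-credentials` entries the REST catalog spec already defines for loading a table; the exact fields in PR #14563 may differ. The longest-prefix selection rule, the bucket paths, the key values, and the helper `select_credential` are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StorageCredential:
    # Mirrors the prefix/config pair used for vended credentials in the REST
    # catalog spec; whether /plan reuses this exact shape is an assumption.
    prefix: str
    config: Dict[str, str] = field(default_factory=dict)


def select_credential(
    location: str, credentials: List[StorageCredential]
) -> Optional[StorageCredential]:
    """Hypothetical helper: pick the credential with the longest matching prefix.

    Returns None when no prefix covers the location, in which case a client
    would fall back to whatever credentials it was otherwise configured with.
    """
    best: Optional[StorageCredential] = None
    for cred in credentials:
        if location.startswith(cred.prefix):
            # Prefer the most specific (longest) prefix that still matches.
            if best is None or len(cred.prefix) > len(best.prefix):
                best = cred
    return best


# Hypothetical credentials, as they might appear in a /plan response.
plan_credentials = [
    StorageCredential("s3://warehouse/db/", {"s3.access-key-id": "KEY_A"}),
    StorageCredential("s3://warehouse/db/events/", {"s3.access-key-id": "KEY_B"}),
]

# Hypothetical data file locations taken from the plan's file scan tasks.
files = [
    "s3://warehouse/db/events/data/00000.parquet",
    "s3://warehouse/db/users/data/00001.parquet",
    "s3://other-bucket/data/00002.parquet",
]

for path in files:
    cred = select_credential(path, plan_credentials)
    print(path, "->", cred.prefix if cred else "no scoped credential")
```

The plain string prefix check keeps the sketch short and relies on each prefix ending in `/`. A real client would also need to handle credential expiry and refresh for a given plan, which is the part of the client-side complexity Amogh calls out above.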
