Hi all, just to clarify: we're not inventing anything new here, but will provide a way to use Iceberg's out-of-the-box ability to let the IRC (Polaris) sign the individual S3 requests. We're not changing anything in that remote S3-request-signing flow, and certainly cannot "magically" add optimizations.
Robert

On Tue, Aug 19, 2025 at 5:08 PM Alexandre Dutra <adu...@apache.org> wrote:
>
> Hi Yufei,
>
> > Can we add sequence diagrams (client -> Polaris -> S3 compatible storage) to show the request life cycle?
>
> Of course! Added.
>
> > Is there any client/server caching we can add?
>
> Iceberg's S3V4RestSignerClient already has a cache. Do you have something else in mind?
>
> > Are we considering batch signing?
>
> No, as Iceberg's S3V4RestSignerClient doesn't handle batch signing. How do you envision batch signing?
>
> > Should we add rate limiting to avoid endpoint abuse?
>
> Polaris is already rate-limited, is there anything else you need?
>
> > How do clients like Spark, Trino discover and use the signing APIs?
>
> Through the LoadTableResult contents.
>
> Thanks,
> Alex
>
> On Tue, Aug 19, 2025 at 9:04 AM Yufei Gu <flyrain...@gmail.com> wrote:
> >
> > Hi Alex,
> >
> > Thanks for drafting the S3 Remote Signing design. Overall it’s a good start, but it currently reads more like an implementation note than a design doc. To make it complete, could you expand on a few key areas?
> >
> > - Can we add sequence diagrams (client -> Polaris -> S3 compatible storage) to show the request life cycle?
> > - Can we expand performance design? for example,
> > - Is there any client/server caching we can add?
> > - Are we considering batch signing?
> > - Should we add rate limiting to avoid endpoint abuse?
> > - How do clients like Spark, Trino discover and use the signing APIs?
> >
> > Yufei
> >
> > On Mon, Aug 18, 2025 at 10:22 AM Robert Stupp <sn...@snazy.de> wrote:
> >
> > > --flame off--
> > > S3 request signing is still a form of credential vending ;)
> > > --flame on--
> > >
> > > On Mon, Aug 18, 2025 at 6:41 PM Dmitri Bourlatchkov <di...@apache.org> wrote:
> > >
> > > > Thanks for starting an S3 signing doc Alex!
> > > >
> > > > Just a bit of a nitpicking comment: It looks like this thread was hijacked for the S3 remote signing discussion :) This is fine from my POV, I just wanted to clarify that this thread was started to discuss options for Polaris to send specific credentials to clients (a.k.a. vended credentials) when STS is not available.
> > > >
> > > > For the sake of clarity, let's re-title the thread for replies related to the remote signing discussion.
> > > >
> > > > Cheers,
> > > > Dmitri.
> > > >
> > > > On Mon, Aug 18, 2025 at 11:54 AM Alexandre Dutra <adu...@apache.org> wrote:
> > > >
> > > > > Hi Yufei,
> > > > >
> > > > > Yes, sure! There you go:
> > > > >
> > > > > https://docs.google.com/document/d/1ygdia7u4bUHUt6n8XhZo48aKoIyyrCvKqan3XP25iB8/edit?usp=sharing
> > > > >
> > > > > Thanks,
> > > > > Alex
> > > > >
> > > > > On Thu, Aug 14, 2025 at 11:10 PM Yufei Gu <flyrain...@gmail.com> wrote:
> > > > > >
> > > > > > Thanks Robert, Alex for working on this. Thanks Prashant for chiming in. This is a big feature deserving a design doc and community discussion. Can we have a design doc first?
> > > > > >
> > > > > > Yufei
> > > > > >
> > > > > > On Thu, Aug 14, 2025 at 8:53 AM Alexandre Dutra <adu...@apache.org> wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I've drafted an initial version of remote signing enablement in Polaris [1]. Your comments are welcome, either here or directly on the PR, where there's already some valuable discussion.
> > > > > > >
> > > > > > > This PR aims to be a minimum viable product for remote signing, not a comprehensive implementation. Notably, it doesn't include Nessie's cryptographically-signed request parameters.
> > > > > > >
> > > > > > > One aspect of remote signing not covered by the IRC specification is RBAC. For this, I've introduced a new table privilege and authorizable operation in the PR, with access checks based on these table-like validations. This is admittedly coarse-grained, but can be refined later.
> > > > > > >
> > > > > > > A consequence of implementing RBAC for remote signing is that it's impractical to use the spec's default endpoint – /v1/aws/s3/sign – because it cannot properly identify the table and catalog.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Alex
> > > > > > >
> > > > > > > [1]: https://github.com/apache/polaris/pull/2280
> > > > > > >
> > > > > > > On Thu, Aug 14, 2025 at 5:40 PM Prashant Singh <prashant.si...@snowflake.com.invalid> wrote:
> > > > > > > >
> > > > > > > > IMHO encoding stuff in the url so that we can avoid reverse lookup is the right thing to do ! Since we are relying on this, signing by a key that the catalog owns seems a logical natural step to avoid tampering. Nevertheless it's a standard practice which S3 has that gives you signature in the pre-signed url (https://amzn-s3-demo-bucket.s3.amazonaws.com/object.txt?AWSAccessKeyId=AKIAIOSFODNN7EXAMPLE&Signature=vjbyNxybdZaMmLa%2ByT372YEAiv4%3D&Expires=1741978496) Looking forward to the design doc / proposal for Polaris.
> > > > > > > > Best,
> > > > > > > > Prashant Singh
> > > > > > > >
> > > > > > > > On Tue, Aug 5, 2025 at 6:23 AM Robert Stupp <sn...@snazy.de> wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > I can contribute what we did in Nessie:
> > > > > > > > >
> > > > > > > > > S3 request signing requires one additional request against the catalog for each request performed by S3 (HTTP/REST here). The catalog has to enforce the access rules (allow-listing, allowed read & write locations). Doing the access privilege "dance" is quite expensive considering the huge number of requests, so those S3 signing requests have to be as fast as possible, ideally without any backend access, while still allowing the catalog to make a secure decision whether a particular request is allowed. We have to keep in mind that a single loadTable() can easily lead to thousands of S3 requests, and each requires its individual signature.
> > > > > > > > >
> > > > > > > > > So how can that be done? As the catalog still has to perform checks against the above mentioned access rules, it has to know those. We can pass the (encoded) access rules and an expiration timestamp in the catalog's request signing URL. We "just" have to ensure that clients cannot tamper with the access rules, which is where cryptographic signing comes into play.
> > > > > > > > >
> > > > > > > > > When a client performs a "loadTable()" to get the S3 request signing URL, the catalog collects the access rules and encodes them in a serialized structure and signs it with a secret key that's only known by the catalog.
> > > > > > > > >
> > > > > > > > > client: loadTable()
> > > > > > > > > ---> catalog identifies the table
> > > > > > > > > ---> catalog performs authZ checks
> > > > > > > > > ---> catalog collects access rules
> > > > > > > > > ---> catalog serializes access rules
> > > > > > > > > ---> catalog signs serialized object
> > > > > > > > > ---> catalog returns S3 signing endpoint
> > > > > > > > >
> > > > > > > > > Such an S3 signing endpoint may look like this
> > > > > > > > > ---> https://my-polaris.local/s3-signing/v1/sign/aGVsbG9wb2V3ZmtvcGV3a29wazMybzRpb3VoMjNpdXJoaXVoNGlwdWhqcGl1Z2pyb2lnam9pZWpnb3BpNGppb3B1Z2pocGl1aGdpdXAzNGhnaXVlcmhpdXBnaHJlaXB1Z2h1aXBoaXB1MmhiM3JpdWJuMzJpdXJ0bgo=
> > > > > > > > >
> > > > > > > > > When the catalog receives a signing request, it verifies the signature [1] and validates [2] the S3 request against those rules. This happens in Nessie without any database access, so each S3 signing request executes very quickly.
> > > > > > > > >
> > > > > > > > > The trick is to manage the secret keys. This is where the signing-keys-service [3] comes into play. This service ensures that all Nessie instances have a secret key for signing purposes and have access to the keys that have been used before, to enable automatic key rotation.
> > > > > > > > >
> > > > > > > > > There is no knob that a user has to tune or set, it's a standard functionality in Nessie. And it works for all Nessie instances (pods) accessing the same backend.
> > > > > > > > >
> > > > > > > > > We can certainly contribute this functionality, which already works in many production environments, to Polaris.
> > > > > > > > >
> > > > > > > > > Robert
> > > > > > > > >
> > > > > > > > > [1] https://github.com/projectnessie/nessie/blob/17ab7e5f58bf8e8e62d3bafe8c7f97378f28fe12/catalog/service/rest/src/main/java/org/projectnessie/catalog/service/rest/IcebergApiV1S3SignResource.java#L104-L106
> > > > > > > > > [2] https://github.com/projectnessie/nessie/blob/17ab7e5f58bf8e8e62d3bafe8c7f97378f28fe12/catalog/service/rest/src/main/java/org/projectnessie/catalog/service/rest/IcebergS3SignParams.java#L118
> > > > > > > > > [3] https://github.com/projectnessie/nessie/blob/17ab7e5f58bf8e8e62d3bafe8c7f97378f28fe12/catalog/service/impl/src/main/java/org/projectnessie/catalog/service/impl/SignerKeysServiceImpl.java#L46
> > > > > > > > >
> > > > > > > > > On Tue, Aug 5, 2025 at 6:04 AM Yufei Gu <flyrain...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi Pat,
> > > > > > > > > >
> > > > > > > > > > Remote signing sounds like a good idea! Looking forward to a proposal/design doc.
> > > > > > > > > >
> > > > > > > > > > Yufei
> > > > > > > > > >
> > > > > > > > > > On Fri, Aug 1, 2025 at 8:44 AM Pat Patterson <p...@backblaze.com.invalid> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi,
> > > > > > > > > > >
> > > > > > > > > > > I'm Pat Patterson, Chief Technical Evangelist at Backblaze. I've been working with Backblaze B2, our S3-compatible cloud object store, and Iceberg for a little while now, showing how to use it from Snowflake, Trino, DuckDB, etc.
> > > > > > > > > > >
> > > > > > > > > > > I'm replying here as requested by Dmitri on the "Support for non-AWS S3 compatible storage with STS" GitHub issue [1]. I think S3 signing would work well with Backblaze B2, since we don't currently have an STS. I'm happy to help in any way I can - I just left a reply to Alexandre Dutra on the "On-Premise S3 & Remote Signing" GitHub issue [2].
> > > > > > > > > > >
> > > > > > > > > > > [1] https://github.com/apache/polaris/issues/1530#issuecomment-3138005897
> > > > > > > > > > > [2] https://github.com/apache/polaris/issues/32#issuecomment-3144991873
> > > > > > > > > > >
> > > > > > > > > > > Cheers,
> > > > > > > > > > >
> > > > > > > > > > > Pat
> > > > > > > > > > >
> > > > > > > > > > > On 2025/07/31 15:35:55 Robert Stupp wrote:
> > > > > > > > > > > > Hi,
> > > > > > > > > > > >
> > > > > > > > > > > > not sure whether exposing the object storage credentials given to Polaris to all clients isn't going to cause a "false impression of security" (aka: "our credentials are vended by Polaris, so we're safe" - nope...). With my "evil user" hat on, I'd try to figure out the configuration option (is it realm-specific?) to tell Polaris to yield its "master" object storage credentials for a few seconds, just long enough so I can gain access to it and have access to all the data.
> > > > > > > > > > > >
> > > > > > > > > > > > No doubt, there are S3 implementations (software and appliances) that do not support STS, which is admittedly not great. I can imagine that at least some appliance vendors and software projects/products will get STS.
> > > > > > > > > > > >
> > > > > > > > > > > > For the non-STS use cases, I think S3 signing is the way to go.
> > > > > > > > > > > > Sure, it requires one more request, but we can make those requests fast (aka not require any persistence access) as we did in Nessie. With that we could still ensure that clients don't have access to everything, respecting the object-storage level read/write/list privileges.
> > > > > > > > > > > >
> > > > > > > > > > > > Another option is still to configure the object storage credentials at the clients. It's not great, but it's still an option. Admins can give each client individual credentials to reduce potential risks, being able to revoke access for individual clients, and/or audit those.
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Jul 31, 2025 at 2:51 AM Yufei Gu <fl...@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks for raising this, Dmitri!
> > > > > > > > > > > > >
> > > > > > > > > > > > > For non-STS use cases, some users may be more comfortable without credential vending. They could configure the storage credentials on the engine side. Can we first confirm that vending raw credentials is really what users are asking for?
> > > > > > > > > > > > >
> > > > > > > > > > > > > If that's the case, raw credential vending should be at least optional, which could be guarded by feature flags.
> > > > > > > > > > > > >
> > > > > > > > > > > > > And I didn't see much difference between option 1 and option 2.
> > > > > > > > > > > > > Both provide raw credentials and need rotation. Either way is fine with me.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Yufei
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, Jul 30, 2025 at 3:24 PM Dmitri Bourlatchkov <di...@apache.org> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi All,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Recent conversations [1] [2] about non-AWS S3 storage brought up user needs for operating with S3-compatible storage that does not have STS.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Remote request signing can be used to support those use cases, but it is a considerable development effort to add to Polaris, plus it has different performance characteristics than vended credentials.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I propose two short-term options to support users of non-STS S3 storage.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 1) Add a configuration option to vend the same credentials that Polaris has to clients.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > While this may (rightly) be considered suboptimal from the security perspective, this option does give users a choice to operate clients without explicitly configuring storage credentials for them. Polaris Servers still control the rotation of those credentials.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 2) Add secondary plain credentials for vending to clients. Polaris itself will use one key/secret pair. Clients will be issued another key/secret pair. Rotation of the client credentials should be possible to implement too.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > WDYT?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > [1] https://github.com/apache/polaris/issues/1530#issuecomment-3137374380
> > > > > > > > > > > > > > [2] https://github.com/apache/polaris/issues/2207
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > Dmitri.
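
For readers following the signed-URL approach Robert describes earlier in the thread (access rules plus an expiration timestamp, serialized, signed with a key known only to the catalog, then verified on each signing request without any database access), here is a minimal Python sketch of the idea. It is not Nessie's or Polaris's implementation: the use of HMAC-SHA256, the JSON layout, and the names SECRET_KEY, issue_token, verify_token and is_allowed are all assumptions made for illustration, and key rotation is left out entirely.

```python
import base64
import hashlib
import hmac
import json
import time

# Assumption: a single secret known only to the catalog (no key rotation here).
SECRET_KEY = b"hypothetical-catalog-only-secret"
SIG_LEN = hashlib.sha256().digest_size  # 32 bytes for HMAC-SHA256


def issue_token(rules, ttl_seconds, now=None):
    """At loadTable() time: serialize access rules plus expiry and sign them."""
    now = time.time() if now is None else now
    payload = json.dumps(
        {"rules": rules, "exp": int(now) + ttl_seconds}, sort_keys=True
    ).encode()
    sig = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    # The token would be embedded in the signing endpoint URL returned to the client.
    return base64.urlsafe_b64encode(sig + payload).decode()


def verify_token(token, now=None):
    """At signing time: return the rules if signature and expiry are valid, else None."""
    now = time.time() if now is None else now
    try:
        raw = base64.urlsafe_b64decode(token.encode())
    except ValueError:  # malformed base64
        return None
    sig, payload = raw[:SIG_LEN], raw[SIG_LEN:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):  # constant-time comparison
        return None
    claims = json.loads(payload)
    if claims["exp"] < now:
        return None
    return claims["rules"]


def is_allowed(rules, method, uri):
    """Check an S3 request against the allowed read/write location prefixes."""
    key = "write" if method in ("PUT", "POST", "DELETE") else "read"
    return any(uri.startswith(prefix) for prefix in rules[key])
```

With something like this, the per-request check needs only the token from the URL and the catalog's key, which is the property that makes each signing request cheap; everything expensive (table lookup, authZ) happens once, when the token is issued.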