Hi Ryan,

Great idea! I will add this topic to the agenda today.
I also prepared a proposal document to facilitate the discussion:
https://docs.google.com/document/d/1ZcZ5VrXZZOgYllPI9-HTZt8986kBJTMQwFHT_-ASgj0/edit?usp=sharing

Thanks,
Alex

On Wed, Jul 30, 2025 at 1:23 AM Ryan Blue <rdb...@gmail.com> wrote:
>
> Hi Alex, I think it's a great idea to break down contributions like this into
> smaller PRs. It's probably good to discuss this at tomorrow's catalog sync to
> prioritize the functionality you want to add and figure out the best way to
> fit it in.
>
> On Tue, Jul 29, 2025 at 11:33 AM Alex Dutra <alex.du...@dremio.com.invalid>
> wrote:
>>
>> Dear Community,
>>
>> I would like to revive this discussion regarding the potential donation of
>> Dremio's Auth Manager.
>>
>> Over the past few days, I have explored the suggestion of dividing the
>> contribution into smaller parts. I am pleased to report that I have
>> successfully broken down the features into approximately 15 pull requests,
>> targeting the main Iceberg repository.
>>
>> While these pull requests are all rather substantial, I think that they
>> remain within a manageable size for reviewers.
>>
>> Would this approach be a good path forward? If so, I can share more details
>> about the timeline and roadmap I have in mind, and of course, I am prepared
>> to begin the donation as soon as I have the Community's green light.
>>
>> Thanks,
>> Alex Dutra
>>
>>
>> On Wed, Jun 25, 2025 at 9:57 AM Alex Dutra <alex.du...@dremio.com> wrote:
>>>
>>> Hi Daniel, hi all,
>>>
>>> Sorry for the late reply. Here are some answers to your questions:
>>>
>>> > I was under the impression that the AuthManager implementation was
>>> > relatively small (based on the recent work for the GCP AuthManager)
>>>
>>> These are not comparable.
>>> The GCP AuthManager is small because it only
>>> works for GCP, and thus can leverage Google auth libraries (more
>>> specifically, it uses the google-auth-library-oauth2-http artifact;
>>> and since this artifact is already a required dependency for
>>> iceberg-gcp, it doesn't bring in any extra dependency).
>>>
>>> Conversely, this AuthManager is a general-purpose AuthManager that can
>>> work with any IDP.
>>>
>>> > The broader community wasn't involved in decisions made about the
>>> > implementation
>>>
>>> That’s exactly the purpose of this donation.
>>>
>>> > "impersonation flow" which I'm not familiar with
>>>
>>> This is a feature where the manager can dynamically fetch the subject
>>> token for a token exchange, thus managing both the catalog's token and
>>> the user's token, facilitating impersonation (and delegation) use
>>> cases. Hence the name (admittedly a bit confusing). This feature is
>>> still evolving, but we received positive feedback from users and we
>>> believe it brings a lot of value; it is also not something that a
>>> third-party library could do.
>>>
>>> > we need to break it into smaller contributions and figure out the
>>> > appropriate way to review and assimilate the functionality
>>>
>>> While we are open to this option, we are concerned about how long it
>>> could take to complete. In the interim, users have expressed a
>>> need for improved OAuth2 support. Would it be possible to gain some
>>> clarity regarding the timeline for a review of this initiative?
>>> Perhaps an initial review of the current codebase could help identify
>>> and address any potential roadblocks? I can also schedule a demo of
>>> the new auth manager, if that helps.
>>>
>>> > how well the community understands the behaviors.
>>>
>>> While OAuth2 may not be familiar or palatable to most Iceberg
>>> contributors, I am confident that some of them possess the expertise
>>> to effectively review and assess the donation.
>>>
>>> > The main competency of this project isn't to implement security protocols
>>>
>>> This may be true for the GCP auth manager or for the SigV4 one: these
>>> are vendor-specific and can leverage the respective vendor's SDK. But
>>> how would we support OAuth2 in a generic way otherwise? Or Kerberos?
>>> Whether this is a competency of the project or not is debatable.
>>> Managing HTTP requests is not a main competency of this project
>>> either, and yet we have one RESTClient interface and one HTTPClient
>>> implementation, and lots of JSON parsers.
>>>
>>> The RESTClient in its current form already implies using some
>>> authentication protocol. The simple case of using static tokens (provided
>>> via configuration) does not cover real-world cases that users have
>>> expressed interest in. Accepting the Auth Manager will certainly
>>> require some extra attention to security protocols from Iceberg
>>> maintainers, but it will allow the project to support more advanced
>>> use cases. Additionally, the Auth Manager provides a path for users of
>>> the existing, deprecated “/token” endpoint to migrate to standard
>>> RFC-based OAuth flows.
>>>
>>> > Was there any exploration of leveraging other standard implementations
>>> > like Apache Oltu, Nimbus, etc. to build the implementation off of?
>>>
>>> Yes, we considered that and decided not to go down that route, for a
>>> few reasons:
>>>
>>> 1. Most OAuth libraries provide building blocks to create clients, but
>>> they are not fully-fledged clients; you still need to write code in
>>> order to glue things together [1].
>>>
>>> 2. These libraries usually have (too?) many dependencies [2]; some of
>>> them have not been maintained for a while. And Apache Oltu is retired.
>>> In contrast, our Auth Manager only has one small dependency:
>>> auth0-jwt.
>>>
>>> 3. If you delegate to a third-party library, then you cannot share the
>>> catalog's RESTClient or Executor.
>>> The library is going to maintain its
>>> own HTTP client and executor, leading to increased resource
>>> consumption.
>>>
>>> 4. Nothing precludes us from switching to a third-party library later
>>> on (it's an implementation detail). We thought it was best to start with
>>> a self-contained project.
>>>
>>> Thanks,
>>> Alex
>>>
>>> [1]:
>>> https://connect2id.com/products/nimbus-oauth-openid-connect-sdk/guides/oauth-client-server-development
>>> [2]: For Nimbus:
>>> https://central.sonatype.com/artifact/com.nimbusds/oauth2-oidc-sdk/11.26/dependencies
>>>
>>> On Thu, Jun 19, 2025 at 5:58 PM Daniel Weeks <dwe...@apache.org> wrote:
>>> >
>>> > I hadn't seen this thread before we discussed it yesterday, but since
>>> > then I've taken a look and have some reservations.
>>> >
>>> > I was under the impression that the AuthManager implementation was
>>> > relatively small (based on the recent work for the GCP AuthManager), but
>>> > after taking a look at the repo, this is far from a small contribution.
>>> >
>>> > I strongly support more robust security support (especially for
>>> > OAuth2/OIDC), but I don't feel this is going to be a small effort to
>>> > introduce. The broader community wasn't involved in decisions made about
>>> > the implementation and I see elements that give me pause (like
>>> > "impersonation flow" which I'm not familiar with and implementation
>>> > details like extensions to immutables that aren't consistent with the
>>> > broader codebase).
>>> >
>>> > If we decide that we want to take this on, I feel like we need to break
>>> > it into smaller contributions and figure out the appropriate way to
>>> > review and assimilate the functionality in a way that's consistent with
>>> > the rest of the project. Because this is security related, we should
>>> > take extra precautions around what this introduces and how well the
>>> > community understands the behaviors.
>>> >
>>> > However, looking at the complexity here relative to the approach with the
>>> > GCP, I have to question whether this is the right path overall. The main
>>> > competency of this project isn't to implement security protocols, so it's
>>> > a lot to say we want a full and complete (possibly with extensions)
>>> > native implementation of the OAuth2 specification (there are whole
>>> > projects built around that alone).
>>> >
>>> > Was there any exploration of leveraging other standard implementations
>>> > like Apache Oltu, Nimbus, etc. to build the implementation off of?
>>> >
>>> > -Dan
>>> >
>>> > On Thu, Jun 19, 2025 at 5:33 AM Alex Dutra
>>> > <alex.du...@dremio.com.invalid> wrote:
>>> >>
>>> >> Hi Ryan & JB, hi all,
>>> >>
>>> >> I think it would be easier to introduce this new manager as an
>>> >> alternative manager. This would make the migration smoother as it
>>> >> would give users time to migrate at their convenience. Besides, the
>>> >> new manager has the notion of "dialects", and can be configured to
>>> >> behave exactly like the current one (honoring the same config
>>> >> options), making the migration even easier.
>>> >>
>>> >> > Why not contribute the functionality directly to the AuthManager
>>> >> > already in Iceberg? Is this incompatible or is there a reason the
>>> >> > current one can't be extended through contributions?
>>> >>
>>> >> There are a few reasons why I believe it's not possible to extend the
>>> >> current manager indefinitely:
>>> >>
>>> >> 1. The current auth manager lives in iceberg-core; as we introduce
>>> >> more features, it will become impractical to keep it there, especially
>>> >> since some of the features will require third-party dependencies. As a
>>> >> data point: the new manager contains almost 100 Java production
>>> >> classes (not counting test classes and build scripts).
>>> >> 2. The current auth manager has some well-known shortcomings, notably
>>> >> around token refreshes.
>>> >> It's not possible to fix that without
>>> >> introducing regressions and potentially breaking many catalog clients
>>> >> already in production.
>>> >> 3. As we introduce features like Authorization Code grant support,
>>> >> interactions with the IDP will become more complex than just a
>>> >> request-response cycle. Since most of the current logic resides in the
>>> >> OAuth2Util class, which is entirely public, it won't be an easy task
>>> >> to introduce support for such complex flows while avoiding binary
>>> >> incompatibilities.
>>> >>
>>> >> Thanks,
>>> >> Alex
>>> >>
>>> >>
>>> >> On Wed, Jun 18, 2025 at 11:35 PM Jean-Baptiste Onofré
>>> >> <j...@nanthrax.net> wrote:
>>> >> >
>>> >> > Hi
>>> >> >
>>> >> > I think it makes sense to add it directly to the AuthManager. I don't
>>> >> > see blockers (with some adaptations). Alex?
>>> >> >
>>> >> > From a donation process standpoint (if accepted), I'm happy to help
>>> >> > with the SGA and IP Clearance.
>>> >> >
>>> >> > Regards
>>> >> > JB
>>> >> >
>>> >> > On Wed, Jun 18, 2025 at 9:15 PM Ryan Blue <rdb...@gmail.com> wrote:
>>> >> > >
>>> >> > > I think it would be great to bring this functionality into Iceberg.
>>> >> > > I'm curious about your plan for getting it in. It sounds like you're
>>> >> > > suggesting adding the Dremio project to the Iceberg repo and making
>>> >> > > it optional. Why not contribute the functionality directly to the
>>> >> > > AuthManager already in Iceberg? Is this incompatible or is there a
>>> >> > > reason the current one can't be extended through contributions?
>>> >> > >
>>> >> > > On Tue, Jun 17, 2025 at 11:23 AM Christian Thiel
>>> >> > > <christian.t.b...@gmail.com> wrote:
>>> >> > >>
>>> >> > >> Hey Alex,
>>> >> > >>
>>> >> > >> Thanks for the initiative; I really appreciate the effort here!
>>> >> > >>
>>> >> > >> Having good auth compatibility in the Catalog ecosystem is key to
>>> >> > >> establishing secure standards by making them easy to use.
>>> >> > >> While Iceberg should stay open to other means of authentication,
>>> >> > >> OAuth2 is the most widely adopted interoperable auth standard, and
>>> >> > >> its role in Iceberg REST reflects that. But with human-centric flows
>>> >> > >> like Auth Code (with PKCE 😉) and Device Code missing from most
>>> >> > >> standard clients, users often default to handing out personal
>>> >> > >> Client ID/secret pairs, which is really bad from a security
>>> >> > >> perspective.
>>> >> > >>
>>> >> > >> While I can’t speak to the Java details, I fully support bringing
>>> >> > >> the functionality into Iceberg. I have tested the proposed code
>>> >> > >> successfully with Spark and different IdPs, including Auth & Device
>>> >> > >> Code flows with token refresh, as well as token refresh for Client
>>> >> > >> Credentials flows.
>>> >> > >>
>>> >> > >> Thanks!
>>> >> > >>
>>> >> > >> Christian
>>> >> > >>
>>> >> > >>
>>> >> > >>
>>> >> > >> On Mon, 16 Jun 2025 at 20:33, Alex Dutra
>>> >> > >> <alex.du...@dremio.com.invalid> wrote:
>>> >> > >>>
>>> >> > >>> Hi all,
>>> >> > >>>
>>> >> > >>> Dremio recently open-sourced a new implementation of the Auth
>>> >> > >>> Manager API for OAuth2:
>>> >> > >>>
>>> >> > >>> https://github.com/dremio/iceberg-auth-manager
>>> >> > >>>
>>> >> > >>> I wrote a blog post about it a while ago [1].
>>> >> > >>>
>>> >> > >>> Built on top of the Auth Manager API introduced in Iceberg 1.9.0,
>>> >> > >>> this project provides a more flexible and extensible OAuth2
>>> >> > >>> manager compared to the built-in equivalent in Iceberg Core. It
>>> >> > >>> follows OAuth2 standards strictly, but also provides
>>> >> > >>> compatibility with any existing Apache Iceberg REST catalog, and
>>> >> > >>> contains no Dremio-specific functionality. To date, this is the
>>> >> > >>> only OAuth2 manager fully compliant with external identity
>>> >> > >>> providers.
>>> >> > >>>
>>> >> > >>> Dremio would like to contribute this code to the Apache Iceberg
>>> >> > >>> project. I am therefore initiating this discussion to determine
>>> >> > >>> the community's interest in accepting this donation.
>>> >> > >>>
>>> >> > >>> This project is beneficial to the community because it addresses
>>> >> > >>> well-known limitations, such as token refresh problems [2][3][4],
>>> >> > >>> and also because it introduces highly anticipated features like
>>> >> > >>> Authorization Code grant support [5]. Fixing these limitations or
>>> >> > >>> adding support for such large features in the built-in manager,
>>> >> > >>> while avoiding any risk of regressions, would have been a lot
>>> >> > >>> harder.
>>> >> > >>>
>>> >> > >>> Also worth mentioning: this project adheres to the "Iceberg OAuth2
>>> >> > >>> Client Authentication Guide", proposed by Christian Thiel [6].
>>> >> > >>>
>>> >> > >>> This project could initially serve as a runtime-selectable
>>> >> > >>> alternative to the current built-in implementation. Upon reaching
>>> >> > >>> sufficient maturity, however, it could potentially replace the
>>> >> > >>> existing manager.
>>> >> > >>>
>>> >> > >>> Please share your thoughts by replying to this email.
>>> >> > >>> Alternatively, we can discuss this topic at the Catalog Sync
>>> >> > >>> meeting this Wednesday, June 18th, if that is a more comfortable
>>> >> > >>> option for everyone.
>>> >> > >>>
>>> >> > >>> Thanks,
>>> >> > >>>
>>> >> > >>> Alex
>>> >> > >>>
>>> >> > >>> [1]: https://medium.com/data-engineering-with-dremio/introducing-dremio-auth-manager-for-apache-iceberg-223827342d19
>>> >> > >>> [2]: https://github.com/apache/iceberg/issues/12196
>>> >> > >>> [3]: https://github.com/apache/iceberg/issues/12363
>>> >> > >>> [4]: https://github.com/apache/iceberg/issues/13030
>>> >> > >>> [5]: https://github.com/apache/iceberg/issues/10677
>>> >> > >>> [6]: https://docs.google.com/document/d/1buW9PCNoHPeP7Br5_vZRTU-_3TExwLx6bs075gi94xc/edit?tab=t.0#heading=h.hufqidg1ij89