betodealmeida opened a new issue, #20300:
URL: https://github.com/apache/superset/issues/20300

   ## [SIP-83] OAuth2 for databases
   
   ### Motivation
   
   Modern cloud databases (eg, Sowflake, BigQuery, and Google Sheets) support 
OAuth2 for user authentication and authorization. Connecting Superset to these 
databases today, specially BigQuery and Google Sheets, is a complicated 
multi-step process, requiring uses to create a service account that can 
potentially give access to all the sheets inside an organization (if 
"impersonate user" is not enabled).
   
   Superset could offer a much better experience to users by supporting OAuth2 
directly. Users would simply have to grant the necessary permissions to 
Superset, and Superset would store and use the access tokens when running 
queries. This would remove the complexity of setting up these databases, and 
solve the problem of risking giving access to confidential sheets.
   
   ### Proposed Change
   
   We propose supporting OAuth2 directly in Superset. When accessing a resource 
(table or Google sheet) that requires OAuth2 the user would be redirected to 
the authorization server, where they can grant the necessary permissions. After 
OAuth2 the access and refresh tokens would be stored in table with grain of 
`(database, user)`, and reused in subsequent connections.
   
   The following video shows the process in action:
   
   
https://user-images.githubusercontent.com/1534870/172488708-06f99cf3-499f-4323-91cd-fa426259e06c.mov
   
   ### New or Changed Public Interfaces
   
   This proposal adds a new table and and a new endpoint. It also modifies 
`modify_url_for_impersonation` to pass a new parameter, the OAuth2 
`access_token`. The SIP will be first implemented for Google Sheets, but the 
implementation is database agnostic, with all database specific logic living in 
the engine spec.
   
   To store the user tokens a new table `database_user_oauth2_tokens` would be 
created, with a corresponding SQLAlchemy model:
   
   | user_id | database_id | access_token | access_token_expiration | 
refresh_token |
   | --- | --- | --- | --- | --- |
   | 1 | 1 | XXX | 2022-06-06 09:00:00Z | YYY |
   
   (The access token and refresh token are stored in encrypted columns, similar 
to other database credentials.)
   
   Here the user with ID 1 has an access token `XXX` on the database with ID 1. 
When they run a query in that database Superset will pass the access token to 
the database engine spec via `modify_url_for_impersonation`. For Google Sheets 
to use this token it's as simple as:
   
   ```python
   class GSheetsEngineSpec(SqliteEngineSpec):
       ...
       @classmethod
       def modify_url_for_impersonation(
           cls,
           url: URL,
           impersonate_user: bool,
           username: Optional[str],
           access_token: Optional[str] = None,  # added in this SIP
       ) -> None:
           if impersonate_user and access_token:
               url.query["access_token"] = access_token
   ```
   
   Different DB engine specs might have to handle the token in a different way.
   
   When passing the token to the DB engine spec Superset will check if it 
hasn't expired. If it has expired and a refresh token is available Superset 
will refresh the token and store the new one by calling a DB engine spec 
method. For Google Sheets:
   
   ```python
   class GSheetsEngineSpec(SqliteEngineSpec):
       ...
       @staticmethod
       def get_oauth2_fresh_token(refresh_token: str) -> OAuth2TokenResponse:
           response = http.request(  # type: ignore
               "POST",
               "https://oauth2.googleapis.com/token";,
               fields={
                   "client_id": current_app.config["GSHEETS_OAUTH2_CLIENT_ID"],
                   "client_secret": 
current_app.config["GSHEETS_OAUTH2_CLIENT_SECRET"],
                   "refresh_token": refresh_token,
                   "grant_type": "refresh_token",
               },
           )
           return json.loads(response.data.decode("utf-8"))
   ```
   
   The refresh token logic is part of the OAuth2 spec, but the details of how 
to do it are left to each DB engine spec. For Google Sheets we need to post a 
payload to https://oauth2.googleapis.com/token, as the example above shows.
   
   How are these tokens obtained and stored? Each DB engine spec should declare 
if it supports OAuth2 via a `is_oauth2_enabled` method, for example:
   
   ```python
   from shillelagh.exception import UnauthenticatedError
   
   class GSheetsEngineSpec(SqliteEngineSpec):
       ...
       oauth2_exception = UnauthenticatedError
   
       @staticmethod
       def is_oauth2_enabled() -> bool:
           return (
               "GSHEETS_OAUTH2_CLIENT_ID" in current_app.config
               and "GSHEETS_OAUTH2_CLIENT_SECRET" in current_app.config
           )
   ```
   
   (The default method in the base class returns `False`.)
   
   Each DB engine spec also defines an exception `oauth2_exception` that is 
raised when OAuth2 is needed (`UnauthenticatedError` in the example above). 
Superset will capture this exception and start the OAuth2 flow by raising a 
custom error that the frontend understands:
   
   ```python
   class BaseEngineSpec:
       ...
       def execute(
           cls,
           cursor: Any,
           query: str,
           database_id: int,
           **kwargs: Any,
       ) -> None:
           try:
               cursor.execute(query)
           except cls.oauth2_exception as ex:
               if cls.is_oauth2_enabled():
                   oauth_url = cls.get_oauth2_authorization_uri(database_id)
                   raise OAuth2RedirectError(oauth_url) from ex
   ```
   
   Here the base engine spec is calling the method 
`get_oauth2_authorization_uri` that is database specific — this returns the URL 
of the authorization server.
   
   When authenticating with OAuth2 it's common to pass a `state` parameter to 
the authorization server to, as the name suggests, maintain state. That 
parameter is returned unmodified from the authorization server, and we can use 
it to verify that the request is valid and came from Superset. In this 
implementation, our state is a JWT signed with config["SECRET_KEY"], with the 
following payload:
   
   ```json
   {"user_id" 1, "database_id" 1}
   ```
   
   The user should be redirected back from the authorization service to a new 
endpoint, `api/v1/databases/oauth2/`. That endpoint will receive the state JWT 
and a `code` query argument. The backend validates the JWT and extracts the 
user and database IDs. It then exchanges the code for the access and refresh 
tokens. Since this is database specific we call a method on the DB engine spec; 
for Google Sheets:
   
   ```python
   class GSheetsEngineSpec(SqliteEngineSpec):
       ...
       @staticmethod
       def get_oauth2_token(code: str) -> OAuth2TokenResponse:
           redirect_uri = current_app.config.get(
               "GSHEETS_OAUTH2_REDIRECT_URI",
               url_for("DatabaseRestApi.oauth2", _external=True),
           )
   
           response = http.request(  # type: ignore
               "POST",
               "https://oauth2.googleapis.com/token";,
               fields={
                   "code": code,
                   "client_id": current_app.config["GSHEETS_OAUTH2_CLIENT_ID"],
                   "client_secret": 
current_app.config["GSHEETS_OAUTH2_CLIENT_SECRET"],
                   "redirect_uri": redirect_uri,
                   "grant_type": "authorization_code",
               },
           )
           return json.loads(response.data.decode("utf-8"))
   ```
   
   Note here that we have an optional `GSHEETS_OAUTH2_REDIRECT_URI` 
configuration key. This should not be used in most situations (but is needed at 
Preset, where we manage multiple Superset instances).
   
   After exchanging the code for the access and refresh tokens they get saved 
to the table `database_user_oauth2_tokens`.
   
   ### New dependencies
   
   No new dependencies, though the [reference 
implementation](https://github.com/apache/superset/pull/20280/files) uses a few 
dependencies-of-dependencies not listed explicitly in Superset's `setup.py` 
(`urllib3` and `yarl`).
   
   ### Migration Plan and Compatibility
   
   A new migration is needed to create the `database_user_oauth2_tokens` table.
   
   ### Rejected Alternatives
   
   There are not many alternative ways to solve the problem of authenticating 
to a database using OAuth2. The table is needed, since we store personal tokens 
at the `(database, user)` grain. We need a new endpoint, to receive the `code` 
from the authorization server. Passing the `access_token` to 
`modify_url_for_impersonation` seems like the cleanest way of supporting OAuth2 
without having to refactor all the engine specs.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to