Copilot commented on code in PR #9788:
URL: https://github.com/apache/gravitino/pull/9788#discussion_r2731533392


##########
docs/security/how-to-authenticate.md:
##########
@@ -113,6 +113,78 @@ GravitinoClient client = GravitinoClient.builder(uri)
     .build();
 ```
 
+### Principal mapping
+
+Gravitino supports principal mapping to transform authenticated principals 
(from OAuth or Kerberos) into user identities for authorization. By default, 
Gravitino uses regex-based mapping.
+
+#### OAuth principal mapping
+
+For OAuth authentication, principals are extracted from JWT claims (configured 
via `gravitino.authenticator.oauth.principalFields`). You can customize how 
these principals are mapped:
+
+```text
+# Use default regex mapper that extracts everything (passes through unchanged)
+gravitino.authenticator.oauth.principalMapperType = regex
+gravitino.authenticator.oauth.principalMapperPattern = ^(.*)$
+
+# Extract username from email (e.g., [email protected] -> user)
+gravitino.authenticator.oauth.principalMapperType = regex
+gravitino.authenticator.oauth.principalMapperPattern = ([^@]+)@.*

Review Comment:
   The configuration property name should be 
`gravitino.authenticator.oauth.principalMapper.regex.pattern` (with `.regex.` 
segment) to match the actual property defined in `OAuthConfig.java` line 160. 
The same issue appears in line 131 and Kerberos examples.
   ```suggestion
   gravitino.authenticator.oauth.principalMapper.regex.pattern = ^(.*)$
   
   # Extract username from email (e.g., [email protected] -> user)
   gravitino.authenticator.oauth.principalMapperType = regex
   gravitino.authenticator.oauth.principalMapper.regex.pattern = ([^@]+)@.*
   ```



##########
server-common/src/main/java/org/apache/gravitino/server/authentication/KerberosConfig.java:
##########
@@ -42,4 +42,26 @@ public interface KerberosConfig {
           .stringConf()
           .checkValue(StringUtils::isNotBlank, 
ConfigConstants.NOT_BLANK_ERROR_MSG)
           .create();
+
+  ConfigEntry<String> PRINCIPAL_MAPPER_TYPE =
+      new ConfigBuilder(KERBEROS_CONFIG_PREFIX + "principalMapperType")
+          .doc(
+              "Type of principal mapper to use for Kerberos authentication. "
+                  + "Built-in value: 'regex' (uses regex pattern to extract 
username). "
+                  + "Can also be a fully qualified class name implementing 
PrincipalMapper for custom logic. "
+                  + "Custom mappers can use KerberosPrincipal.parse() for 
structured Kerberos principal parsing.")
+          .version(ConfigConstants.VERSION_1_2_0)
+          .stringConf()
+          .createWithDefault("regex");
+
+  ConfigEntry<String> PRINCIPAL_MAPPER_REGEX_PATTERN =
+      new ConfigBuilder(KERBEROS_CONFIG_PREFIX + 
"principalMapper.regex.pattern")
+          .doc(
+              "Regex pattern to extract the username from the Kerberos 
principal. "
+                  + "Only used when principalMapperType is 'regex'. "
+                  + "The pattern should contain at least one capturing group. "
+                  + "Default pattern '([^@]+).*' extracts everything before 
'@' (including instance if present).")

Review Comment:
   The documentation describes the default pattern as extracting 'everything 
before @', but the actual pattern `([^@]+).*` captures only the first 
occurrence of characters before '@'. Consider clarifying that this includes the 
primary and instance components (e.g., 'HTTP/host') when present, but stops at 
the realm delimiter.
   ```suggestion
                     + "Default pattern '([^@]+).*' captures the primary and 
optional instance components of the Kerberos principal up to, but not 
including, the realm delimiter '@' "
                     + "(for example, 'HTTP/host' from 
'HTTP/[email protected]').")
   ```



##########
server-common/src/main/java/org/apache/gravitino/server/authentication/JwksTokenValidator.java:
##########
@@ -132,7 +139,9 @@ public Principal validateToken(String token, String 
serviceAudience) {
         throw new UnauthorizedException("No valid principal found in token");
       }
 
-      return new UserPrincipal(principal);
+      // Use principal mapper to extract username
+      Principal mappedPrincipal = principalMapper.map(principal);
+      return mappedPrincipal;

Review Comment:
   The intermediate variable `mappedPrincipal` is unnecessary. Consider 
returning the result directly: `return principalMapper.map(principal);`
   ```suggestion
         return principalMapper.map(principal);
   ```



##########
docs/security/how-to-authenticate.md:
##########
@@ -134,8 +206,12 @@ Gravitino server and Gravitino Iceberg REST server share 
the same configuration
 | `gravitino.authenticator.oauth.jwksUri`             | JWKS URI for 
server-side OAuth token validation. Required when using JWKS-based validation.  
                                                                                
                                                                                
           | (none)                                                             
 | Yes if `tokenValidatorClass` is 
`org.apache.gravitino.server.authentication.JwksTokenValidator` | 1.0.0         
   |
 | `gravitino.authenticator.oauth.principalFields`     | JWT claim field(s) to 
use as principal identity. Comma-separated list for fallback in order (e.g., 
'preferred_username,email,sub').                                                
                                                                                
     | `sub`                                                               | No 
                                                                                
             | 1.0.0            |
 | `gravitino.authenticator.oauth.tokenValidatorClass` | Fully qualified class 
name of the OAuth token validator implementation. Use 
`org.apache.gravitino.server.authentication.JwksTokenValidator` for JWKS-based 
validation or 
`org.apache.gravitino.server.authentication.StaticSignKeyValidator` for static 
key validation. | 
`org.apache.gravitino.server.authentication.StaticSignKeyValidator` | No        
                                                                                
      | 1.0.0            |
+| `gravitino.authenticator.oauth.principalMapperType` | Principal mapper type 
for OAuth. Use 'regex' for regex-based mapping, or provide a fully qualified 
class name implementing `org.apache.gravitino.auth.PrincipalMapper`.            
                                                                                
     | `regex`                                                             | No 
                                                                                
             | 1.2.0            |
+| `gravitino.authenticator.oauth.principalMapperPattern` | Regex pattern for 
OAuth principal mapping. First capture group becomes the mapped principal. Only 
used when principalMapperType is 'regex'.                                       
                                                                                
    | `^(.*)$`                                                            | No  
                                                                                
            | 1.2.0            |

Review Comment:
   The configuration property name should be 
`gravitino.authenticator.oauth.principalMapper.regex.pattern` (with `.regex.` 
segment) to match the actual property defined in `OAuthConfig.java` line 160.
   ```suggestion
   | `gravitino.authenticator.oauth.principalMapper.regex.pattern` | Regex 
pattern for OAuth principal mapping. First capture group becomes the mapped 
principal. Only used when principalMapperType is 'regex'.                       
                                                                                
                    | `^(.*)$`                                                  
          | No                                                                  
                            | 1.2.0            |
   ```



##########
server-common/src/main/java/org/apache/gravitino/server/authentication/KerberosAuthenticator.java:
##########
@@ -170,23 +177,14 @@ private Principal retrievePrincipalFromToken(String 
serverPrincipal, byte[] clie
         throw new UnauthorizedException("GssContext isn't established", 
challenge);
       }
 
-      // Usually principal names are in the form 'user/instance@REALM' or 
'user@REALM'.
-      List<String> principalComponents =
-          Splitter.on('@').splitToList(gssContext.getSrcName().toString());
-      if (principalComponents.size() != 2) {
-        throw new UnauthorizedException("Principal has wrong format", 
AuthConstants.NEGOTIATE);
-      }
+      // Extract principal from GSS context and map it using configured mapper
+      String principalString = gssContext.getSrcName().toString();
 
-      String user = principalComponents.get(0);
-      // TODO: We will have KerberosUserPrincipal in the future.
-      //  We can put more information of Kerberos to the KerberosUserPrincipal
-      // For example, we can put the token into the KerberosUserPrincipal,
-      // We can return the token to the client in the AuthenticationFilter. It 
will be convenient
-      // for client to establish the security context. Hadoop uses the cookie 
to store the token.
-      // For now, we don't store it in the cookie. I can have a simple 
implementation. first.
-      // It's also not required for the protocol.
-      // https://datatracker.ietf.org/doc/html/rfc4559
-      return new UserPrincipal(user);
+      // Use principal mapper to extract the principal
+      // This allows for flexible mapping strategies (regex, kerberos-specific 
parsing, etc.)
+      // The mapper will handle validation and extraction based on its 
configured type
+      Principal mappedPrincipal = principalMapper.map(principalString);
+      return mappedPrincipal;

Review Comment:
   The intermediate variable `mappedPrincipal` is unnecessary. Consider 
returning the result directly: `return principalMapper.map(principalString);`
   ```suggestion
         return principalMapper.map(principalString);
   ```



##########
docs/security/how-to-authenticate.md:
##########
@@ -134,8 +206,12 @@ Gravitino server and Gravitino Iceberg REST server share 
the same configuration
 | `gravitino.authenticator.oauth.jwksUri`             | JWKS URI for 
server-side OAuth token validation. Required when using JWKS-based validation.  
                                                                                
                                                                                
           | (none)                                                             
 | Yes if `tokenValidatorClass` is 
`org.apache.gravitino.server.authentication.JwksTokenValidator` | 1.0.0         
   |
 | `gravitino.authenticator.oauth.principalFields`     | JWT claim field(s) to 
use as principal identity. Comma-separated list for fallback in order (e.g., 
'preferred_username,email,sub').                                                
                                                                                
     | `sub`                                                               | No 
                                                                                
             | 1.0.0            |
 | `gravitino.authenticator.oauth.tokenValidatorClass` | Fully qualified class 
name of the OAuth token validator implementation. Use 
`org.apache.gravitino.server.authentication.JwksTokenValidator` for JWKS-based 
validation or 
`org.apache.gravitino.server.authentication.StaticSignKeyValidator` for static 
key validation. | 
`org.apache.gravitino.server.authentication.StaticSignKeyValidator` | No        
                                                                                
      | 1.0.0            |
+| `gravitino.authenticator.oauth.principalMapperType` | Principal mapper type 
for OAuth. Use 'regex' for regex-based mapping, or provide a fully qualified 
class name implementing `org.apache.gravitino.auth.PrincipalMapper`.            
                                                                                
     | `regex`                                                             | No 
                                                                                
             | 1.2.0            |
+| `gravitino.authenticator.oauth.principalMapperPattern` | Regex pattern for 
OAuth principal mapping. First capture group becomes the mapped principal. Only 
used when principalMapperType is 'regex'.                                       
                                                                                
    | `^(.*)$`                                                            | No  
                                                                                
            | 1.2.0            |
 | `gravitino.authenticator.kerberos.principal`        | Indicates the Kerberos 
principal to be used for HTTP endpoint. Principal should start with `HTTP/`.    
                                                                                
                                                                                
 | (none)                                                              | Yes if 
use `kerberos` as the authenticator                                             
         | 0.4.0            |
 | `gravitino.authenticator.kerberos.keytab`           | Location of the keytab 
file with the credentials for the principal.                                    
                                                                                
                                                                                
 | (none)                                                              | Yes if 
use `kerberos` as the authenticator                                             
         | 0.4.0            |
+| `gravitino.authenticator.kerberos.principalMapperType` | Principal mapper 
type for Kerberos. Use 'regex' for regex-based mapping, or provide a fully 
qualified class name implementing `org.apache.gravitino.auth.PrincipalMapper`.  
                                                                                
          | `regex`                                                             
| No                                                                            
                  | 1.2.0            |
+| `gravitino.authenticator.kerberos.principalMapperPattern` | Regex pattern 
for Kerberos principal mapping. First capture group becomes the mapped 
principal. Only used when principalMapperType is 'regex'.                       
                                                                                
                | `([^@]+).*`                                                   
      | No                                                                      
                        | 1.2.0            |

Review Comment:
   The configuration property name should be 
`gravitino.authenticator.kerberos.principalMapper.regex.pattern` (with 
`.regex.` segment) to match the actual property defined in 
`KerberosConfig.java` line 58.
   ```suggestion
   | `gravitino.authenticator.kerberos.principalMapper.regex.pattern` | Regex 
pattern for Kerberos principal mapping. First capture group becomes the mapped 
principal. Only used when principalMapperType is 'regex'.                       
                                                                                
                | `([^@]+).*`                                                   
      | No                                                                      
                        | 1.2.0            |
   ```



##########
server-common/src/main/java/org/apache/gravitino/server/authentication/StaticSignKeyValidator.java:
##########
@@ -97,7 +104,10 @@ public Principal validateToken(String token, String 
serviceAudience) {
         throw new UnauthorizedException(
             "Audiences in token is not in expected format: %s", 
audienceObject);
       }
-      return new UserPrincipal(jwt.getBody().getSubject());
+
+      // Use principal mapper to extract username
+      Principal mappedPrincipal = 
principalMapper.map(jwt.getBody().getSubject());
+      return mappedPrincipal;
     } catch (ExpiredJwtException
         | UnsupportedJwtException

Review Comment:
   The intermediate variable `mappedPrincipal` is unnecessary. Consider 
returning the result directly: `return 
principalMapper.map(jwt.getBody().getSubject());`
   ```suggestion
         return principalMapper.map(jwt.getBody().getSubject());
       } catch (ExpiredJwtException
           | UnsupportedJwtException
           | UnsupportedJwtException
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to