davidzollo commented on issue #10309:
URL: https://github.com/apache/seatunnel/issues/10309#issuecomment-3732507174

   Hi @krutoileshii,
   
   Thanks for the proposal! This is a great feature and aligns well with our 
enterprise security needs.
   
   We already have an SPI interface designed for configuration 
encryption/decryption called `ConfigShade`. I believe this is exactly the 
extension point you are looking for, and it would be the best place to 
implement the Key Vault integration without modifying the core config loading 
logic.
   
   
   If we do not let the Client perform this variable substitution, the change
would be **quite significant** (architectural level).
   
   Current Architecture:
   In the CLI submission mode (`seatunnel.sh`), the Client acts as the 
**Driver**. It needs to:
   1. Parse the config.
   2. **Connect to the datasource to infer Schema/Catalog.** (This is the 
blocking point).
   3. Build the Logical DAG.
   4. Send the DAG to the Master.
   
   If we prevent the Client from substituting variables (decrypting secrets), 
the Client will fail to connect to the database to fetch metadata (Schema), and 
the job submission will fail immediately at the client side.
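
   As a rough illustration of that failure mode, here is a toy sketch of the
four driver steps. Every name in it (`ClientDriverSketch`, `inferSchema`,
`submit`, the `resolver` hook) is hypothetical and is not SeaTunnel code; it
only assumes that schema inference needs a usable password.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Toy model of the client-side driver; not the real SeaTunnel client.
final class ClientDriverSketch {

    // Step 2 stand-in: schema inference needs a real password to "connect".
    static String inferSchema(Map<String, String> options) {
        String password = options.get("password");
        if (password == null || password.startsWith("${")) {
            throw new IllegalStateException(
                    "cannot connect: password is missing or still an unresolved reference");
        }
        return "schema(id BIGINT, name STRING)";
    }

    // Steps 1-4 in order; `resolver` is the substitution/decryption hook.
    static String submit(Map<String, String> parsedConfig, UnaryOperator<String> resolver) {
        // Step 1 (parsing) is assumed to have produced `parsedConfig`.
        Map<String, String> resolved = new HashMap<>(parsedConfig);
        resolved.replaceAll((key, value) -> resolver.apply(value));

        String schema = inferSchema(resolved); // Step 2: needs the plaintext secret
        String dag = "DAG[" + schema + "]";    // Step 3: build the logical DAG
        return dag;                            // Step 4: would be sent to the Master
    }
}
```

   With an identity `resolver` (no substitution on the Client), `submit` fails
in step 2 before any DAG exists, which is the blocking point described above.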
   
   To achieve "Client-side encryption, Server-side decryption", we would 
effectively need to move the entire DAG building process (including Schema 
inference) from the Client to the Master node (similar to "Application Mode" in 
other compute engines). This is a much larger scope than just adding KeyVault 
support.
   
   For now, I recommend sticking to the standard SPI approach where both Client 
and Server have access to the decryption provider. If the user submits jobs via 
**REST API**, the parsing happens on the Server, so the Client doesn't need 
credentials in that specific case.
   
   
   Here is a brief guide on how to get started:
   
   ### 1. Implement the Interface
   You need to create a new class that implements 
`org.apache.seatunnel.api.configuration.ConfigShade`.
   
   *   **`getIdentifier()`**: Return a unique identifier for your provider, 
e.g., `"azure-kv"`.
   *   **`decrypt(String content)`**: This is where the core logic goes. The 
`content` parameter will receive the value from your config file (e.g., 
`${keyvault:azure:my-key-vault/db-password}`).
       *   Your implementation should check if the content matches your 
expected pattern.
       *   If it matches, connect to Azure Key Vault (using the Azure SDK), 
fetch the secret, and return the actual plaintext value.
       *   If it doesn't match, you can return the content as is.
   *   **`encrypt(String content)`**: (Optional) You can implement this if you 
want to support converting a plaintext value into a Key Vault reference (e.g., 
uploading it to the Vault), or just return the original `content` if this flow 
isn't needed.
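
   A minimal sketch of such a provider follows. To keep it compilable on its
own, it declares a local stand-in `ConfigShade` interface containing only the
three methods listed above; a real provider would implement
`org.apache.seatunnel.api.configuration.ConfigShade` instead. The class name
`AzureKeyVaultShade` and the `fetcher` callback are assumptions: the callback
stands in for the Azure SDK call, which is not shown here.

```java
import java.util.function.BiFunction;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Local stand-in for the SeaTunnel SPI so this sketch is self-contained.
interface ConfigShade {
    String getIdentifier();
    String encrypt(String content);
    String decrypt(String content);
}

final class AzureKeyVaultShade implements ConfigShade {
    // Matches ${keyvault:azure:<vault-name>/<secret-name>}
    private static final Pattern REFERENCE =
            Pattern.compile("^\\$\\{keyvault:azure:([^/}]+)/([^}]+)\\}$");

    // Hypothetical lookup (vault name, secret name) -> plaintext.
    // In a real provider this is where the Azure SDK would be called.
    private final BiFunction<String, String, String> fetcher;

    AzureKeyVaultShade(BiFunction<String, String, String> fetcher) {
        this.fetcher = fetcher;
    }

    @Override
    public String getIdentifier() {
        return "azure-kv";
    }

    @Override
    public String encrypt(String content) {
        // The upload flow is not needed here, so return the content unchanged.
        return content;
    }

    @Override
    public String decrypt(String content) {
        if (content == null) {
            return null;
        }
        Matcher matcher = REFERENCE.matcher(content.trim());
        if (!matcher.matches()) {
            return content; // not a Key Vault reference: return as is
        }
        return fetcher.apply(matcher.group(1), matcher.group(2));
    }
}
```

   Injecting the lookup as a callback is only a design choice for this sketch:
it keeps the pattern matching testable without a live vault.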
   
   ### 2. Configuration & Usage
   Users can select your implementation in their job configuration (e.g.,
`v2.batch.config.template` or their own job file) under the `env` block.
   
   ```hocon
   env {
     # This tells SeaTunnel to use your SPI implementation
     shade.identifier = "azure-kv" 
     
     # You can pass additional properties to your Init method via shade.props
     shade.props {
         vault.url = "https://my-vault.vault.azure.net/"
         # ...
     }
   }
   
   sink {
     Jdbc {
       password = "${keyvault:azure:my-key-vault/db-password}" 
       # ...
     }
   }
   ```
   
   ### 3. Code References
   *   **Interface**: `org.apache.seatunnel.api.configuration.ConfigShade` (in 
seatunnel-api module).
   *   **Invocation Logic**: The logic that loads and calls this SPI is in 
`org.apache.seatunnel.core.starter.utils.ConfigShadeUtils`.
   *   **Example**: You can check the inner class `Base64ConfigShade` in 
ConfigShadeUtils.java for a simple reference implementation.
   
   By using this SPI, `ConfigShadeUtils` will automatically identify sensitive 
fields (like `password`, `username`) and call your `decrypt` method for them 
during the job startup, so you don't need to manually intercept configurations.
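
   Conceptually, that step behaves like the simplified walker below. This is
an illustration only, not the actual `ConfigShadeUtils` code: the class
`SensitiveFieldWalker`, its method names, and the two-entry key set are
assumptions, and Base64 decoding stands in for a real `decrypt`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

// Simplified illustration of "find sensitive keys, decrypt their values".
final class SensitiveFieldWalker {
    // Assumed key set for this sketch; the real list lives in SeaTunnel.
    static final Set<String> SENSITIVE_KEYS = Set.of("password", "username");

    // Returns a copy of `config` with string values under sensitive keys decrypted.
    @SuppressWarnings("unchecked")
    static Map<String, Object> decryptSensitive(
            Map<String, Object> config, UnaryOperator<String> decrypt) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : config.entrySet()) {
            Object value = entry.getValue();
            if (value instanceof Map) {
                // Recurse into nested blocks such as sink { Jdbc { ... } }.
                result.put(entry.getKey(),
                        decryptSensitive((Map<String, Object>) value, decrypt));
            } else if (value instanceof String && SENSITIVE_KEYS.contains(entry.getKey())) {
                result.put(entry.getKey(), decrypt.apply((String) value));
            } else {
                result.put(entry.getKey(), value);
            }
        }
        return result;
    }

    // Stand-in decrypt function used for the example.
    static String base64Decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }
}
```

   Calling `decryptSensitive(config, SensitiveFieldWalker::base64Decode)` leaves
non-sensitive options such as `url` untouched and rewrites only the values
under the listed keys.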
   
   
   Let me know if you have any further questions.

