davidzollo commented on issue #10309:
URL: https://github.com/apache/seatunnel/issues/10309#issuecomment-3732507174
Hi @krutoileshii,
Thanks for the proposal! This is a great feature and aligns well with our
enterprise security needs.
We already have an SPI interface designed for configuration
encryption/decryption called `ConfigShade`. I believe this is exactly the
extension point you are looking for, and it would be the best place to
implement the Key Vault integration without modifying the core config loading
logic.
If we don't let the client perform this variable substitution, the change would be
**quite significant** (architectural level).
**Current Architecture:**
In the CLI submission mode (`seatunnel.sh`), the Client acts as the
**Driver**. It needs to:
1. Parse the config.
2. **Connect to the datasource to infer Schema/Catalog.** (This is the
blocking point).
3. Build the Logical DAG.
4. Send the DAG to the Master.
If we prevent the Client from substituting variables (decrypting secrets),
the Client will fail to connect to the database to fetch metadata (Schema), and
job submission will fail immediately on the client side.
To achieve "Client-side encryption, Server-side decryption", we would
effectively need to move the entire DAG building process (including Schema
inference) from the Client to the Master node (similar to "Application Mode" in
other compute engines). This is a much larger scope than just adding KeyVault
support.
For now, I recommend sticking to the standard SPI approach where both Client
and Server have access to the decryption provider. If the user submits jobs via
**REST API**, the parsing happens on the Server, so the Client doesn't need
credentials in that specific case.
Here is a brief guide on how to get started:
### 1. Implement the Interface
You need to create a new class that implements
`org.apache.seatunnel.api.configuration.ConfigShade`.
* **`getIdentifier()`**: Return a unique identifier for your provider,
e.g., `"azure-kv"`.
* **`decrypt(String content)`**: This is where the core logic goes. The
`content` parameter will receive the value from your config file (e.g.,
`${keyvault:azure:my-key-vault/db-password}`).
* Your implementation should check if the content matches your
expected pattern.
* If it matches, connect to Azure Key Vault (using the Azure SDK),
fetch the secret, and return the actual plaintext value.
* If it doesn't match, you can return the content as is.
* **`encrypt(String content)`**: (Optional) You can implement this if you
want to support converting a plaintext value into a Key Vault reference (e.g.,
uploading it to the Vault), or just return the original `content` if this flow
isn't needed.
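
The decrypt logic described above can be sketched roughly as follows. This is only an illustrative outline, not the actual plugin: the class and reference format are hypothetical, the secret lookup is injected as a function so the Azure SDK call (which would typically wrap `SecretClient.getSecret(...)`) can be swapped out, and in the real plugin the class would declare `implements org.apache.seatunnel.api.configuration.ConfigShade`.

```java
import java.util.function.BiFunction;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of an Azure Key Vault ConfigShade. In the real plugin this would be
// a public class that `implements org.apache.seatunnel.api.configuration.ConfigShade`.
class AzureKeyVaultConfigShade {

    // Matches references like ${keyvault:azure:my-key-vault/db-password}
    private static final Pattern REF =
            Pattern.compile("\\$\\{keyvault:azure:([^/}]+)/([^}]+)}");

    // Secret resolver: (vaultName, secretName) -> plaintext. Injected so the
    // Azure SDK client can be replaced by a fake in tests; in production this
    // would call Key Vault via the Azure SDK.
    private final BiFunction<String, String, String> resolver;

    AzureKeyVaultConfigShade(BiFunction<String, String, String> resolver) {
        this.resolver = resolver;
    }

    public String getIdentifier() {
        return "azure-kv";
    }

    public String decrypt(String content) {
        Matcher m = REF.matcher(content);
        if (!m.matches()) {
            return content; // not a Key Vault reference: pass through unchanged
        }
        return resolver.apply(m.group(1), m.group(2));
    }

    public String encrypt(String content) {
        return content; // plaintext-to-reference conversion not needed here
    }
}
```

The key design point is the pass-through branch in `decrypt`: values that don't match the reference pattern are returned untouched, so plain configs keep working.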
### 2. Configuration & Usage
Users can select your implementation in their job configuration (e.g.,
`v2.batch.config.template` or their job file) under the `env` block.
```hocon
env {
  # This tells SeaTunnel to use your SPI implementation
  shade.identifier = "azure-kv"
  # You can pass additional properties to your init method via shade.props
  shade.props {
    vault.url = "https://my-vault.vault.azure.net/"
    # ...
  }
}

sink {
  Jdbc {
    password = "${keyvault:azure:my-key-vault/db-password}"
    # ...
  }
}
```
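
One step worth remembering: SeaTunnel discovers `ConfigShade` implementations via Java's standard `ServiceLoader` mechanism (the usual pattern for SPIs), so the plugin JAR needs a provider-configuration file. A sketch, where the class name is a placeholder for your actual implementation:

```shell
# Register the implementation for java.util.ServiceLoader discovery.
# "com.example.AzureKeyVaultConfigShade" is a hypothetical class name.
mkdir -p src/main/resources/META-INF/services
echo "com.example.AzureKeyVaultConfigShade" \
  > src/main/resources/META-INF/services/org.apache.seatunnel.api.configuration.ConfigShade
```

Without this file, the loader will not find your provider and `shade.identifier = "azure-kv"` will fail to resolve.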
### 3. Code References
* **Interface**: `org.apache.seatunnel.api.configuration.ConfigShade` (in
seatunnel-api module).
* **Invocation Logic**: The logic that loads and calls this SPI is in
`org.apache.seatunnel.core.starter.utils.ConfigShadeUtils`.
* **Example**: You can check the inner class `Base64ConfigShade` in
`ConfigShadeUtils.java` for a simple reference implementation.
By using this SPI, `ConfigShadeUtils` will automatically identify sensitive
fields (like `password`, `username`) and call your `decrypt` method for them
during the job startup, so you don't need to manually intercept configurations.
Let me know if you have any further questions.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]