davidzollo opened a new issue, #10341:
URL: https://github.com/apache/seatunnel/issues/10341

   
   ## Description
   
   SeaTunnel currently provides File sink connectors for multiple file systems and object stores (e.g., `S3File`, `OssFile`, `ObsFile`, `CosFile`, `HdfsFile`), but there is no dedicated sink connector for **Azure Blob Storage**.
   Many users run data integration workloads on Azure and need to land files 
(CSV/Parquet/ORC/JSON/Binary, etc.) into Azure Blob Storage (and optionally 
ADLS Gen2) with the same usability and guarantees as existing File sinks.
   
   I’d like to request a new sink connector to write SeaTunnel output to Azure 
Blob Storage.
   
   ## Usage Scenario
   
   - Persist batch/stream outputs to Azure Blob Storage for downstream 
analytics (Synapse, Databricks, Spark, etc.).
   - Store partitioned datasets (e.g., `dt=YYYY-MM-DD/`) in Parquet/ORC/CSV.
   - Need exactly-once semantics similar to existing File sinks (2PC + temp 
directory + commit/rename).
   
   ## Proposed Scope
   
   - Add a new SeaTunnel v2 sink connector: `AzureBlobFile` (under the 
`connector-file` family).
   - Support Azure endpoints/schemes (illustrative path examples follow this list):
     - Azure Blob (WASB): `wasb://` / `wasbs://`
     - (Optional) ADLS Gen2 (ABFS): `abfs://` / `abfss://`
   - Support common File sink capabilities consistent with existing connectors:
     - File formats: `text`, `csv`, `parquet`, `orc`, `json`, `excel`, `xml`, 
`binary` (following what other File sinks support)
     - Partitioned writes
     - Exactly-once via existing 2PC/commit behavior
   - Authentication (at least):
     - Account key
     - SAS token
     - (Nice-to-have) AAD OAuth / Managed Identity (depending on feasibility 
and community preference)
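
   To make the scheme support concrete, the path formats below follow the standard Hadoop Azure URI conventions (`wasb(s)` against the Blob endpoint, `abfs(s)` against the DFS endpoint); the placeholders are illustrative only:

   ```hocon
   # Azure Blob Storage via the WASB driver (TLS variant shown)
   path = "wasbs://<container>@<account>.blob.core.windows.net/<dir>"

   # Optional ADLS Gen2 via the ABFS driver (TLS variant shown)
   # path = "abfss://<filesystem>@<account>.dfs.core.windows.net/<dir>"
   ```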
   
   ## Configuration Proposal
   
   Keep options consistent with other File sinks, plus Azure-specific configs. 
For advanced scenarios, allow passing Hadoop Azure FS properties as a map 
(similar to `S3File`’s `hadoop_s3_properties`), e.g. `hadoop_azure_properties`.
   
   Example (illustrative):
   
   ```hocon
   sink {
     AzureBlobFile {
       path = "wasbs://<container>@<account>.blob.core.windows.net/<dir>"
       tmp_path = "wasbs://<container>@<account>.blob.core.windows.net/<tmp_dir>"
   
       # common file sink options
       file_format_type = "parquet"
       have_partition = true
       partition_by = ["dt"]
   
       # Azure auth (one of)
       account_name = "<account>"
       account_key = "<account_key>"
       # or sas_token = "<sas_token>"
   
       # pass-through for Hadoop Azure FS properties (optional)
       hadoop_azure_properties = {
         # examples:
         # "fs.azure.account.key.<account>.blob.core.windows.net" = 
"<account_key>"
         # "fs.azure.sas.<container>.<account>.blob.core.windows.net" = "<sas>"
       }
     }
   }
   ```
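
   For completeness, here is a second illustrative sketch showing the SAS-token variant and (optionally) the ABFS path style. All option names (`sas_token`, `hadoop_azure_properties`, etc.) are part of this proposal rather than an existing connector API, and the Hadoop property names should be verified against the targeted `hadoop-azure` version:

   ```hocon
   sink {
     AzureBlobFile {
       # same layout as above; ADLS Gen2 would instead use
       # "abfss://<filesystem>@<account>.dfs.core.windows.net/<dir>"
       path = "wasbs://<container>@<account>.blob.core.windows.net/<dir>"

       file_format_type = "orc"
       have_partition = true
       partition_by = ["dt"]

       # SAS-token auth instead of an account key (proposed option)
       account_name = "<account>"
       sas_token = "<sas_token>"

       # equivalent pass-through via Hadoop Azure FS properties (optional)
       hadoop_azure_properties = {
         # "fs.azure.sas.<container>.<account>.blob.core.windows.net" = "<sas_token>"
       }
     }
   }
   ```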
   
   ## Dependencies / Packaging
   
   - Similar to `S3File`, clarify the Hadoop/Azure jars required on Spark/Flink clusters (e.g., `hadoop-azure` plus the Azure storage dependencies), and ensure the SeaTunnel Engine distribution either includes what’s needed or documents what to add under `${SEATUNNEL_HOME}/lib`.
   
   ## Acceptance Criteria
   
   - `AzureBlobFile` sink connector is available and documented (new doc page 
under `docs/en/connector-v2/sink/`).
   - Can write files to Azure Blob Storage using account key (baseline) with 
the same semantics as other File sinks.
   - Includes at least one integration test validating end-to-end file writes (recommended: use Azurite; a rough config sketch follows this list).
   - Works on SeaTunnel Zeta, and documents Spark/Flink requirements.
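
   As a rough sketch only: the Azurite-backed test could run a config along the following lines, assuming Azurite's well-known development account (`devstoreaccount1`). The emulator endpoint override is left as a commented placeholder because the exact Hadoop property depends on how the connector wires up the Azure filesystem:

   ```hocon
   env {
     job.mode = "BATCH"
   }

   source {
     FakeSource {
       schema = {
         fields {
           dt = string
           name = string
         }
       }
     }
   }

   sink {
     AzureBlobFile {
       path = "wasb://test-container@devstoreaccount1.blob.core.windows.net/output"
       file_format_type = "parquet"
       have_partition = true
       partition_by = ["dt"]

       account_name = "devstoreaccount1"
       account_key = "<azurite_dev_key>"

       # Point the Hadoop Azure client at the local Azurite emulator;
       # the exact endpoint-override property is to be confirmed during implementation.
       hadoop_azure_properties = {
         # e.g. an emulator/endpoint setting for devstoreaccount1
       }
     }
   }
   ```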
   
   ## Related issues
   
   - N/A (not found yet).
   
   

