Xiao Huang created FLINK-39245:
----------------------------------

             Summary: Support AWS Glue Catalog for Iceberg pipeline connector
                 Key: FLINK-39245
                 URL: https://issues.apache.org/jira/browse/FLINK-39245
             Project: Flink
          Issue Type: New Feature
          Components: Flink CDC
         Environment: The Iceberg AWS bundle (iceberg-aws-bundle or equivalent) 
and AWS SDK must be available in the runtime classpath. On Amazon EMR, the 
bundled iceberg-flink-runtime already includes Glue Catalog support. For 
non-EMR environments, users need to add iceberg-aws-bundle-<version>.jar to the 
Flink lib/ directory.
            Reporter: Xiao Huang


## Motivation
 
Currently, the Iceberg pipeline connector only supports `hadoop` and `hive` 
catalog types. AWS Glue Data Catalog is widely used as the metastore for 
Iceberg tables on AWS, especially in Amazon EMR, EKS, and self-managed Flink 
deployments on EC2. Users who want to use Flink CDC to sync data into Iceberg 
tables managed by Glue Catalog are unable to do so with the current 
implementation.
 
Since Iceberg's `CatalogUtil.buildIcebergCatalog()` already natively supports 
`type=glue` (mapping to `org.apache.iceberg.aws.glue.GlueCatalog`), the Flink 
CDC Iceberg connector just needs to:
1. Add `iceberg-aws` as a dependency (provided scope, so it is available at compile time but not bundled)
2. Expose the Glue-related configuration options through the pipeline config 
layer
3. Ensure the catalog properties are correctly passed through
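
A minimal sketch of steps 2 and 3, assuming the connector strips the `catalog.properties.` prefix before handing the remaining keys to Iceberg. The class and method names (`GlueCatalogPropsSketch`, `extractCatalogProperties`, `resolveCatalogImpl`) are hypothetical and not taken from the connector. The type lookup is a simplified stand-in for the mapping described above, not Iceberg's actual code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not the connector's real implementation.
final class GlueCatalogPropsSketch {
    static final String PREFIX = "catalog.properties.";

    // Keep only keys that start with the prefix and strip it, so that
    // "catalog.properties.client.region" becomes "client.region".
    static Map<String, String> extractCatalogProperties(Map<String, String> sinkConfig) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : sinkConfig.entrySet()) {
            if (e.getKey().startsWith(PREFIX)) {
                out.put(e.getKey().substring(PREFIX.length()), e.getValue());
            }
        }
        return out;
    }

    // Simplified stand-in for the type-to-class lookup: an explicit
    // "catalog-impl" wins, otherwise "type" selects a known catalog class.
    static String resolveCatalogImpl(Map<String, String> props) {
        String impl = props.get("catalog-impl");
        if (impl != null) {
            return impl;
        }
        String type = props.get("type");
        if (type == null) {
            throw new IllegalArgumentException("Neither catalog-impl nor type is set");
        }
        switch (type) {
            case "hadoop":
                return "org.apache.iceberg.hadoop.HadoopCatalog";
            case "hive":
                return "org.apache.iceberg.hive.HiveCatalog";
            case "glue":
                return "org.apache.iceberg.aws.glue.GlueCatalog";
            default:
                throw new IllegalArgumentException("Unknown catalog type: " + type);
        }
    }

    public static void main(String[] args) {
        Map<String, String> sinkConfig = new LinkedHashMap<>();
        sinkConfig.put("type", "iceberg"); // sink type, not a catalog property
        sinkConfig.put("catalog.properties.type", "glue");
        sinkConfig.put("catalog.properties.warehouse", "s3://my-bucket/warehouse/");
        sinkConfig.put("catalog.properties.client.region", "us-east-1");

        Map<String, String> props = extractCatalogProperties(sinkConfig);
        System.out.println(props);
        System.out.println(resolveCatalogImpl(props));
    }
}
```

Note that the sink-level `type: iceberg` key is dropped by the prefix filter, so it cannot collide with the catalog's own `type` property.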
 
## Proposed Changes
 
- Add `iceberg-aws` dependency (provided scope) to 
`flink-cdc-pipeline-connector-iceberg`
- Add new configuration options in `IcebergDataSinkOptions`:
  - `catalog.properties.type`: extend description to include `glue`
  - `catalog.properties.catalog-impl`: custom catalog implementation class
  - `catalog.properties.io-impl`: custom FileIO implementation (e.g. `S3FileIO`)
  - `catalog.properties.glue.id`: Glue Catalog ID for cross-account access
  - `catalog.properties.glue.skip-archive`: skip archiving older table versions
  - `catalog.properties.glue.skip-name-validation`: skip Glue name validation
  - `catalog.properties.client.region`: AWS region for the Glue client
- Register new options in `IcebergDataSinkFactory`
- Add unit tests for Glue catalog DataSink creation
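
Because the dependency would be declared with provided scope, the Glue catalog class has to come from the runtime classpath. The following is a rough sketch of a fail-fast check that a job or test could run before building the sink. `GlueClasspathCheck` and `isClassPresent` are hypothetical names used only for illustration; nothing here is part of the proposed change.

```java
// Hypothetical helper; only java.lang APIs are used.
final class GlueClasspathCheck {
    // Class name taken from the issue text above.
    static final String GLUE_CATALOG_CLASS = "org.apache.iceberg.aws.glue.GlueCatalog";

    // Returns true if the named class can be located without initializing it.
    static boolean isClassPresent(String className) {
        try {
            Class.forName(className, false, GlueClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (isClassPresent(GLUE_CATALOG_CLASS)) {
            System.out.println("GlueCatalog found on the classpath");
        } else {
            System.out.println("GlueCatalog missing: add the Iceberg AWS bundle to Flink lib/");
        }
    }
}
```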
 
## Usage Example
 
```yaml
sink:
  type: iceberg
  catalog.properties.type: glue
  catalog.properties.warehouse: s3://my-bucket/warehouse/
  catalog.properties.io-impl: org.apache.iceberg.aws.s3.S3FileIO
  catalog.properties.client.region: us-east-1
```



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
