Xiao Huang created FLINK-39245:
----------------------------------
Summary: Support AWS Glue Catalog for Iceberg pipeline connector
Key: FLINK-39245
URL: https://issues.apache.org/jira/browse/FLINK-39245
Project: Flink
Issue Type: New Feature
Components: Flink CDC
Environment: The Iceberg AWS bundle (iceberg-aws-bundle or equivalent)
and AWS SDK must be available in the runtime classpath. On Amazon EMR, the
bundled iceberg-flink-runtime already includes Glue Catalog support. For
non-EMR environments, users need to add iceberg-aws-bundle-<version>.jar to the
Flink lib/ directory.
Reporter: Xiao Huang
## Motivation
Currently, the Iceberg pipeline connector only supports `hadoop` and `hive`
catalog types. AWS Glue Data Catalog is widely used as the metastore for
Iceberg tables on AWS, especially in Amazon EMR, EKS, and self-managed Flink
deployments on EC2. Users who want to use Flink CDC to sync data into Iceberg
tables managed by Glue Catalog are unable to do so with the current
implementation.
Since Iceberg's `CatalogUtil.buildIcebergCatalog()` already natively supports
`type=glue` (mapping to `org.apache.iceberg.aws.glue.GlueCatalog`), the Flink
CDC Iceberg connector just needs to:
1. Add `iceberg-aws` as a `provided`-scope dependency
2. Expose the Glue-related configuration options through the pipeline config
layer
3. Ensure the catalog properties are correctly passed through
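
The pass-through in step 3 could look roughly like the following. This is a minimal sketch, not the connector's actual code: the class `CatalogPropertiesSketch` and the method `extractCatalogProperties` are hypothetical names, and it assumes catalog options are carried under a `catalog.properties.` prefix in the sink config. The resulting map is the kind of input that would be handed to `CatalogUtil.buildIcebergCatalog()`.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of Flink CDC or Iceberg. */
final class CatalogPropertiesSketch {

    /** Assumed prefix under which the pipeline config carries catalog options. */
    static final String PREFIX = "catalog.properties.";

    /** Returns the entries under PREFIX with the prefix removed; all other keys are ignored. */
    static Map<String, String> extractCatalogProperties(Map<String, String> sinkConfig) {
        Map<String, String> catalogProps = new HashMap<>();
        for (Map.Entry<String, String> entry : sinkConfig.entrySet()) {
            String key = entry.getKey();
            if (key.startsWith(PREFIX) && key.length() > PREFIX.length()) {
                catalogProps.put(key.substring(PREFIX.length()), entry.getValue());
            }
        }
        return catalogProps;
    }

    public static void main(String[] args) {
        Map<String, String> sinkConfig = new HashMap<>();
        sinkConfig.put("type", "iceberg"); // sink type, not a catalog property
        sinkConfig.put("catalog.properties.type", "glue");
        sinkConfig.put("catalog.properties.glue.id", "123456789012"); // placeholder account ID
        // Glue-specific keys such as glue.id keep their name once the prefix is stripped.
        System.out.println(extractCatalogProperties(sinkConfig));
    }
}
```

Stripping the prefix generically, rather than mapping each option by hand, means Glue keys such as `glue.id` or `client.region` reach Iceberg unchanged.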
## Proposed Changes
- Add `iceberg-aws` dependency (provided scope) to
  `flink-cdc-pipeline-connector-iceberg`
- Add new configuration options in `IcebergDataSinkOptions`:
  - `catalog.properties.type`: extend description to include `glue`
  - `catalog.properties.catalog-impl`: custom catalog implementation class
  - `catalog.properties.io-impl`: custom FileIO implementation (e.g. `S3FileIO`)
  - `catalog.properties.glue.id`: Glue Catalog ID for cross-account access
  - `catalog.properties.glue.skip-archive`: skip archiving older table versions
  - `catalog.properties.glue.skip-name-validation`: skip Glue name validation
  - `catalog.properties.client.region`: AWS region for the Glue client
- Register new options in `IcebergDataSinkFactory`
- Add unit tests for Glue catalog DataSink creation
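
Because the options above expose both `type` and `catalog-impl`, it may be worth rejecting configurations that set both before the catalog is built; to my understanding Iceberg's `CatalogUtil` treats the two as mutually exclusive. The sketch below is one possible pre-check, under stated assumptions: `CatalogTypeCheckSketch` and `validate` are hypothetical names, the supported-type set is assumed to be `hadoop`, `hive` and `glue`, and the connector could equally delegate this check to Iceberg.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical pre-check for illustration; not part of Flink CDC or Iceberg. */
final class CatalogTypeCheckSketch {

    // Assumed set of built-in catalog types once Glue support is added.
    static final Set<String> SUPPORTED_TYPES =
            new HashSet<>(Arrays.asList("hadoop", "hive", "glue"));

    /** Validates catalog properties whose "catalog.properties." prefix was already removed. */
    static void validate(Map<String, String> catalogProps) {
        String type = catalogProps.get("type");
        String impl = catalogProps.get("catalog-impl");

        // A built-in type and a custom implementation class cannot both be requested.
        if (type != null && impl != null) {
            throw new IllegalArgumentException(
                    "Set either 'type' or 'catalog-impl', not both: type="
                            + type + ", catalog-impl=" + impl);
        }
        // Without a custom implementation, an explicit type must be a known one.
        // A missing type is left for Iceberg to resolve to its default.
        if (impl == null
                && type != null
                && !SUPPORTED_TYPES.contains(type.toLowerCase(Locale.ROOT))) {
            throw new IllegalArgumentException("Unsupported catalog type: " + type);
        }
    }
}
```

A unit test for Glue DataSink creation could then cover three cases: `type: glue` alone is accepted, `type` together with `catalog-impl` is rejected, and an unknown type is rejected.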
## Usage Example
```yaml
sink:
  type: iceberg
  catalog.properties.type: glue
  catalog.properties.warehouse: s3://my-bucket/warehouse/
  catalog.properties.io-impl: org.apache.iceberg.aws.s3.S3FileIO
  catalog.properties.client.region: us-east-1
```