[ https://issues.apache.org/jira/browse/FLINK-39245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Huang updated FLINK-39245:
-------------------------------
    Priority: Not a Priority  (was: Minor)

> Support AWS Glue Catalog for Iceberg pipeline connector
> -------------------------------------------------------
>
>                 Key: FLINK-39245
>                 URL: https://issues.apache.org/jira/browse/FLINK-39245
>             Project: Flink
>          Issue Type: New Feature
>          Components: Flink CDC
>         Environment: The Iceberg AWS bundle (iceberg-aws-bundle or 
> equivalent) and the AWS SDK must be available on the runtime classpath. On 
> Amazon EMR, the bundled iceberg-flink-runtime already includes Glue Catalog 
> support. For non-EMR environments, users need to add 
> iceberg-aws-bundle-<version>.jar to the Flink lib/ directory.
>            Reporter: Xiao Huang
>            Priority: Not a Priority
>
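The Environment requirement boils down to whether the Glue-related Iceberg classes can be loaded at runtime. As a minimal diagnostic sketch in plain Java (no Iceberg dependency needed to compile it), a job or support tool could probe for those classes by name before trying to build the catalog. The class `GlueClasspathProbe` and its methods are hypothetical and not part of Flink CDC or Iceberg; only the probed class names (`org.apache.iceberg.aws.glue.GlueCatalog`, `org.apache.iceberg.aws.s3.S3FileIO`) come from Iceberg.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical diagnostic helper; not part of Flink CDC or Iceberg.
final class GlueClasspathProbe {

    // Iceberg classes that a type=glue catalog using S3FileIO needs at runtime.
    static final List<String> REQUIRED_CLASSES = List.of(
            "org.apache.iceberg.aws.glue.GlueCatalog",
            "org.apache.iceberg.aws.s3.S3FileIO");

    private GlueClasspathProbe() {}

    // True if the class can be located; initialize=false avoids running static
    // initializers. LinkageError covers classes whose own dependencies
    // (for example the AWS SDK) are missing.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, GlueClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Required classes that could not be loaded, e.g. because
    // iceberg-aws-bundle-<version>.jar is not in Flink's lib/ directory.
    static List<String> missingClasses() {
        return REQUIRED_CLASSES.stream()
                .filter(name -> !isOnClasspath(name))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> missing = missingClasses();
        if (missing.isEmpty()) {
            System.out.println("Glue catalog classes found on the classpath.");
        } else {
            System.out.println("Missing classes: " + missing);
        }
    }
}
```

Running such a check from the same classloader as the job gives a clearer error than a failure deep inside catalog construction, though it only confirms that the classes are present, not that AWS credentials or permissions are correct.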
> Motivation
>  
> Currently, the Iceberg pipeline connector only supports *hadoop* and *hive* 
> catalog types. AWS Glue Data Catalog is widely used as the metastore for 
> Iceberg tables on AWS, especially in Amazon EMR, EKS, and self-managed Flink 
> deployments on EC2. Users who want to use Flink CDC to sync data into Iceberg 
> tables managed by Glue Catalog are unable to do so with the current 
> implementation.
>  
> Since Iceberg's _CatalogUtil.buildIcebergCatalog()_ already natively supports 
> _type=glue_ (mapping to {_}org.apache.iceberg.aws.glue.GlueCatalog{_}), the 
> Flink CDC Iceberg connector just needs to:
> 1. Add _iceberg-aws_ as a compile-time dependency
> 2. Expose the Glue-related configuration options through the pipeline config 
> layer
> 3. Ensure the catalog properties are correctly passed through
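To illustrate why no new catalog code is needed, here is a simplified stand-in for the catalog-class selection that `CatalogUtil.buildIcebergCatalog()` performs. It is plain Java written for this issue, not Iceberg's source: `CatalogImplResolver` is a hypothetical name, and only the hadoop, hive, and glue types discussed here are covered (Iceberg also recognizes other types such as rest and jdbc).

```java
import java.util.Locale;
import java.util.Map;

// Simplified illustration of Iceberg's catalog-class selection; not Iceberg's
// actual source, and limited to the catalog types discussed in this issue.
final class CatalogImplResolver {

    static final String TYPE = "type";
    static final String CATALOG_IMPL = "catalog-impl";

    private CatalogImplResolver() {}

    static String resolve(String catalogName, Map<String, String> props) {
        String catalogImpl = props.get(CATALOG_IMPL);
        String type = props.get(TYPE);

        if (catalogImpl != null) {
            // Iceberg rejects configurations that set both keys.
            if (type != null) {
                throw new IllegalArgumentException("Catalog " + catalogName
                        + ": type and catalog-impl are mutually exclusive");
            }
            return catalogImpl;
        }

        // With neither key set, Iceberg falls back to the hive type.
        String effectiveType = type == null ? "hive" : type.toLowerCase(Locale.ENGLISH);
        switch (effectiveType) {
            case "hadoop":
                return "org.apache.iceberg.hadoop.HadoopCatalog";
            case "hive":
                return "org.apache.iceberg.hive.HiveCatalog";
            case "glue":
                return "org.apache.iceberg.aws.glue.GlueCatalog";
            default:
                throw new UnsupportedOperationException("Unknown catalog type: " + effectiveType);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve("glue_catalog", Map.of("type", "glue")));
    }
}
```

One consequence for this issue: because Iceberg refuses a configuration that sets both keys, a pipeline would set either `catalog.properties.type: glue` or `catalog.properties.catalog-impl: org.apache.iceberg.aws.glue.GlueCatalog`, not both.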
>  
> Proposed Changes
>  
> 1. Add the _iceberg-aws_ dependency (provided scope) to 
> flink-cdc-pipeline-connector-iceberg.
> 2. Add or extend configuration options in `IcebergDataSinkOptions`:
>    - `catalog.properties.type`: extend the description to include `glue`
>    - `catalog.properties.catalog-impl`: custom catalog implementation class
>    - `catalog.properties.io-impl`: custom FileIO implementation (e.g. `S3FileIO`)
>    - `catalog.properties.glue.id`: Glue Catalog ID for cross-account access
>    - `catalog.properties.glue.skip-archive`: skip archiving older table versions
>    - `catalog.properties.glue.skip-name-validation`: skip Glue name validation
>    - `catalog.properties.client.region`: AWS region for the Glue client
> 3. Register the new options in `IcebergDataSinkFactory`.
> 4. Add unit tests for Glue catalog DataSink creation.
>  
> Usage Example
>  
> {code:yaml}
> sink:
>   type: iceberg
>   catalog.properties.type: glue
>   catalog.properties.warehouse: s3://my-bucket/warehouse/
>   catalog.properties.io-impl: org.apache.iceberg.aws.s3.S3FileIO
>   catalog.properties.client.region: us-east-1
> {code}
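The sink block in the usage example is a flat map of option keys, and the Iceberg catalog only cares about the `catalog.properties.`-prefixed ones. A minimal sketch of the pass-through in step 3 of the motivation, assuming the connector forwards every such key to Iceberg with the prefix removed; `CatalogPropertiesExtractor` is a hypothetical name, not the connector's actual class. The resulting map is what would then be given as the `options` argument of Iceberg's `CatalogUtil.buildIcebergCatalog(name, options, conf)`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: turns flat sink options into Iceberg catalog properties
// by keeping only keys under "catalog.properties." and stripping that prefix.
final class CatalogPropertiesExtractor {

    static final String PREFIX = "catalog.properties.";

    private CatalogPropertiesExtractor() {}

    static Map<String, String> extract(Map<String, String> sinkOptions) {
        Map<String, String> catalogProps = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : sinkOptions.entrySet()) {
            String key = e.getKey();
            // Keys like "type: iceberg" select the sink itself and are not forwarded.
            if (key.startsWith(PREFIX) && key.length() > PREFIX.length()) {
                catalogProps.put(key.substring(PREFIX.length()), e.getValue());
            }
        }
        return catalogProps;
    }

    public static void main(String[] args) {
        // The sink options from the usage example, as flat key/value pairs.
        Map<String, String> sink = new LinkedHashMap<>();
        sink.put("type", "iceberg");
        sink.put("catalog.properties.type", "glue");
        sink.put("catalog.properties.warehouse", "s3://my-bucket/warehouse/");
        sink.put("catalog.properties.io-impl", "org.apache.iceberg.aws.s3.S3FileIO");
        sink.put("catalog.properties.client.region", "us-east-1");

        // Prints {type=glue, warehouse=s3://my-bucket/warehouse/, io-impl=org.apache.iceberg.aws.s3.S3FileIO, client.region=us-east-1}
        System.out.println(extract(sink));
    }
}
```

Forwarding by prefix, rather than copying a fixed list of keys, means Iceberg AWS properties that are not explicitly declared as options would still reach the catalog; declaring the Glue options in `IcebergDataSinkOptions` mainly serves documentation and validation.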



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
