Copilot commented on code in PR #4314:
URL: https://github.com/apache/flink-cdc/pull/4314#discussion_r2957513998
##########
docs/content/docs/connectors/pipeline-connectors/iceberg.md:
##########
@@ -63,20 +65,60 @@ pipeline:
parallelism: 2
```
+### AWS Glue Catalog Example
+
+```yaml
+source:
+ type: mysql
+ name: MySQL Source
+ hostname: 127.0.0.1
+ port: 3306
+ username: admin
+ password: pass
+ tables: adb.\.*, bdb.user_table_[0-9]+, [app|web].order_\.*
+ server-id: 5401-5404
+
+sink:
+ type: iceberg
+ name: Iceberg Sink
+ catalog.properties.type: glue
+ catalog.properties.warehouse: s3://my-bucket/warehouse
+ catalog.properties.io-impl: org.apache.iceberg.aws.s3.S3FileIO
+ catalog.properties.client.region: us-east-1
+ catalog.properties.glue.skip-archive: true
+
+pipeline:
+ name: MySQL to Iceberg via Glue Pipeline
+ parallelism: 2
+```
+
***Note:***
-If `catalog.properties.type` is hadoop, you need to configure the following dependencies manually, and pass it with `--jar` argument of Flink CDC CLI when submitting YAML pipeline jobs.
+Depending on the catalog type, you may need to add extra JARs manually and pass them with the `--jar` argument of Flink CDC CLI when submitting YAML pipeline jobs.
+
<div class="wy-table-responsive">
<table class="colwidths-auto docutils">
<thead>
<tr>
+ <th class="text-left">Catalog Type</th>
<th class="text-left">Dependency Item</th>
<th class="text-left">Description</th>
</tr>
</thead>
<tbody>
<tr>
- <td><a href="https://mvnrepository.com/artifact/org.apache.flink/flink-shaded-hadoop-2-uber/2.8.3-10.0"> org.apache.flink:flink-shaded-hadoop-2-uber:2.8.3-10.0</a></td>
- <td>Used for Hadoop dependencies.</td>
+ <td>hadoop</td>
+ <td><a href="https://mvnrepository.com/artifact/org.apache.flink/flink-shaded-hadoop-2-uber/2.8.3-10.0">org.apache.flink:flink-shaded-hadoop-2-uber:2.8.3-10.0</a></td>
+ <td>Provides Hadoop filesystem dependencies.</td>
+ </tr>
+ <tr>
+ <td>glue</td>
+ <td><a href="https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-aws">org.apache.iceberg:iceberg-aws</a></td>
+ <td>Provides AWS Glue Catalog and S3 FileIO implementation.</td>
Review Comment:
The dependency note and table don't mention that the connector itself no
longer bundles the Iceberg runtime classes (the module now declares
`iceberg-flink-runtime-1.20` as `provided` and removes shading). Users whose
runtime environment does not already provide Iceberg will hit
`NoClassDefFoundError` unless they also add the Iceberg runtime JAR(s) to the
job or cluster classpath. Please update the dependency table and note to
explicitly list `org.apache.iceberg:iceberg-flink-runtime-1.20`, and clarify
that it is required for all catalog types when it is not already present in
the runtime.
##########
docs/content/docs/connectors/pipeline-connectors/iceberg.md:
##########
@@ -115,21 +157,63 @@ Pipeline Connector Options
<td>required</td>
<td style="word-wrap: break-word;">(none)</td>
<td>String</td>
- <td>Metastore of Iceberg catalog, supports <code>hadoop</code> and <code>hive</code>.</td>
+ <td>Metastore type of Iceberg catalog, supports <code>hadoop</code>, <code>hive</code>, and <code>glue</code>. Alternatively, you can use <code>catalog.properties.catalog-impl</code> to specify a custom catalog class directly.</td>
Review Comment:
In the options table, `catalog.properties.type` is marked as **required**,
but the description says users can set `catalog.properties.catalog-impl`
instead. Please adjust the “Required” column and/or the wording to reflect the
actual constraint (e.g., “either `catalog.properties.type` or
`catalog.properties.catalog-impl` must be set”).
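
For illustration, the two alternatives the description implies might look
like the sketch below. This is a sketch only: it assumes the sink accepts
`catalog.properties.catalog-impl` without `catalog.properties.type`, which is
what the new description suggests but should be confirmed against the
implementation. The class name `org.apache.iceberg.aws.glue.GlueCatalog` is
taken from the Iceberg AWS module and is not part of this PR; the warehouse
path is reused from the example above.

```yaml
# Option A: select a built-in catalog by type.
sink:
  type: iceberg
  name: Iceberg Sink
  catalog.properties.type: glue
  catalog.properties.warehouse: s3://my-bucket/warehouse
---
# Option B (assumed to be accepted in place of Option A): name the catalog
# class directly and leave catalog.properties.type unset.
sink:
  type: iceberg
  name: Iceberg Sink
  catalog.properties.catalog-impl: org.apache.iceberg.aws.glue.GlueCatalog
  catalog.properties.warehouse: s3://my-bucket/warehouse
```

If only one of the two keys may be set at a time, the docs should say so as
well, since setting both would otherwise be ambiguous.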
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.