Hi,

The Flink Dynamic Iceberg Sink requires a few iceberg-core class
visibility changes, which I'd like to get your feedback on.

Here is the diff:
https://github.com/apache/iceberg/pull/13032/commits/1804b0ac4ff97c3c943463725e91a1e24b0f8c44

There are three visibility changes:

Change 1: Make NamedReference constructor public
Change 2: Make UnboundTransform constructor public
Change 3: Make SchemaUpdate class and constructor public

Change (1) and (2) both fall under the same feature, which is
rebinding PartitionSpec transforms.

Background: The Dynamic Iceberg Sink can dynamically evolve Schema and
PartitionSpec of a table. The user provides schema / spec in the input
record. However, to ensure the changes are safe, we don't directly
take the user schema / spec, but we rebind it to one of the existing
schemas / specs. We do that by name-matching the fields in the
user-provided schema with the fields in the table schemas. This allows
us to directly use or evolve an existing schema and to preserve the
internal field ids. Name-based schema evolution is also what the
Iceberg Schema API offers.

Similarly, user-provided PartitionSpecs need to be rewritten for their
transforms to get bound to the correct source id, which is one of the
field ids of the table schema. The UpdatePartitionSpec API works on
Terms which is what we generate with this code and the class
visibility changes:
https://github.com/apache/iceberg/blob/61fedb11399ee344f9621362521e8a5071e91c05/flink/v2.0/flink/src/main/java/org/apache/iceberg/flink/sink/dynamic/PartitionSpecEvolution.java#L124
In order to rebind the provided PartitionSpec to the id of the
name-matching field in the table schema, we need the constructor of
NamedReference and UnboundTransform to be public.

Change (3) is used to test the schema update code. We want to test the
schema evolution code, which uses SchemaUpdate API to make the
changes. Our tests validate that the applied changes yield the
expected new schema. Example test code:
https://github.com/apache/iceberg/blob/61fedb11399ee344f9621362521e8a5071e91c05/flink/v2.0/flink/src/test/java/org/apache/iceberg/flink/sink/dynamic/TestEvolveSchemaVisitor.java#L111

I hope that makes sense. I'm curious to hear what you think. Any concerns?

Cheers,
Max

Reply via email to