herbherbherb opened a new issue, #15315:
URL: https://github.com/apache/iceberg/issues/15315
### Feature Request / Improvement
### Query Engine
Flink
### Feature Request / Improvement
The Flink Sink implementation (`IcebergSink` and related classes) currently
uses package-private visibility for most internal classes, constructors, and
accessors. This makes it impractical for downstream connector implementations
to compose custom sink pipelines that build on Iceberg's existing
infrastructure.
### Use Case / Motivation
Downstream Flink integrations often need to:
1. **Add custom metadata to committables** — e.g. watermark information,
lineage data, or
application-specific tracking that flows from writers through to
committers. Currently there is no
extension point on `IcebergCommittable` for this.
2. **Compose custom sink topologies** — e.g. wrapping `IcebergSinkWriter`
or `IcebergCommitter`
with custom metrics, logging, or error handling. Even composition
(delegation/wrapping) requires
being able to *reference* these types, which is impossible when they are
package-private.
3. **Extend `IcebergSink`** to override `createWriter()` or
`createCommitter()` with custom
implementations while reusing the base builder logic and data
distribution.
Without these extension points, downstream projects must copy large
portions of the sink code,
creating maintenance burden and version skew.
### Proposed Changes
**Part 1: CommittableMetadata framework** (new composition-based extension
point)
- `CommittableMetadata` — marker interface for custom metadata on
committables
- `CommittableMetadataSerializer` — serializer interface for metadata
- `CommittableMetadataRegistry` — global registry allowing downstream to
register a serializer
- `IcebergCommittable` — adds optional `@Nullable metadata` field
(backward-compatible: existing
constructors chain with `metadata = null`)
- `IcebergCommittableSerializer` — writes boolean flag + delegates to
registered serializer
This is a pure composition pattern — downstream registers a serializer,
attaches metadata via the
existing constructors, and the pipeline carries it through transparently.
**Part 2: Access modifier changes** (enabling downstream composition)
Widen visibility of key sink pipeline classes (`IcebergCommittable`,
`IcebergCommitter`,
`IcebergSinkWriter`, `IcebergWriteAggregator`, etc.) from package-private
to public, and add
protected accessors on `IcebergSink` so that downstream implementations can
reference, wrap, or
extend these types.
All changes are additive. No existing behavior changes. No new dependencies.
### Query engine
Flink
### Willingness to contribute
- [x] I can contribute this improvement/feature independently
- [x] I would be willing to contribute this improvement/feature with
guidance from the Iceberg community
- [ ] I cannot contribute this improvement/feature at this time
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]