herbherbherb opened a new issue, #15315:
URL: https://github.com/apache/iceberg/issues/15315

   ### Query Engine
   Flink
   
   ### Feature Request / Improvement
   The Flink Sink implementation (`IcebergSink` and related classes) currently uses package-private visibility for most internal classes, constructors, and accessors. This makes it impractical for downstream connector implementations to compose custom sink pipelines that build on Iceberg's existing infrastructure.
   
   ### Use Case / Motivation
   Downstream Flink integrations often need to:
    1. **Add custom metadata to committables**: e.g. watermark information, lineage data, or application-specific tracking that flows from writers through to committers. Currently there is no extension point on `IcebergCommittable` for this.
   
    2. **Compose custom sink topologies**: e.g. wrapping `IcebergSinkWriter` or `IcebergCommitter` with custom metrics, logging, or error handling. Even composition (delegation/wrapping) requires being able to *reference* these types, which is impossible when they are package-private.
   
    3. **Extend `IcebergSink`** to override `createWriter()` or `createCommitter()` with custom implementations while reusing the base builder logic and data distribution.
   
    Without these extension points, downstream projects must copy large portions of the sink code, which creates a maintenance burden and version skew.
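
To make use case 2 concrete, here is a minimal, self-contained sketch of the delegation pattern. It does not use Iceberg or Flink types: `SimpleCommitter`, `BaseCommitter`, and `MeteredCommitter` are hypothetical stand-ins, and the single-method contract is a simplification of a real committer interface. The point is only that the wrapper has to name the delegate's type, which is what package-private visibility prevents.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: none of these types exist in Iceberg or Flink.
final class WrappingCommitterSketch {

  /** Simplified stand-in for a committer contract (assumption). */
  interface SimpleCommitter<T> {
    void commit(Collection<T> committables);
  }

  /** Stand-in for a library committer that a wrapper needs to reference. */
  static final class BaseCommitter implements SimpleCommitter<String> {
    final List<String> committed = new ArrayList<>();

    @Override
    public void commit(Collection<String> committables) {
      committed.addAll(committables);
    }
  }

  /** Wrapper that adds counters around any delegate without copying its logic. */
  static final class MeteredCommitter<T> implements SimpleCommitter<T> {
    private final SimpleCommitter<T> delegate;
    private final AtomicLong committedCount = new AtomicLong();
    private final AtomicLong failedCalls = new AtomicLong();

    MeteredCommitter(SimpleCommitter<T> delegate) {
      this.delegate = delegate;
    }

    @Override
    public void commit(Collection<T> committables) {
      try {
        delegate.commit(committables);
        // Count only committables the delegate accepted without throwing.
        committedCount.addAndGet(committables.size());
      } catch (RuntimeException e) {
        failedCalls.incrementAndGet();
        throw e; // error handling stays with the caller
      }
    }

    long committedCount() {
      return committedCount.get();
    }

    long failedCalls() {
      return failedCalls.get();
    }
  }
}
```

A downstream project would construct the wrapper with `new MeteredCommitter<>(new BaseCommitter())`; with the real classes, that constructor call is exactly the line that cannot compile outside the package today.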
   
   ### Proposed Changes
    **Part 1: CommittableMetadata framework** (new composition-based extension point)
    - `CommittableMetadata`: marker interface for custom metadata on committables
    - `CommittableMetadataSerializer`: serializer interface for metadata
    - `CommittableMetadataRegistry`: global registry that lets downstream projects register a serializer
    - `IcebergCommittable`: adds an optional `@Nullable metadata` field (backward-compatible: existing constructors chain with `metadata = null`)
    - `IcebergCommittableSerializer`: writes a boolean flag, then delegates to the registered serializer
   
    This is a pure composition pattern: downstream registers a serializer, attaches metadata via the existing constructors, and the pipeline carries it through transparently.
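
The shape of Part 1 could look roughly like the sketch below. Only the names `CommittableMetadata`, `CommittableMetadataSerializer`, and `CommittableMetadataRegistry` come from the proposal. Every method signature, the `Committable` stand-in for `IcebergCommittable`, the byte layout, and the `WatermarkMetadata` example are assumptions made for illustration, not the actual patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch: signatures and wire layout are assumptions.
final class CommittableMetadataSketch {

  /** Marker interface for custom metadata carried on a committable. */
  interface CommittableMetadata {}

  /** Serializer for the downstream metadata type (method names assumed). */
  interface CommittableMetadataSerializer {
    void write(CommittableMetadata metadata, DataOutputStream out) throws IOException;

    CommittableMetadata read(DataInputStream in) throws IOException;
  }

  /** Global registry holding a single serializer (assumed shape). */
  static final class CommittableMetadataRegistry {
    private static volatile CommittableMetadataSerializer serializer;

    static void register(CommittableMetadataSerializer s) {
      serializer = s;
    }

    static CommittableMetadataSerializer require() {
      CommittableMetadataSerializer s = serializer;
      if (s == null) {
        throw new IllegalStateException("No metadata serializer registered");
      }
      return s;
    }
  }

  /** Stand-in for IcebergCommittable: a payload plus nullable metadata. */
  static final class Committable {
    final byte[] payload;
    final CommittableMetadata metadata; // null when no metadata is attached

    Committable(byte[] payload) {
      this(payload, null); // existing-style constructor chains with null
    }

    Committable(byte[] payload, CommittableMetadata metadata) {
      this.payload = payload;
      this.metadata = metadata;
    }
  }

  /** Writes the payload, then a boolean flag, then the delegated metadata bytes. */
  static byte[] serialize(Committable c) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      out.writeInt(c.payload.length);
      out.write(c.payload);
      out.writeBoolean(c.metadata != null);
      if (c.metadata != null) {
        CommittableMetadataRegistry.require().write(c.metadata, out);
      }
      out.flush();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Mirrors serialize: reads the flag and delegates only when it is set. */
  static Committable deserialize(byte[] data) {
    try {
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
      byte[] payload = new byte[in.readInt()];
      in.readFully(payload);
      CommittableMetadata metadata =
          in.readBoolean() ? CommittableMetadataRegistry.require().read(in) : null;
      return new Committable(payload, metadata);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Example downstream metadata (hypothetical): a watermark timestamp. */
  static final class WatermarkMetadata implements CommittableMetadata {
    final long watermark;

    WatermarkMetadata(long watermark) {
      this.watermark = watermark;
    }
  }

  /** Example downstream serializer (hypothetical) for WatermarkMetadata. */
  static final class WatermarkSerializer implements CommittableMetadataSerializer {
    @Override
    public void write(CommittableMetadata metadata, DataOutputStream out) throws IOException {
      out.writeLong(((WatermarkMetadata) metadata).watermark);
    }

    @Override
    public CommittableMetadata read(DataInputStream in) throws IOException {
      return new WatermarkMetadata(in.readLong());
    }
  }
}
```

In this sketch a committable without metadata round-trips without any registered serializer, which is one way the backward-compatibility claim could hold for jobs that never attach metadata. A real change would also need to consider serializer versioning for state written before the flag existed, which this sketch does not address.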
   
    **Part 2: Access modifier changes** (enabling downstream composition)
    Widen the visibility of key sink pipeline classes (`IcebergCommittable`, `IcebergCommitter`, `IcebergSinkWriter`, `IcebergWriteAggregator`, etc.) from package-private to public, and add protected accessors on `IcebergSink`, so that downstream implementations can reference, wrap, or extend these types.
   
    All changes are additive. No existing behavior changes. No new dependencies.
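
As a rough illustration of what Part 2 would enable, the sketch below shows a subclass overriding a factory method and reading base state through a protected accessor. `BaseSink`, `AuditingSink`, `Writer`, and `tableName()` are hypothetical names invented for this example. They are not the actual `IcebergSink` API, whose accessor names the proposal does not list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these types only model the visibility pattern.
final class ExtensibleSinkSketch {

  /** Simplified stand-in for a sink writer. */
  interface Writer {
    String write(String record);
  }

  /** Stand-in for a public, non-final sink with a protected accessor. */
  static class BaseSink {
    private final String tableName;

    BaseSink(String tableName) {
      this.tableName = tableName;
    }

    /** Protected accessor: visible to subclasses, hidden from other callers. */
    protected String tableName() {
      return tableName;
    }

    public Writer createWriter() {
      return record -> tableName + ":" + record;
    }
  }

  /** Downstream subclass that reuses the base writer and adds an audit trail. */
  static class AuditingSink extends BaseSink {
    final List<String> audit = new ArrayList<>();

    AuditingSink(String tableName) {
      super(tableName);
    }

    @Override
    public Writer createWriter() {
      Writer base = super.createWriter(); // reuse the base logic instead of copying it
      return record -> {
        audit.add(tableName() + " <- " + record);
        return base.write(record);
      };
    }
  }
}
```

The override only adds behavior around `super.createWriter()`, so existing callers of the base class see no change, which matches the "additive" framing above.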
   
   ### Willingness to contribute
   
   - [x] I can contribute this improvement/feature independently
   - [x] I would be willing to contribute this improvement/feature with guidance from the Iceberg community
   - [ ] I cannot contribute this improvement/feature at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

