Hi Team,

As discussed in yesterday's community sync, I am working on adding the
ability to run maintenance tasks on Iceberg tables from the Flink Iceberg
connector. This will address the small-files issue and, in the long run,
help compact the large number of positional and equality deletes created
by Flink jobs writing CDC data to Iceberg tables, without requiring Spark
in the infrastructure.
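
For context, below is a minimal sketch of the table compaction that today
typically runs as a separate Spark job, using Iceberg's existing
SparkActions API; the proposal aims to offer an equivalent maintenance task
from Flink so that Spark is no longer needed for this. The Flink-side API
itself is described in the linked document and is not reproduced here; the
class and method names in the sketch are only illustrative.

    import org.apache.iceberg.Table;
    import org.apache.iceberg.actions.RewriteDataFiles;
    import org.apache.iceberg.spark.actions.SparkActions;

    public class CompactWithSpark {
      // Bin-packs small data files into ~512 MB target files. This is the
      // maintenance step that currently requires a Spark cluster in the
      // infrastructure alongside the Flink jobs writing the table.
      public static RewriteDataFiles.Result compact(Table table) {
        return SparkActions.get()
            .rewriteDataFiles(table)
            .binPack()
            .option("target-file-size-bytes",
                String.valueOf(512L * 1024 * 1024))
            .execute();
      }
    }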

I have done some planning and prototyping, and I am currently trying out
the solution at a larger scale.

I put together a document describing my current solution:
https://docs.google.com/document/d/16g3vR18mVBy8jbFaLjf2JwAANuYOmIwr15yDDxovdnA/edit?usp=sharing

I would love to hear your thoughts and feedback on this to find a good
final solution.

Thanks,
Peter
