Thanks for the response! The outline is very helpful. I'll use this as a starting point.
- Lakshmi

On Fri, Nov 15, 2019 at 12:49 PM Ryan Blue <[email protected]> wrote:

> Hi Lakshmi,
>
> +Steven Wu <[email protected]>, who wrote our Flink sink.
>
> I can tell you a bit about how our sink works, which will hopefully help.
> Ours accumulates data in data files until a snapshot. When that happens,
> each writer closes its open data files and sends a DataFile instance to the
> next stage, which is a single commit task responsible for committing all
> the data files to the Iceberg table. When the commit task gets the
> notification from each writer, it prepares a commit by writing a new
> manifest of all the data files. Then it stages that commit information in
> the checkpoint. When the checkpoint succeeds, the committer commits to the
> Iceberg table. If the Iceberg commit fails, the new manifests will stack up
> and no data is lost. When the committer is running the Iceberg commit, it
> checks what previous checkpoints have already been committed in recent
> Iceberg snapshots (using an ID from each Flink snapshot stored in the
> Iceberg summary) for exactly-once commits.
>
> Steven can probably explain it better, but that's a rough outline.
>
> rb
>
> On Fri, Nov 15, 2019 at 11:53 AM Lakshmi Rao <[email protected]> wrote:
>
>> Hi,
>>
>> I'm working on building a POC of streaming data with Flink to Iceberg for
>> a hackathon project. I know this issue is still open:
>> https://github.com/apache/incubator-iceberg/issues/567
>> I'm pretty excited for the work mentioned in the issue to be open sourced
>> and would be happy to contribute to any tasks or tickets related to this
>> issue!
>>
>> However, in the meantime, I'd like to get a simple working version of a
>> Flink-Iceberg sink and generally explore Iceberg more. Any pointers on how
>> to get started? I saw this PR that enabled sinking to Iceberg with Spark
>> Structured Streaming:
>> https://github.com/apache/incubator-iceberg/pull/228
>> Are there any other pointers the community can provide?
>>
>> Thanks
>> Lakshmi
>>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>

--
Lakshmi
Graduate Student
University of Illinois Urbana-Champaign
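
For anyone following the thread, the commit flow Ryan outlines in the quoted message can be sketched as a small in-memory simulation. This is only a rough sketch under stated assumptions, not the Netflix sink: FakeTable, Writer, and Committer are hypothetical names rather than Flink or Iceberg APIs, data files are plain dicts, and each pending checkpoint is committed as its own table snapshot to keep things simple.

```python
from dataclasses import dataclass, field


@dataclass
class FakeTable:
    """In-memory stand-in for an Iceberg table (hypothetical, not the Iceberg API)."""
    snapshots: list = field(default_factory=list)
    fail_next_commit: bool = False  # lets the demo simulate one failed commit

    def committed_checkpoint_ids(self):
        # Mirrors "an ID from each Flink snapshot stored in the Iceberg summary".
        return {s["checkpoint_id"] for s in self.snapshots}

    def commit(self, checkpoint_id, files):
        if self.fail_next_commit:
            self.fail_next_commit = False
            raise RuntimeError("simulated commit failure")
        self.snapshots.append({"checkpoint_id": checkpoint_id, "files": list(files)})


class Writer:
    """Accumulates rows in an open data file until a snapshot is taken."""

    def __init__(self, name):
        self.name = name
        self.open_rows = []
        self.file_seq = 0

    def write(self, row):
        self.open_rows.append(row)

    def snapshot(self):
        # Close the open file and hand a descriptor to the committer (None if empty).
        if not self.open_rows:
            return None
        self.file_seq += 1
        data_file = {"path": f"{self.name}-{self.file_seq}.parquet",
                     "rows": len(self.open_rows)}
        self.open_rows = []
        return data_file


class Committer:
    """Single commit task: stages one manifest per checkpoint, commits on success."""

    def __init__(self, table):
        self.table = table
        self.pending = {}  # checkpoint_id -> manifest; would live in checkpoint state

    def stage(self, checkpoint_id, data_files):
        self.pending[checkpoint_id] = [f for f in data_files if f is not None]

    def on_checkpoint_complete(self):
        already = self.table.committed_checkpoint_ids()
        for cp_id in sorted(self.pending):
            if cp_id in already:
                del self.pending[cp_id]  # committed earlier: skip for exactly-once
                continue
            try:
                self.table.commit(cp_id, self.pending[cp_id])
            except RuntimeError:
                return False  # manifests stack up in `pending`; nothing is lost
            del self.pending[cp_id]
        return True


table = FakeTable()
writers = [Writer("w0"), Writer("w1")]
committer = Committer(table)

# Checkpoint 1: both writers flush, but the table commit fails.
writers[0].write("a")
writers[1].write("b")
committer.stage(1, [w.snapshot() for w in writers])
table.fail_next_commit = True
ok1 = committer.on_checkpoint_complete()

# Checkpoint 2: the stacked-up manifest for checkpoint 1 is committed first.
writers[0].write("c")
committer.stage(2, [w.snapshot() for w in writers])
ok2 = committer.on_checkpoint_complete()

# Replay after a restore: checkpoint 2 is staged again but must not be re-committed.
committer.stage(2, [{"path": "w0-2.parquet", "rows": 1}])
ok3 = committer.on_checkpoint_complete()
```

In this sketch the failed commit for checkpoint 1 leaves its manifest pending, the next successful checkpoint drains both, and the replayed checkpoint 2 is dropped because its ID is already recorded in the table's snapshots.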
