Thanks for the response! The outline is very helpful. I'll use this as a
starting point.
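
To check my understanding of the outline quoted below, here is a minimal
sketch of the commit flow as I read it. It uses no Flink or Iceberg APIs:
FakeTable, Committer, the "checkpoint-id" summary key, and the file names
are all hypothetical stand-ins, and it commits one snapshot per checkpoint
for simplicity, which may not match what the real sink does.

```python
from dataclasses import dataclass, field


@dataclass
class FakeTable:
    """Hypothetical stand-in for an Iceberg table: a list of snapshots."""
    snapshots: list = field(default_factory=list)
    fail_next_commit: bool = False  # lets the demo simulate a failed commit

    def commit(self, data_files, checkpoint_id):
        if self.fail_next_commit:
            self.fail_next_commit = False
            raise RuntimeError("simulated commit failure")
        # Record the checkpoint ID in the snapshot summary (key name assumed).
        self.snapshots.append({
            "files": list(data_files),
            "summary": {"checkpoint-id": checkpoint_id},
        })

    def committed_checkpoint_ids(self):
        return {s["summary"]["checkpoint-id"] for s in self.snapshots}


class Committer:
    """Hypothetical single commit task that receives files from all writers."""

    def __init__(self, table):
        self.table = table
        # checkpoint ID -> staged "manifest" (here just a list of file paths).
        # In a real job this would live in checkpointed state.
        self.pending = {}

    def stage(self, checkpoint_id, data_files):
        """Stage the files the writers closed for this checkpoint."""
        self.pending[checkpoint_id] = list(data_files)

    def on_checkpoint_complete(self, checkpoint_id):
        """Commit every staged manifest up to this checkpoint, oldest first."""
        already = self.table.committed_checkpoint_ids()
        for cid in sorted(self.pending):
            if cid > checkpoint_id:
                continue
            if cid not in already:
                try:
                    self.table.commit(self.pending[cid], cid)
                except RuntimeError:
                    # Keep this and later manifests so they stack up for retry.
                    return
            # Committed now, or already committed before a restore: drop it.
            del self.pending[cid]


table = FakeTable()
committer = Committer(table)

committer.stage(1, ["a.parquet", "b.parquet"])
table.fail_next_commit = True
committer.on_checkpoint_complete(1)  # commit fails, manifest 1 is retained

committer.stage(2, ["c.parquet"])
committer.on_checkpoint_complete(2)  # commits checkpoints 1 and 2

committer.stage(2, ["c.parquet"])    # replay, as if restored from state
committer.on_checkpoint_complete(2)  # skipped: ID 2 is already in the table
```

If I have this right, the ID check against the table is what keeps a
replayed checkpoint from committing the same files twice.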

-
Lakshmi

On Fri, Nov 15, 2019 at 12:49 PM Ryan Blue <[email protected]>
wrote:

> Hi Lakshmi,
>
> +Steven Wu <[email protected]>, who wrote our Flink sink.
>
> I can tell you a bit about how our sink works, which will hopefully help.
> Ours accumulates data in data files until a snapshot. When that happens,
> each writer closes its open data files and sends a DataFile instance to the
> next stage, which is a single commit task responsible for committing all
> the data files to the Iceberg table. When the commit task gets the
> notification from each writer, it prepares a commit by writing a new
> manifest of all the data files. Then it stages that commit information in
> the checkpoint. When the checkpoint succeeds, the committer commits to the
> Iceberg table. If the Iceberg commit fails, the new manifests will stack up
> and no data is lost. When the committer is running the Iceberg commit, it
> checks which previous checkpoints have already been committed in recent
> Iceberg snapshots (using an ID from each Flink checkpoint stored in the
> Iceberg snapshot summary) for exactly-once commits.
>
> Steven can probably explain it better, but that's a rough outline.
>
> rb
>
> On Fri, Nov 15, 2019 at 11:53 AM Lakshmi Rao <[email protected]> wrote:
>
>> Hi,
>>
>> I'm working on building a POC of streaming data with Flink to Iceberg for
>> a hackathon project. I know this issue is still open
>> https://github.com/apache/incubator-iceberg/issues/567 . I'm pretty
>> excited for the work mentioned in the issue to be open sourced and would be
>> happy to contribute to any tasks or tickets related to this issue!
>>
>> However, in the meantime, I'd like to get a simple working version of a
>> Flink-Iceberg sink and generally explore Iceberg more. Any pointers on how
>> to get started? I saw this PR that enabled sinking to Iceberg with Spark
>> Structured Streaming:
>> https://github.com/apache/incubator-iceberg/pull/228 Are there any other
>> pointers the community can provide?
>>
>> Thanks
>> Lakshmi
>>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>


-- 
Lakshmi
Graduate Student
University of Illinois Urbana-Champaign
