Previous APIs discussed have been trying to do more in the framework. If we
take a different approach to a lighter framework, these sets of
minimal APIs are probably good enough. Sink can handle the bookkeeping,
merge, retry logics.

/**
 * CommT is the DataFile in Iceberg
 * GlobalCommT is the checkpoint data type, like ManifestFile in Iceberg
*/
interface GlobalCommitter<CommT, GlobalCommT> {

  void collect(CommT);

  void commit();

  List<GlobalCommT> snapshotState();

  // this is just a callback to sink so that it can perform filter and
retain the uncommitted GlobalCommT in the internal bookkeeping
  void recoveredCommittables(List<GlobalCommT>) ;
}

The most important need from the framework is to run GlobalCommitter in the
jobmanager. It involves the topology creation, checkpoint handling,
serializing the executions of commit() calls etc.

Thanks,
Steven

On Tue, Sep 22, 2020 at 6:39 AM Steven Wu <stevenz...@gmail.com> wrote:

> It is fine to leave the CommitResult/RETRY outside the scope of framework.
> Then the framework might need to provide some hooks in the
> checkpoint/restore logic. because the commit happened in the post
> checkpoint completion step, sink needs to update the internal state when
> the commit is successful so that the next checkpoint won't include the
> committed GlobalCommT.
>
> Maybe GlobalCommitter can have an API like this?
> > List<GlobalCommT> snapshotState();
>
> But then we still need the recover API if we don't let sink directly
> manage the state.
> > List<GlobalCommT> recoverCommittables(List<GlobalCommT>)
>
> Thanks,
> Steven
>
> On Tue, Sep 22, 2020 at 6:33 AM Aljoscha Krettek <aljos...@apache.org>
> wrote:
>
>> On 22.09.20 13:26, Guowei Ma wrote:
>> > Actually I am not sure adding `isAvailable` is enough. Maybe it is not.
>> > But for the initial version I hope we could make the sink api sync
>> because
>> > there is already a lot of stuff that has to finish. :--)
>>
>> I agree, for the first version we should stick to a simpler synchronous
>> interface.
>>
>> Aljoscha
>>
>

Reply via email to