I think we should go with something like

List<GlobalCommitT> filterRecoveredCommittables(List<>)

to keep things simple. This should also be easy to do from the framework side and then the sink doesn't need to do any custom state handling.

Best,
Aljoscha

On 22.09.20 16:03, Steven Wu wrote:
Previous APIs discussed have been trying to do more in the framework. If we
take a different approach to a lighter framework, these sets of
minimal APIs are probably good enough. Sink can handle the bookkeeping,
merge, retry logics.

/**
  * CommT is the DataFile in Iceberg
  * GlobalCommT is the checkpoint data type, like ManifestFile in Iceberg
*/
interface GlobalCommitter<CommT, GlobalCommT> {

   void collect(CommT);

   void commit();

   List<GlobalCommT> snapshotState();

   // this is just a callback to sink so that it can perform filter and
retain the uncommitted GlobalCommT in the internal bookkeeping
   void recoveredCommittables(List<GlobalCommT>) ;
}

The most important need from the framework is to run GlobalCommitter in the
jobmanager. It involves the topology creation, checkpoint handling,
serializing the executions of commit() calls etc.

Thanks,
Steven

On Tue, Sep 22, 2020 at 6:39 AM Steven Wu <stevenz...@gmail.com> wrote:

It is fine to leave the CommitResult/RETRY outside the scope of framework.
Then the framework might need to provide some hooks in the
checkpoint/restore logic. because the commit happened in the post
checkpoint completion step, sink needs to update the internal state when
the commit is successful so that the next checkpoint won't include the
committed GlobalCommT.

Maybe GlobalCommitter can have an API like this?
List<GlobalCommT> snapshotState();

But then we still need the recover API if we don't let sink directly
manage the state.
List<GlobalCommT> recoverCommittables(List<GlobalCommT>)

Thanks,
Steven

On Tue, Sep 22, 2020 at 6:33 AM Aljoscha Krettek <aljos...@apache.org>
wrote:

On 22.09.20 13:26, Guowei Ma wrote:
Actually I am not sure adding `isAvailable` is enough. Maybe it is not.
But for the initial version I hope we could make the sink api sync
because
there is already a lot of stuff that has to finish. :--)

I agree, for the first version we should stick to a simpler synchronous
interface.

Aljoscha




Reply via email to