On Wed, Nov 7, 2018 at 8:33 AM Robert Bradshaw <[email protected]> wrote:

> I think that not returning the users specific subclass should be fine.
> Does the removal of markDone imply that the consumer always knows a
> "final" key to claim on any given restriction?
>

Yes, each restriction needs to support claiming a "final" key that would
make the restriction "done". In the BigTable/HBase case it is the empty key
"", for files it can be a file offset beyond the end of the file.
Generally, restriction trackers written by SDF authors could also take an
instance of an object that they can compare instance equality against for a
final key. Alternatively we could allow restriction trackers to implement
markDone() but would need the SDK have knowledge of the method by having
the RestrictionTracker implement interface, extend abstract base class, or
reflectively found so that we would be able to wrap it to provide
synchronization guarantees. I had toyed with the idea of using something
like the ProxyInvocationHandler that backs PipelineOptions to be able to
provide a modified version of the users instance that had the appropriate
synchronization guarantees but couldn't get it to work.


> On Wed, Nov 7, 2018 at 1:45 AM Lukasz Cwik <[email protected]> wrote:
> >
> > I have started to work on how to change the user facing API within the
> Java SDK to support splitting/checkpointing[1], backlog reporting[2] and
> bundle finalization[3].
> >
> > I have this PR[4] which contains minimal interface/type definitions to
> convey how the API surface would change with these 4 changes:
> > 1) Exposes the ability for @SplitRestriction to take a backlog
> suggestion on how to perform splitting and for how many restrictions should
> be returned.
> > 2) Adds the ability for RestrictionTrackers to report backlog
> > 3) Updates @ProcessElement to be required to take a generic
> RestrictionTracker instead of the users own restriction tracker type.
> > 4) Adds the ability for @StartBundle/@ProcessElement/@FinishBundle to
> register a callback that is invoked after bundle finalization.
> >
> > The details are in the javadoc comments as to how I would expect the
> contract to play out.
> > Feel free to comment on the ML/PR around the contract and after the
> feedback is received/digested/implemented, I would like to get the changes
> submitted so that work can start  towards providing an implementation in
> the Java SDK, Python SDK, and Go SDK and the shared runner portability
> library.
> >
> > I would like to call out special attention to 3 since with this change
> it will enable us to remove the synchronization requirement for users as we
> will wrap the underlying restriction tracker allowing us to add appropriate
> synchronization as needed and also to watch any calls that pass through the
> object such as the claim calls. I also believe this prevents people from
> writing RestrictionTrackers where the contract of tryClaim is subverted
> since markDone is outside the purview of tryClaim as in
> ByteKeyRangeTracker[5].
> >
> > 1: https://s.apache.org/beam-checkpoint-and-split-bundles
> > 2: https://s.apache.org/beam-bundles-backlog-splitting
> > 3: https://s.apache.org/beam-finalizing-bundles
> > 4: https://github.com/apache/beam/pull/6969
> > 5: https://github.com/apache/beam/pull/6949
>

Reply via email to