I just realized that the HBaseIO example is not a good one because we can
already have Watch like behavior as we do for Partition discovery in
HCatalogIO.
Still I am interested on your views on bounded/unbounded unification.

Interesting question2: How this will annotations connect with the Watch
transform Polling patterns?
https://github.com/apache/beam/blob/650e6cd9c707472c34055382a1356cf22d14ee5e/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Watch.java#L178


On Fri, Feb 28, 2020 at 10:47 AM Ismaël Mejía <[email protected]> wrote:

> Really interesting! Implementing correctly the watermark has been a common
> struggle for IO authors, to the point that some IOs still have issues
> around
> that. So +1 for this, in particular if we can get to reuse common patterns.
> I was not aware of Boyuan's work around this, really nice.
>
> One aspect I have always being confused about since I read the SDF proposal
> documents is if we could get to have a single API for both Bounded and
> Unbounded
> IO by somehow assuming that with a BoundedSDF is an UnboundedSDF special
> case.
> Could WatermarkEstimator help in this direction?
>
> One quick case that I can think is to make the current HBaseIO SDF to work
> in an
> unbounded manner, for example to 'watch and read new tables'.
>
>
> On Thu, Feb 27, 2020 at 11:43 PM Luke Cwik <[email protected]> wrote:
>
>> See this doc[1] and blog[2] for some context about SplittableDoFns.
>>
>> To support watermark reporting within the Java SDK for SplittableDoFns,
>> we need a way to have SDF authors to report watermark estimates over the
>> element and restriction pair that they are processing.
>>
>> For UnboundedSources, it was found to be a pain point to ask each SDF
>> author to write their own watermark estimation which typically prevented
>> re-use. Therefore we would like to have a "library" of watermark estimators
>> that help SDF authors perform this estimation similar to how there is a
>> "library" of restrictions and restriction trackers that SDF authors can
>> use. For SDF authors where the existing library doesn't work, they can add
>> additional ones that observe timestamps of elements or choose to directly
>> report the watermark through a "ManualWatermarkEstimator" parameter that
>> can be supplied to @ProcessElement methods.
>>
>> The public facing portion of the DoFn changes adds three new annotations
>> for new DoFn style methods:
>> GetInitialWatermarkEstimatorState: Returns the initial watermark state,
>> similar to GetInitialRestriction
>> GetWatermarkEstimatorStateCoder: Returns a coder compatible with
>> watermark state type, similar to GetRestrictionCoder for restrictions
>> returned by GetInitialRestriction.
>> NewWatermarkEstimator: Returns a watermark estimator that either the
>> framework invokes allowing it to observe the timestamps of output records
>> or a manual watermark estimator that can be explicitly invoked to update
>> the watermark.
>>
>> See [3] for an initial PR with the public facing additions to the core
>> Java API related to SplittableDoFn.
>>
>> This mirrors a bunch of work that was done by Boyuan within the Pyhon SDK
>> [4, 5] but in the style of new DoFn parameter/method invocation we have in
>> the Java SDK.
>>
>> 1: https://s.apache.org/splittable-do-fn
>> 2: https://beam.apache.org/blog/2017/08/16/splittable-do-fn.html
>> 3: https://github.com/apache/beam/pull/10992
>> 4: https://github.com/apache/beam/pull/9794
>> 5: https://github.com/apache/beam/pull/10375
>>
>

Reply via email to