[
https://issues.apache.org/jira/browse/BEAM-8537?focusedWorklogId=371219&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-371219
]
ASF GitHub Bot logged work on BEAM-8537:
----------------------------------------
Author: ASF GitHub Bot
Created on: 13/Jan/20 23:59
Start Date: 13/Jan/20 23:59
Worklog Time Spent: 10m
Work Description: lukecwik commented on pull request #10375: [BEAM-8537] Provide WatermarkEstimator to track watermark
URL: https://github.com/apache/beam/pull/10375#discussion_r366086655
##########
File path: sdks/python/apache_beam/io/iobase.py
##########
@@ -1236,127 +1233,38 @@ def try_claim(self, position):
raise NotImplementedError
-class ThreadsafeRestrictionTracker(object):
- """A thread-safe wrapper which wraps a `RestritionTracker`.
+class WatermarkEstimator(object):
+ """A WatermarkEstimator which is used for estimating output_watermark based on
+ the timestamp of output records or manual modifications.
- This wrapper guarantees synchronization of modifying restrictions across
- multi-thread.
- """
-
- def __init__(self, restriction_tracker):
- if not isinstance(restriction_tracker, RestrictionTracker):
- raise ValueError(
- 'Initialize ThreadsafeRestrictionTracker requires'
- 'RestrictionTracker.')
- self._restriction_tracker = restriction_tracker
- # Records an absolute timestamp when defer_remainder is called.
- self._deferred_timestamp = None
- self._lock = threading.RLock()
- self._deferred_residual = None
- self._deferred_watermark = None
-
- def current_restriction(self):
- with self._lock:
- return self._restriction_tracker.current_restriction()
-
- def try_claim(self, position):
- with self._lock:
- return self._restriction_tracker.try_claim(position)
-
- def defer_remainder(self, deferred_time=None):
- """Performs self-checkpoint on current processing restriction with an
- expected resuming time.
+ The base class provides common APIs that are called by the framework, which
+ are also accessible inside a DoFn.process() body. Derived watermark estimator
+ should implement all APIs listed below. Additional methods can be implemented
+ and will be available when invoked within a DoFn.
- Self-checkpoint could happen during processing elements. When executing an
- DoFn.process(), you may want to stop processing an element and resuming
- later if current element has been processed quit a long time or you also
- want to have some outputs from other elements. ``defer_remainder()`` can be
- called on per element if needed.
-
- Args:
- deferred_time: A relative ``timestamp.Duration`` that indicates the ideal
- time gap between now and resuming, or an absolute ``timestamp.Timestamp``
- for resuming execution time. If the time_delay is None, the deferred work
- will be executed as soon as possible.
+ Multi-threading safety is guarded by ThreadsafeWatermarkEstimator.
+ """
+ def get_estimator_state(self):
+ """Get current state of the WatermarkEstimator instance, which can be used
+ to recreate the WatermarkEstimator when processing the restriction. See
+ WatermarkEstimatorProvider.create_watermark_estimator.
"""
+ raise NotImplementedError(type(self))
- # Record current time for calculating deferred_time later.
- self._deferred_timestamp = timestamp.Timestamp.now()
- if (deferred_time and
- not isinstance(deferred_time, timestamp.Duration) and
- not isinstance(deferred_time, timestamp.Timestamp)):
- raise ValueError('The timestamp of deter_remainder() should be a '
- 'Duration or a Timestamp, or None.')
- self._deferred_watermark = deferred_time
- checkpoint = self.try_split(0)
- if checkpoint:
- _, self._deferred_residual = checkpoint
-
- def check_done(self):
- with self._lock:
- return self._restriction_tracker.check_done()
-
- def current_progress(self):
- with self._lock:
- return self._restriction_tracker.current_progress()
-
- def try_split(self, fraction_of_remainder):
- with self._lock:
- return self._restriction_tracker.try_split(fraction_of_remainder)
+ def current_watermark(self):
+ """Return estimated output_watermark. This function must return
+ monotonically increasing watermark."""
Review comment:
```suggestion
monotonically increasing watermarks."""
```
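For readers following the review, here is a minimal sketch of an estimator that satisfies the "never moves backwards" requirement in the docstring above. It is not the code from the PR: the class name `MonotonicWatermarkEstimator` and the `observe_timestamp` method are hypothetical, and plain integers stand in for Beam timestamps. Only `current_watermark` and `get_estimator_state` are names taken from the diff.

```python
class MonotonicWatermarkEstimator(object):
  """Hypothetical estimator that tracks the largest timestamp seen so far."""

  def __init__(self, estimator_state=None):
    # The state is simply the last reported watermark (None if nothing seen).
    self._watermark = estimator_state

  def observe_timestamp(self, timestamp):
    # Assumed hook, not part of the diff: advance only on larger timestamps,
    # so the reported watermark can never move backwards.
    if self._watermark is None or timestamp > self._watermark:
      self._watermark = timestamp

  def current_watermark(self):
    # Returns the estimated output watermark.
    return self._watermark

  def get_estimator_state(self):
    # State that is sufficient to recreate this estimator later.
    return self._watermark
```

An out-of-order record leaves the watermark unchanged, and an estimator rebuilt from `get_estimator_state()` resumes at the same value.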
Issue Time Tracking
-------------------
Worklog Id: (was: 371219)
Time Spent: 3h 50m (was: 3h 40m)
> Provide WatermarkEstimatorProvider for different types of WatermarkEstimator
> ----------------------------------------------------------------------------
>
> Key: BEAM-8537
> URL: https://issues.apache.org/jira/browse/BEAM-8537
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py-core, sdk-py-harness
> Reporter: Boyuan Zhang
> Assignee: Boyuan Zhang
> Priority: Major
> Time Spent: 3h 50m
> Remaining Estimate: 0h
>
> This is a follow up for in-progress PR:
> https://github.com/apache/beam/pull/9794.
> The current implementation in PR 9794 provides a default implementation of
> WatermarkEstimator. For further work, we want to make WatermarkEstimator
> a pure interface. We'll provide a WatermarkEstimatorProvider that can
> create a custom WatermarkEstimator per windowed value. It should be similar
> to how we track restrictions for SDF:
> WatermarkEstimator <---> RestrictionTracker
> WatermarkEstimatorProvider <---> RestrictionTrackerProvider
> WatermarkEstimatorParam <---> RestrictionDoFnParam
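The provider/estimator split described in the issue can be sketched roughly as follows. This is an illustration under stated assumptions, not the Beam API: `ManualWatermarkEstimator`, `ManualWatermarkEstimatorProvider`, `set_watermark`, and `initial_estimator_state` are hypothetical names, and integers stand in for timestamps. Only `create_watermark_estimator`, `current_watermark`, and `get_estimator_state` appear in the diff above.

```python
class ManualWatermarkEstimator(object):
  """Hypothetical estimator whose watermark is set explicitly by the caller."""

  def __init__(self, estimator_state=None):
    self._watermark = estimator_state

  def set_watermark(self, watermark):
    # Assumed method: reject any attempt to move the watermark backwards.
    if self._watermark is not None and watermark < self._watermark:
      raise ValueError('Watermark must not move backwards.')
    self._watermark = watermark

  def current_watermark(self):
    return self._watermark

  def get_estimator_state(self):
    return self._watermark


class ManualWatermarkEstimatorProvider(object):
  """Hypothetical provider, playing the role a tracker provider plays for SDF."""

  def initial_estimator_state(self):
    # Assumed helper: the state used before any watermark has been reported.
    return None

  def create_watermark_estimator(self, estimator_state):
    # Builds a fresh estimator from previously saved state.
    return ManualWatermarkEstimator(estimator_state)
```

The design point is that the provider holds no per-element state itself; everything needed to resume lives in the value returned by `get_estimator_state()`.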