[ 
https://issues.apache.org/jira/browse/BEAM-8537?focusedWorklogId=384695&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-384695
 ]

ASF GitHub Bot logged work on BEAM-8537:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 10/Feb/20 20:06
            Start Date: 10/Feb/20 20:06
    Worklog Time Spent: 10m 
      Work Description: chadrik commented on pull request #10802: [BEAM-8537] 
Move wrappers of RestrictionTracker out of iobase
URL: https://github.com/apache/beam/pull/10802#discussion_r377288081
 
 

 ##########
 File path: sdks/python/apache_beam/runners/sdf_utils.py
 ##########
 @@ -0,0 +1,173 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# pytype: skip-file
+
+"""Common utility classes to help the SDK harness execute an SDF."""
+
+from __future__ import absolute_import
+from __future__ import division
+
+import logging
+import threading
+from builtins import object
+from collections import namedtuple
+from typing import TYPE_CHECKING
+from typing import Any
+from typing import Optional
+from typing import Tuple
+
+from apache_beam.utils import timestamp
+
+if TYPE_CHECKING:
+  from apache_beam.io.iobase import RestrictionTracker
+  from apache_beam.utils.timestamp import Timestamp
+
+_LOGGER = logging.getLogger(__name__)
+
+
+SplitResultPrimary = namedtuple(
+    'SplitResultPrimary', 'windowed_value')
+
+SplitResultResidual = namedtuple(
+    'SplitResultResidual',
+    'windowed_value current_watermark deferred_timestamp')
+
+class ThreadsafeRestrictionTracker(object):
+  """A thread-safe wrapper which wraps a `RestrictionTracker`.
+
+  This wrapper guarantees synchronized modification of restrictions across
+  multiple threads.
+  """
+  def __init__(self, restriction_tracker):
+    # type: (RestrictionTracker) -> None
+    from apache_beam.io.iobase import RestrictionTracker
+    if not isinstance(restriction_tracker, RestrictionTracker):
+      raise ValueError(
+          'Initializing ThreadsafeRestrictionTracker requires a '
+          'RestrictionTracker.')
+    self._restriction_tracker = restriction_tracker
+    # Records an absolute timestamp when defer_remainder is called.
+    self._deferred_timestamp = None
+    self._lock = threading.RLock()
+    self._deferred_residual = None
+    self._deferred_watermark = None
+
+  def current_restriction(self):
+    with self._lock:
+      return self._restriction_tracker.current_restriction()
+
+  def try_claim(self, position):
+    with self._lock:
+      return self._restriction_tracker.try_claim(position)
+
+  def defer_remainder(self, deferred_time=None):
+    """Performs a self-checkpoint on the restriction currently being processed,
+    with an expected resume time.
+
+    A self-checkpoint can happen while processing elements. When executing a
+    DoFn.process(), you may want to stop processing an element and resume
+    later if the current element has been processed for quite a long time or
+    if you also want some outputs from other elements.
+    ``defer_remainder()`` can be called on a per-element basis if needed.
+
+    Args:
+      deferred_time: A relative ``timestamp.Duration`` that indicates the ideal
+      time gap between now and resuming, or an absolute ``timestamp.Timestamp``
+      for resuming execution time. If deferred_time is None, the deferred work
+      will be executed as soon as possible.
+    """
+    """
+
+    # Record current time for calculating deferred_time later.
+    self._deferred_timestamp = timestamp.Timestamp.now()
+    if (deferred_time and not isinstance(deferred_time, timestamp.Duration) and
+        not isinstance(deferred_time, timestamp.Timestamp)):
+      raise ValueError(
+          'The deferred_time of defer_remainder() should be a '
+          'Duration, a Timestamp, or None.')
+    self._deferred_watermark = deferred_time
+    checkpoint = self.try_split(0)
+    if checkpoint:
+      _, self._deferred_residual = checkpoint
+
+  def check_done(self):
+    with self._lock:
+      return self._restriction_tracker.check_done()
+
+  def current_progress(self):
+    with self._lock:
+      return self._restriction_tracker.current_progress()
+
+  def try_split(self, fraction_of_remainder):
+    with self._lock:
+      return self._restriction_tracker.try_split(fraction_of_remainder)
+
+  def deferred_status(self):
+    # type: () -> Optional[Tuple[Any, Timestamp]]
+
+    """Returns deferred work which is produced by ``defer_remainder()``.
 
 Review comment:
   I just realized I was wrong about this.  I guess this is something that 
became standard after the yapf change. 
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 384695)
    Time Spent: 12h 10m  (was: 12h)

> Provide WatermarkEstimatorProvider for different types of WatermarkEstimator
> ----------------------------------------------------------------------------
>
>                 Key: BEAM-8537
>                 URL: https://issues.apache.org/jira/browse/BEAM-8537
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py-core, sdk-py-harness
>            Reporter: Boyuan Zhang
>            Assignee: Boyuan Zhang
>            Priority: Major
>          Time Spent: 12h 10m
>  Remaining Estimate: 0h
>
> This is a follow-up to the in-progress PR:  
> https://github.com/apache/beam/pull/9794.
> The current implementation in PR 9794 provides a default implementation of 
> WatermarkEstimator. For further work, we want WatermarkEstimator to be 
> a pure interface. We'll provide a WatermarkEstimatorProvider that can 
> create a custom WatermarkEstimator per windowed value. It should be similar 
> to how we track restrictions for SDF: 
> WatermarkEstimator <---> RestrictionTracker 
> WatermarkEstimatorProvider <---> RestrictionTrackerProvider
> WatermarkEstimatorParam <---> RestrictionDoFnParam
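
The interface/provider split described above could look roughly like the following. This is only a minimal sketch, not the Beam API: the method names (observe_timestamp, current_watermark, create_watermark_estimator), the MonotonicEstimator classes, and the process_windowed_values helper are all hypothetical and chosen for illustration.

```python
from abc import ABC, abstractmethod


class WatermarkEstimator(ABC):
    """Pure interface with no default behaviour (hypothetical shape)."""

    @abstractmethod
    def observe_timestamp(self, timestamp):
        """Records the timestamp of an output element."""

    @abstractmethod
    def current_watermark(self):
        """Returns the current watermark estimate."""


class WatermarkEstimatorProvider(ABC):
    """Creates one estimator per windowed value (hypothetical shape)."""

    @abstractmethod
    def create_watermark_estimator(self, initial_state):
        """Returns a fresh WatermarkEstimator for one windowed value."""


class MonotonicEstimator(WatermarkEstimator):
    """Illustrative estimator: the watermark is the largest timestamp seen."""

    def __init__(self, initial_watermark):
        self._watermark = initial_watermark

    def observe_timestamp(self, timestamp):
        # Never move the watermark backwards.
        self._watermark = max(self._watermark, timestamp)

    def current_watermark(self):
        return self._watermark


class MonotonicEstimatorProvider(WatermarkEstimatorProvider):
    def create_watermark_estimator(self, initial_state):
        return MonotonicEstimator(initial_state)


def process_windowed_values(values, provider):
    """Hypothetical harness loop: a new estimator for each windowed value.

    `values` is a list of (initial_watermark, timestamps) pairs that stand
    in for windowed values and the timestamps of their outputs.
    """
    results = []
    for initial_watermark, timestamps in values:
        estimator = provider.create_watermark_estimator(initial_watermark)
        for ts in timestamps:
            estimator.observe_timestamp(ts)
        results.append(estimator.current_watermark())
    return results
```

Under these assumptions, the provider plays the same role for watermark estimators that a restriction provider plays for restriction trackers: the harness never constructs an estimator directly, so a DoFn author can swap in a custom estimator without changing the processing loop.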



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
