[
https://issues.apache.org/jira/browse/BEAM-5264?focusedWorklogId=140759&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-140759
]
ASF GitHub Bot logged work on BEAM-5264:
----------------------------------------
Author: ASF GitHub Bot
Created on: 04/Sep/18 06:56
Start Date: 04/Sep/18 06:56
Worklog Time Spent: 10m
Work Description: charlesccychen commented on a change in pull request
#6304: [BEAM-5264] Reference DirectRunner implementation of Python User State
and Timers API
URL: https://github.com/apache/beam/pull/6304#discussion_r214797304
##########
File path: sdks/python/apache_beam/transforms/userstate.py
##########
@@ -161,3 +176,126 @@ def validate_stateful_dofn(dofn):
('The on_timer callback for %s is not the specified .%s method '
'for DoFn %r (perhaps it was overwritten?).') % (
timer_spec, method_name, dofn))
+
+
+class RuntimeTimer(object):
+  """Timer interface object passed to user code."""
+
+  def __init__(self, timer_spec):
+    self._cleared = False
+    self._new_timestamp = None
+
+  def clear(self):
+    self._cleared = True
+    self._new_timestamp = None
+
+  def set(self, timestamp):
+    self._new_timestamp = timestamp
+
+
+class RuntimeState(object):
+  """State interface object passed to user code."""
+
+  def __init__(self, state_spec, state_tag, current_value_accessor):
+    self._state_spec = state_spec
+    self._state_tag = state_tag
+    self._current_value_accessor = current_value_accessor
+
+  @staticmethod
+  def for_spec(state_spec, state_tag, current_value_accessor):
+    if isinstance(state_spec, BagStateSpec):
+      return BagRuntimeState(state_spec, state_tag, current_value_accessor)
+    elif isinstance(state_spec, CombiningValueStateSpec):
+      return CombiningValueRuntimeState(state_spec, state_tag,
+                                        current_value_accessor)
+    else:
+      raise ValueError('Invalid state spec: %s' % state_spec)
+
+  def _encode(self, value):
+    return self._state_spec.coder.encode(value)
+
+  def _decode(self, value):
+    return self._state_spec.coder.decode(value)
+
+  def prefetch(self):
+    # The default implementation here does nothing.
+    pass
+
+
+# Sentinel designating an unread value.
+UNREAD_VALUE = object()
+
+
+class BagRuntimeState(RuntimeState):
+  """Bag state interface object passed to user code."""
+
+  def __init__(self, state_spec, state_tag, current_value_accessor):
+    super(BagRuntimeState, self).__init__(
+        state_spec, state_tag, current_value_accessor)
+    self._cached_value = UNREAD_VALUE
+    self._cleared = False
+    self._new_values = []
+
+  def read(self):
+    if self._cached_value is UNREAD_VALUE:
+      self._cached_value = self._current_value_accessor()
+    if not self._cleared:
+      encoded_values = itertools.chain(self._cached_value, self._new_values)
+    else:
+      encoded_values = self._new_values
+    return (self._decode(v) for v in encoded_values)
+
+  def add(self, value):
+    self._new_values.append(self._encode(value))
+
+  def clear(self):
+    self._cleared = True
+    self._new_values = []
+
+
+class CombiningValueRuntimeState(RuntimeState):
Review comment:
I think that is a good idea, since it reduces the number of primitives in the
FnAPI / portability logic (CC: @lukecwik). Do you remember why combining was
its own type of state in Dataflow? Was there some combining happening at the
shuffle level? In some sense, though, this is the "user" portion of the code:
any state the FnAPI exposes could be used here, and other runners need only
implement the BagState tag type. It's not clear to me how we can cleanly make
this more "user-level". You can imagine the user declaring a bag state instead
of a combining value state and, on receiving the runtime state object, wrapping
it with something that shims it to look like a combining value state. But that
is much more confusing, so I don't see it as preferable to the current
approach, where this is an explicit state spec type and the wrapping of bag
state happens at this level.
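
For illustration, the "shim" idea above could look roughly like the sketch
below. This is only a sketch under stated assumptions, not the code in this
PR: InMemoryBagState, SumCombineFn and CombiningShim are hypothetical names
made up for the example, and the combiner is assumed to follow the usual
create_accumulator / add_input / merge_accumulators / extract_output shape.
The point is just that a combining value can be layered on top of anything
exposing read / add / clear on a bag of accumulators.

```python
class InMemoryBagState(object):
  """Hypothetical stand-in for a runner-provided bag state."""

  def __init__(self):
    self._values = []

  def read(self):
    return iter(self._values)

  def add(self, value):
    self._values.append(value)

  def clear(self):
    self._values = []


class SumCombineFn(object):
  """Hypothetical minimal combiner (assumed four-method shape)."""

  def create_accumulator(self):
    return 0

  def add_input(self, accumulator, element):
    return accumulator + element

  def merge_accumulators(self, accumulators):
    return sum(accumulators)

  def extract_output(self, accumulator):
    return accumulator


class CombiningShim(object):
  """Hypothetical wrapper: a bag of accumulators seen as one combined value."""

  def __init__(self, bag_state, combine_fn):
    self._bag = bag_state
    self._combine_fn = combine_fn

  def _merged_accumulator(self):
    # Merge whatever accumulators are in the bag; an empty bag means
    # "no input yet", so fall back to a fresh accumulator.
    accumulators = list(self._bag.read())
    if not accumulators:
      return self._combine_fn.create_accumulator()
    return self._combine_fn.merge_accumulators(accumulators)

  def read(self):
    return self._combine_fn.extract_output(self._merged_accumulator())

  def add(self, value):
    # Fold the new value into the merged accumulator, then compact the bag
    # down to that single accumulator.
    accumulator = self._combine_fn.add_input(self._merged_accumulator(), value)
    self._bag.clear()
    self._bag.add(accumulator)

  def clear(self):
    self._bag.clear()


bag = InMemoryBagState()
state = CombiningShim(bag, SumCombineFn())
state.add(3)
state.add(4)
print(state.read())
```

Whether this wrapping lives in user code or behind an explicit state spec type,
the underlying runner only ever sees bag operations.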
----------------------------------------------------------------
Issue Time Tracking
-------------------
Worklog Id: (was: 140759)
Time Spent: 2h 10m (was: 2h)
> Reference DirectRunner implementation of Python user state and timers API
> -------------------------------------------------------------------------
>
> Key: BEAM-5264
> URL: https://issues.apache.org/jira/browse/BEAM-5264
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py-core
> Affects Versions: 2.6.0
> Reporter: Charles Chen
> Assignee: Charles Chen
> Priority: Major
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> This issue tracks the reference DirectRunner implementation of the Beam
> Python User State and Timer API, described here:
> [https://s.apache.org/beam-python-user-state-and-timers].