[
https://issues.apache.org/jira/browse/BEAM-3818?focusedWorklogId=81598&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-81598
]
ASF GitHub Bot logged work on BEAM-3818:
----------------------------------------
Author: ASF GitHub Bot
Created on: 17/Mar/18 19:10
Start Date: 17/Mar/18 19:10
Worklog Time Spent: 10m
Work Description: charlesccychen commented on a change in pull request
#4838: [BEAM-3818] Add support for streaming side inputs in the DirectRunner
(part I: update _SideInputsContainer as the watermark advances)
URL: https://github.com/apache/beam/pull/4838#discussion_r175267206
##########
File path: sdks/python/apache_beam/runners/direct/evaluation_context.py
##########
@@ -217,11 +242,13 @@ def handle_result(
self._side_inputs_container.add_values(
view,
committed_bundle.get_elements_iterable(make_copy=True))
- if (self.get_execution_context(result.transform)
- .watermarks.input_watermark
- == WatermarkManager.WATERMARK_POS_INF):
- self._pending_unblocked_tasks.extend(
- self._side_inputs_container.finalize_value_and_get_tasks(view))
+
+ # Tasks generated from unblocked side inputs as the watermark progresses.
+ tasks = self._watermark_manager.update_watermarks(
+ completed_bundle, result.transform, completed_timers,
+ committed_bundles, unprocessed_bundles, result.keyed_watermark_holds,
+ self._side_inputs_container)
+ self._pending_unblocked_tasks.extend(tasks)
Review comment:
I think this needs to be more general than handling only the case where the
current bundle is a view. Any watermark update may require downstream views to
be updated. For example, say my windowing is Fixed(10) and I have a single
element at time t=5. When the upstream watermark hits t=10, the transform that
feeds the view directly may not have processed any bundle, so we need this in
full generality: a watermark update in the previous step can advance the
watermark of the transform emitting the view.
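
To make the Fixed(10) example concrete, here is a minimal toy sketch in plain
Python. It is not the DirectRunner code: `ToySideInputView`,
`fixed_window_end`, and `on_watermark_update` are hypothetical names, and the
rule "a window is ready once the watermark reaches its end" is a simplifying
assumption. It only illustrates that a window can become ready on a watermark
update alone, with no new bundle arriving at the view.

```python
# Hypothetical toy model; none of these names are Beam or DirectRunner APIs.


def fixed_window_end(timestamp, size):
    """Return the exclusive end of the fixed window containing `timestamp`."""
    return (timestamp // size + 1) * size


class ToySideInputView(object):
    """Buffers elements per window and finalizes windows as the watermark moves."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.pending = {}  # window end -> elements buffered so far
        self.ready = {}    # window end -> finalized elements

    def add_element(self, timestamp, value):
        end = fixed_window_end(timestamp, self.window_size)
        self.pending.setdefault(end, []).append(value)

    def on_watermark_update(self, watermark):
        """Finalize every pending window whose end is at or before `watermark`.

        Returns the list of window ends that became ready. Note that this is
        driven purely by the watermark, not by the arrival of a new bundle.
        """
        unblocked = sorted(end for end in self.pending if end <= watermark)
        for end in unblocked:
            self.ready[end] = self.pending.pop(end)
        return unblocked


view = ToySideInputView(window_size=10)
view.add_element(timestamp=5, value="a")  # the single element at t=5

first = view.on_watermark_update(9)    # window [0, 10) is still open
second = view.on_watermark_update(10)  # watermark reaches the window end
```

In this sketch the second call unblocks the window even though no element was
added between the two calls, which is the situation the comment above
describes.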
Issue Time Tracking
-------------------
Worklog Id: (was: 81598)
Time Spent: 3h (was: 2h 50m)
> Add support for the streaming side inputs in the Python DirectRunner
> --------------------------------------------------------------------
>
> Key: BEAM-3818
> URL: https://issues.apache.org/jira/browse/BEAM-3818
> Project: Beam
> Issue Type: New Feature
> Components: sdk-py-core
> Reporter: María GH
> Assignee: María GH
> Priority: Minor
> Fix For: 3.0.0
>
> Time Spent: 3h
> Remaining Estimate: 0h
>
> The streaming DirectRunner should support streaming side input semantics.
> Currently, side inputs are only available for globally-windowed side input
> PCollections.
> Also, empty side inputs cause a pipeline stall.