[
https://issues.apache.org/jira/browse/BEAM-9733?focusedWorklogId=422178&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422178
]
ASF GitHub Bot logged work on BEAM-9733:
----------------------------------------
Author: ASF GitHub Bot
Created on: 14/Apr/20 17:36
Start Date: 14/Apr/20 17:36
Worklog Time Spent: 10m
Work Description: mxm commented on issue #11362: [BEAM-9733] Always let
ImpulseSourceFunction emit a final watermark
URL: https://github.com/apache/beam/pull/11362#issuecomment-613579619
> @mxm these are significant changes after the PR was approved. Questions:
I'm aware of that. I've tried to keep the necessary changes at a minimum.
I'm happy to hear your feedback.
> 1. What test coverage do we have for ensuring that watermarks don't bypass
in-progress elements?
We have tests in `ExecutableStageDoFnOperatorTest`
(`outputsAreTaggedCorrectly`, `testEnsureDeferredStateCleanupTimerFiring`) and
through our integration tests. Admittedly, it would be good to dedicate a test
specifically to watermark behavior.
> 2. Do these changes affect how the main input watermark interacts with the
side input watermark?
Effectively, side inputs in portability were broken before because (a) the
side input watermark hold was abused by the the portable operator (b) only
`processWatermark` was overridden but for proper support for side inputs we
have to override `processWatermark1` (1 is the main input when we have side
inputs, `DoFnoperator#processWatermark` calls `processWatermark1` when we do
not have side inputs, but `processWatermark` is not called when we have side
input, only `processWatermark1`).
> 3. Will the added watermark logging affect the usefulness of debug logging
for other investigations (I had in the past removed it after done debugging
issues)
It greatly helped me to debug the current behavior and develop the solution.
I find it immensely helpful and would like to keep it if further debugging is
necessary. It is very useful to have debug information already built-in,
instead of having to add it manually every time (which in any case will be
possible).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 422178)
Time Spent: 3h 10m (was: 3h)
> ImpulseSourceFunction does not emit a final watermark
> -----------------------------------------------------
>
> Key: BEAM-9733
> URL: https://issues.apache.org/jira/browse/BEAM-9733
> Project: Beam
> Issue Type: Bug
> Components: runner-flink
> Reporter: Maximilian Michels
> Assignee: Maximilian Michels
> Priority: Critical
> Fix For: 2.21.0
>
> Time Spent: 3h 10m
> Remaining Estimate: 0h
>
> The Flink Runner's {{ImpulseSourceFunction}} does not emit a final watermark,
> unless {{--shutdownSourcesOnFinalWatermark}} flag has been specified (the
> flag is used in tests to shutdown the pipeline after reading all data). Most
> pipelines will be long-running and thus do not specify the flag.
> Not sending out the final watermark causes GroupByKey to hold back the data
> of event time windows until the pipeline is shut down (the final watermark is
> always emitted on pipeline shutdown which is why using the above flag works).
--
This message was sent by Atlassian Jira
(v8.3.4#803005)