[
https://issues.apache.org/jira/browse/BEAM-10670?focusedWorklogId=486467&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-486467
]
ASF GitHub Bot logged work on BEAM-10670:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 19/Sep/20 03:18
Start Date: 19/Sep/20 03:18
Worklog Time Spent: 10m
Work Description: lukecwik edited a comment on pull request #12603:
URL: https://github.com/apache/beam/pull/12603#issuecomment-695156605
@iemejia I figured out that the issue is that watermark holds aren't
implemented for Spark. The first batch completes and computes new
watermarks, so the watermark hold that was set by the splittable DoFn
implementation is ignored. This leads to timers being dropped, and hence
only some of the results are produced.
This is also the likely reason the PAssert is dropping the elements that
were produced, but I haven't validated this yet.
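To illustrate the failure mode described above, here is a minimal,
self-contained sketch. It is not Spark runner or Beam code: the class and
method names are hypothetical, and it assumes a simplified model in which
the output watermark is the minimum of the input watermark and any hold,
and a timer is discarded once the watermark has passed the end of its
window.

```java
import java.util.OptionalLong;

// Hypothetical illustration only; none of these names exist in Beam.
public class WatermarkHoldSketch {

    // With a hold, the output watermark may not advance past the hold.
    // Without one, it simply follows the input watermark.
    static long outputWatermark(long inputWatermark, OptionalLong hold) {
        return hold.isPresent()
            ? Math.min(inputWatermark, hold.getAsLong())
            : inputWatermark;
    }

    // Simplified assumption: a timer is dropped once the watermark has
    // moved beyond the end of the window the timer belongs to.
    static boolean timerDropped(long windowEnd, long watermark) {
        return watermark > windowEnd;
    }

    public static void main(String[] args) {
        long inputWatermark = 100L; // watermark computed after the first batch
        long windowEnd = 50L;       // window of the pending timer

        // Runner honors the hold set by the splittable DoFn implementation.
        long held = outputWatermark(inputWatermark, OptionalLong.of(40L));
        // Runner ignores holds, as described for Spark above.
        long unheld = outputWatermark(inputWatermark, OptionalLong.empty());

        System.out.println("hold honored: watermark=" + held
            + ", timer dropped=" + timerDropped(windowEnd, held));
        System.out.println("hold ignored: watermark=" + unheld
            + ", timer dropped=" + timerDropped(windowEnd, unheld));
    }
}
```

In this model the pending timer survives only when the hold is honored,
which matches the symptom of partial results.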
Can you explain how the GlobalWatermarkHolder works? Can I register anything
as a `sourceId`?
Since watermark holds don't seem to be implemented, does the
GroupAlsoViaWindowSet hold back the watermark for elements that it currently
has buffered?
----------------------------------------------------------------
Issue Time Tracking
-------------------
Worklog Id: (was: 486467)
Time Spent: 18h 40m (was: 18.5h)
> Make non-portable Splittable DoFn the only option when executing Java "Read"
> transforms
> ---------------------------------------------------------------------------------------
>
> Key: BEAM-10670
> URL: https://issues.apache.org/jira/browse/BEAM-10670
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core
> Reporter: Luke Cwik
> Assignee: Luke Cwik
> Priority: P2
> Time Spent: 18h 40m
> Remaining Estimate: 0h
>
> All runners seem to be capable of migrating to splittable DoFn for
> non-portable execution except for Dataflow runner v1 which will internalize
> the current primitive read implementation that is shared across runner
> implementations.