[
https://issues.apache.org/jira/browse/BEAM-11403?focusedWorklogId=527318&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-527318
]
ASF GitHub Bot logged work on BEAM-11403:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 22/Dec/20 19:04
Start Date: 22/Dec/20 19:04
Worklog Time Spent: 10m
Work Description: boyuanzz commented on pull request #13592:
URL: https://github.com/apache/beam/pull/13592#issuecomment-749721610
> One more concern - the current implementation relies on
`CheckpointMark#hashCode` and `CheckpointMark#equals`. It is likely that many
`CheckpointMark` implementations will not have these two implemented
correctly. We should stick to `Coder#structuralValue` for that.
> The same holds true for `UnboundedSource#hashCode` and `equals`. We should
probably not use the source in the cache key, because it should be impossible
for a single DoFn instance to read from multiple readers.
I was thinking about using `Coder#structuralValue`, but I'm concerned about
the additional overhead from encoding. The cacheKey is created very
frequently (at least twice per element), and it's not cheap for a coder to
encode a value.
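
To make the trade-off above concrete, here is a minimal sketch that uses only
the JDK. It is not the code from the pull request: `OffsetMark`, `encode`, and
`StructuralKey` are hypothetical stand-ins for a checkpoint mark, a coder, and
a key built from encoded bytes (roughly what a structural value falls back to
when a coder is not consistent with equals). It shows that a key type without
`equals`/`hashCode` misses the cache, while a byte-based key hits it at the
cost of one encode per key creation.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class StructuralKeySketch {
  // Hypothetical checkpoint type that does not override equals/hashCode.
  static final class OffsetMark {
    final long offset;

    OffsetMark(long offset) {
      this.offset = offset;
    }
  }

  // Stand-in for a coder: serializes the mark to bytes (an assumption, not Beam's API).
  static byte[] encode(OffsetMark mark) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      out.writeLong(mark.offset);
      out.flush();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new IllegalStateException(e);
    }
  }

  // Cache key whose equality is defined by the encoded bytes, not by the object itself.
  static final class StructuralKey {
    private final byte[] encoded;

    StructuralKey(OffsetMark mark) {
      this.encoded = encode(mark); // one encode per key: the overhead discussed above
    }

    @Override
    public boolean equals(Object other) {
      return other instanceof StructuralKey
          && Arrays.equals(encoded, ((StructuralKey) other).encoded);
    }

    @Override
    public int hashCode() {
      return Arrays.hashCode(encoded);
    }
  }

  // Looks up an equal-valued but distinct instance; relies on Object identity.
  static boolean identityLookupHits() {
    Map<OffsetMark, String> cache = new HashMap<>();
    cache.put(new OffsetMark(42L), "reader");
    return cache.containsKey(new OffsetMark(42L));
  }

  // Same lookup, but keyed by the encoded bytes.
  static boolean structuralLookupHits() {
    Map<StructuralKey, String> cache = new HashMap<>();
    cache.put(new StructuralKey(new OffsetMark(42L)), "reader");
    return cache.containsKey(new StructuralKey(new OffsetMark(42L)));
  }

  public static void main(String[] args) {
    System.out.println("identity-based key hit: " + identityLookupHits());
    System.out.println("structural key hit: " + structuralLookupHits());
  }
}
```

Whether the extra encode matters in practice depends on the coder and on how
often the key is built, so this only illustrates the shape of the concern.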
As you mentioned, a DoFn instance could process multiple sources, especially
when the source allows initial splitting (and we cannot assume that the
CheckpointMark contains the source info, although in most cases it does).
That's why I decided to use `UnboundedSourceRestriction` to locate a reader.
DirectRunner uses timers (InMemoryTimerInternals) and state (also in memory)
to reschedule checkpoints. It should be a reference instead of a deep copy?
Issue Time Tracking
-------------------
Worklog Id: (was: 527318)
Time Spent: 1h 20m (was: 1h 10m)
> Unbounded SDF wrapper causes performance regression on DirectRunner
> -------------------------------------------------------------------
>
> Key: BEAM-11403
> URL: https://issues.apache.org/jira/browse/BEAM-11403
> Project: Beam
> Issue Type: Bug
> Components: runner-direct, sdk-java-core
> Reporter: Boyuan Zhang
> Assignee: Boyuan Zhang
> Priority: P2
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> There is a significant performance regression when switching from
> UnboundedSource to Unbounded SDF wrapper. So far there are 2 IOs reported:
> * Pubsub Read:
> https://lists.apache.org/thread.html/re6b0941a8b4951293a0327ce9b25e607cafd6e45b69783f65290edee%40%3Cdev.beam.apache.org%3E
> * Kafka Read: https://the-asf.slack.com/archives/C9H0YNP3P/p1606155042346600