[
https://issues.apache.org/jira/browse/BEAM-10670?focusedWorklogId=496726&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-496726
]
ASF GitHub Bot logged work on BEAM-10670:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 07/Oct/20 16:21
Start Date: 07/Oct/20 16:21
Worklog Time Spent: 10m
Work Description: lukecwik commented on pull request #13021:
URL: https://github.com/apache/beam/pull/13021#issuecomment-705046950
> I am comparing the results of current master vs. this PR in batch mode, and
> the improvements are so big that I am confused about how it can be so
> different. Is it partitioning less, or skipping some operations because it
> doesn't need to estimate watermarks or something? The difference is really
> that good. Amazing!
>
> **Current master:**
>
> ```
> Performance:
> Conf  Runtime(sec)  (Baseline)  Events(/sec)  (Baseline)  Results  (Baseline)
> 0000           2.1                   47303.7               100000
> 0001           0.6                  169779.3                92000
> 0002           0.3                  293255.1                  351
> 0003           3.1                   32299.7                  580
> 0004           1.0                   10427.5                   40
> 0005           1.5                   67340.1                   12
> 0006           1.1                    9487.7                  103
> 0007           1.7                   59101.7                    1
> 0008           1.3                   77279.8                 6000
> 0009           0.5                   19084.0                  298
> 0010           0.7                  153139.4                    1
> 0011           1.8                   54112.6                 1919
> 0012           0.9                  112359.6                 1919
> 0013           0.3                  304878.0                92000
> 0014           0.9                  113507.4                92000
>
> ==========================================================================================
> ```
>
> **This PR:**
>
> ```
> Performance:
> Conf  Runtime(sec)  (Baseline)  Events(/sec)  (Baseline)  Results  (Baseline)
> 0000           1.1                   90090.1               100000
> 0001           0.3                  337837.8                92000
> 0002           0.1                  694444.4                  351
> 0003           1.4                   71582.0                  580
> 0004           1.0                   10111.2                   40
> 0005           0.6                  177935.9                   12
> 0006           0.3                   40000.0                  103
> 0007           0.4                  227272.7                    1
> 0008           0.3                  314465.4                 6000
> 0009           0.2                   49019.6                  298
> 0010           0.6                  165016.5                    1
> 0011           0.5                  187969.9                 1919
> 0012           0.2                  492610.8                 1919
> 0013           0.3                  392156.9                92000
> 0014           0.9                  113765.6                92000
>
> ==========================================================================================
> ```
The BoundedSource class doesn't expose a watermark API, and the
BoundedSourceAsSdfWrapper doesn't do much differently than the SourceRDD
implementation. The partitioning is different: if you matched the partitioning
between the SDF and non-SDF versions, you would see similar results.
Flink saw something similar as well.
You can see that run configurations 10 and 14 didn't have significantly
different results.
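To check the remark about configurations 10 and 14 against the quoted numbers, here is a minimal Python sketch that divides each configuration's Events(/sec) in this PR by the value on current master. It uses Events(/sec) rather than Runtime(sec) because the runtimes are rounded to 0.1 s. The 10% "similar" cutoff is an arbitrary assumption for illustration, not something stated in the comment.

```python
# Events(/sec) for configurations 0000..0014, copied from the two tables above.
MASTER_EVENTS_PER_SEC = [
    47303.7, 169779.3, 293255.1, 32299.7, 10427.5,
    67340.1, 9487.7, 59101.7, 77279.8, 19084.0,
    153139.4, 54112.6, 112359.6, 304878.0, 113507.4,
]
PR_EVENTS_PER_SEC = [
    90090.1, 337837.8, 694444.4, 71582.0, 10111.2,
    177935.9, 40000.0, 227272.7, 314465.4, 49019.6,
    165016.5, 187969.9, 492610.8, 392156.9, 113765.6,
]


def speedups(before, after):
    """Throughput ratio after/before per configuration (> 1.0 means faster)."""
    if len(before) != len(after):
        raise ValueError("both runs must cover the same configurations")
    return [a / b for a, b in zip(after, before)]


def similar(ratios, tolerance=0.10):
    """Configurations whose throughput changed by less than `tolerance` (assumed 10%)."""
    return [conf for conf, r in enumerate(ratios) if abs(r - 1.0) < tolerance]


if __name__ == "__main__":
    ratios = speedups(MASTER_EVENTS_PER_SEC, PR_EVENTS_PER_SEC)
    for conf, r in enumerate(ratios):
        print(f"{conf:04d}  {r:.2f}x")
    print("within 10%:", [f"{conf:04d}" for conf in similar(ratios)])
```

With these numbers, configurations 0010 (about 1.08x) and 0014 (about 1.00x) fall inside the band, and so does 0004 (about 0.97x), which the comment doesn't mention. 0013 improves by about 1.29x, and the remaining configurations by roughly 1.9x to 4.4x.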
Issue Time Tracking
-------------------
Worklog Id: (was: 496726)
Time Spent: 32h 40m (was: 32.5h)
> Make non-portable Splittable DoFn the only option when executing Java "Read" transforms
> ---------------------------------------------------------------------------------------
>
> Key: BEAM-10670
> URL: https://issues.apache.org/jira/browse/BEAM-10670
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core
> Reporter: Luke Cwik
> Assignee: Luke Cwik
> Priority: P2
> Time Spent: 32h 40m
> Remaining Estimate: 0h
>
> All runners seem to be capable of migrating to splittable DoFn for
> non-portable execution except for Dataflow runner v1, which will internalize
> the current primitive read implementation that is shared across runner
> implementations.
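For readers new to splittable DoFns: the idea is that a DoFn processes a splittable restriction (for example, an offset range of a source) and claims each position through a tracker, so the runner can split off work that hasn't been claimed yet. The Python below is a toy, self-contained model of that restriction/tracker idea. `OffsetRange`, `ToyOffsetTracker`, and `process` are hypothetical names loosely modeled on an offset-range tracker; they are not Beam's actual API, and the sketch ignores watermarks, checkpointing, and size estimation.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class OffsetRange:
    """Hypothetical restriction: the half-open offset range [start, stop)."""
    start: int
    stop: int


class ToyOffsetTracker:
    """Hypothetical, simplified restriction tracker (not Beam's API).

    Each offset must be claimed, in increasing order, before it is processed.
    A runner may split off the unclaimed tail so another worker can take it.
    """

    def __init__(self, restriction: OffsetRange) -> None:
        self.restriction = restriction
        self.last_claimed = restriction.start - 1

    def try_claim(self, offset: int) -> bool:
        if offset <= self.last_claimed:
            raise ValueError("offsets must be claimed in increasing order")
        if offset >= self.restriction.stop:
            return False  # outside the (possibly shrunk) restriction: stop
        self.last_claimed = offset
        return True

    def try_split(self, fraction: float) -> Optional[Tuple[OffsetRange, OffsetRange]]:
        """Keep `fraction` of the unclaimed work; return (primary, residual) or None."""
        first_unclaimed = self.last_claimed + 1
        remaining = self.restriction.stop - first_unclaimed
        split_at = first_unclaimed + int(remaining * fraction)
        if split_at >= self.restriction.stop:
            return None  # nothing left to hand off
        residual = OffsetRange(split_at, self.restriction.stop)
        self.restriction = OffsetRange(self.restriction.start, split_at)
        return self.restriction, residual


def process(tracker: ToyOffsetTracker, records: Sequence[str]) -> List[str]:
    """Emit records in offset order until a claim fails."""
    out = []
    offset = tracker.restriction.start
    while tracker.try_claim(offset):
        out.append(records[offset])
        offset += 1
    return out
```

For example, after a worker claims offsets 0 to 2 of `OffsetRange(0, 10)`, `try_split(0.5)` shrinks its restriction to `[0, 6)` and hands `[6, 10)` to another worker. In this toy model, the number and size of the initial restrictions decide how work is partitioned across workers, which is one way to read the partitioning difference mentioned in the comment above.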
--
This message was sent by Atlassian Jira
(v8.3.4#803005)