[
https://issues.apache.org/jira/browse/FLINK-18433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146139#comment-17146139
]
Arvid Heise edited comment on FLINK-18433 at 6/26/20, 9:27 AM:
---------------------------------------------------------------
Phew, this is not easy to reproduce.
* for me, neither branch compiles; the fix seems rather easy, but a general
cleanup is necessary, including commit structure and source code.
* it's unclear to me why there are 3 machines in the environment of the ticket,
but DOP is 1. Were the tests executed on a cluster with 3 machines, with one TM
randomly selected? Why not just go with 1 TM to make the results more
comparable?
* *did you actually enable checkpointing? Without setting the
checkpointInterval, no checkpointing is enabled in my tests.*
* how is the actual measurement performed? And what are we actually measuring?
Is it throughput? But how did you calculate it? The job does no calculation on
its own as far as I can see. So is it a simple `time`, or did you manually
extract the execution time and normalize it by the number of elements processed?
* how long did things run? With the command that you presented, the job runs
indefinitely. I'm assuming you didn't run to maxCount = Long.MAX_VALUE,
although that is the default setting. If the runs were rather short, how often
were they repeated?
* what did you configure in flink-conf.yaml? Left to default values?
* this is somewhat independent, but when did you run the 1.11 and 1.10 tests? I
noticed on EMR that I got 5-10% more throughput when running at night than
during the day. So for comparable results, the measurements being compared
should be taken close together in time.
So I'm pretty much stuck at bisecting because there is too little information.
I will pick one of the cases that [~liyu] and do some basic tests.
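
To make the measurement question above concrete, here is a minimal sketch of how a throughput figure could be derived from bounded runs: divide the number of processed elements by the wall-clock execution time, then average over several repetitions. This is only an assumption about what the reported numbers might mean, not the method used in the ticket; `throughput` and `summarize` are hypothetical helpers, and the element counts and durations are made up for illustration.

```python
from statistics import mean, stdev


def throughput(num_elements: int, seconds: float) -> float:
    """Elements processed per second for a single bounded run."""
    if seconds <= 0:
        raise ValueError("execution time must be positive")
    return num_elements / seconds


def summarize(runs):
    """Mean and sample standard deviation of throughput over repeated runs.

    `runs` is a list of (num_elements, seconds) pairs, one per repetition.
    """
    rates = [throughput(n, s) for n, s in runs]
    # A single run has no spread to report.
    spread = stdev(rates) if len(rates) > 1 else 0.0
    return mean(rates), spread


if __name__ == "__main__":
    # Illustrative values only: three repetitions with a fixed element count.
    runs = [(1_000_000, 21.5), (1_000_000, 22.0), (1_000_000, 21.0)]
    avg, spread = summarize(runs)
    print(f"throughput: {avg:.1f} +/- {spread:.1f} elements/s")
```

Reporting the spread alongside the mean would also show whether a difference of around 5% is larger than the run-to-run noise.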
> From the end-to-end performance test results, 1.11 has a regression
> -------------------------------------------------------------------
>
> Key: FLINK-18433
> URL: https://issues.apache.org/jira/browse/FLINK-18433
> Project: Flink
> Issue Type: Bug
> Components: API / Core, API / DataStream
> Affects Versions: 1.11.0
> Environment: 3 machines
> [|https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
> Reporter: Aihua Li
> Priority: Major
>
>
> I ran end-to-end performance tests comparing Release-1.10 and Release-1.11.
> The results were as follows:
> |scenarioName|release-1.10|release-1.11|change|
> |OneInput_Broadcast_LazyFromSource_ExactlyOnce_10_rocksdb|46.175|43.81333333|-5.11%|
> |OneInput_Rescale_LazyFromSource_ExactlyOnce_100_heap|211.835|200.355|-5.42%|
> |OneInput_Rebalance_LazyFromSource_ExactlyOnce_1024_rocksdb|1721.041667|1618.323333|-5.97%|
> |OneInput_KeyBy_LazyFromSource_ExactlyOnce_10_heap|46|43.615|-5.18%|
> |OneInput_Broadcast_Eager_ExactlyOnce_100_rocksdb|212.105|199.6883333|-5.85%|
> |OneInput_Rescale_Eager_ExactlyOnce_1024_heap|1754.64|1600.123333|-8.81%|
> |OneInput_Rebalance_Eager_ExactlyOnce_10_rocksdb|45.91666667|43.09833333|-6.14%|
> |OneInput_KeyBy_Eager_ExactlyOnce_100_heap|212.0816667|200.7266667|-5.35%|
> |OneInput_Broadcast_LazyFromSource_AtLeastOnce_1024_rocksdb|1718.245|1614.381667|-6.04%|
> |OneInput_Rescale_LazyFromSource_AtLeastOnce_10_heap|46.12|43.55166667|-5.57%|
> |OneInput_Rebalance_LazyFromSource_AtLeastOnce_100_rocksdb|212.0383333|200.3883333|-5.49%|
> |OneInput_KeyBy_LazyFromSource_AtLeastOnce_1024_heap|1762.048333|1606.408333|-8.83%|
> |OneInput_Broadcast_Eager_AtLeastOnce_10_rocksdb|46.05833333|43.49666667|-5.56%|
> |OneInput_Rescale_Eager_AtLeastOnce_100_heap|212.2333333|201.1883333|-5.20%|
> |OneInput_Rebalance_Eager_AtLeastOnce_1024_rocksdb|1720.663333|1616.85|-6.03%|
> |OneInput_KeyBy_Eager_AtLeastOnce_10_heap|46.14|43.62333333|-5.45%|
> |TwoInputs_Broadcast_LazyFromSource_ExactlyOnce_100_rocksdb|156.9183333|152.9566667|-2.52%|
> |TwoInputs_Rescale_LazyFromSource_ExactlyOnce_1024_heap|1415.511667|1300.1|-8.15%|
> |TwoInputs_Rebalance_LazyFromSource_ExactlyOnce_10_rocksdb|34.29666667|34.16666667|-0.38%|
> |TwoInputs_KeyBy_LazyFromSource_ExactlyOnce_100_heap|158.3533333|151.8483333|-4.11%|
> |TwoInputs_Broadcast_Eager_ExactlyOnce_1024_rocksdb|1373.406667|1300.056667|-5.34%|
> |TwoInputs_Rescale_Eager_ExactlyOnce_10_heap|34.57166667|32.09666667|-7.16%|
> |TwoInputs_Rebalance_Eager_ExactlyOnce_100_rocksdb|158.655|147.44|-7.07%|
> |TwoInputs_KeyBy_Eager_ExactlyOnce_1024_heap|1356.611667|1292.386667|-4.73%|
> |TwoInputs_Broadcast_LazyFromSource_AtLeastOnce_10_rocksdb|34.01|33.205|-2.37%|
> |TwoInputs_Rescale_LazyFromSource_AtLeastOnce_100_heap|149.5883333|145.9966667|-2.40%|
> |TwoInputs_Rebalance_LazyFromSource_AtLeastOnce_1024_rocksdb|1359.74|1299.156667|-4.46%|
> |TwoInputs_KeyBy_LazyFromSource_AtLeastOnce_10_heap|34.025|29.68333333|-12.76%|
> |TwoInputs_Broadcast_Eager_AtLeastOnce_100_rocksdb|157.3033333|151.4616667|-3.71%|
> |TwoInputs_Rescale_Eager_AtLeastOnce_1024_heap|1368.74|1293.238333|-5.52%|
> |TwoInputs_Rebalance_Eager_AtLeastOnce_10_rocksdb|34.325|33.285|-3.03%|
> |TwoInputs_KeyBy_Eager_AtLeastOnce_100_heap|162.5116667|134.375|-17.31%|
> The results show that 1.11 has a performance regression, typically around
> 5%, with a maximum of about 17%. This needs to be checked.
> The test code:
> flink-1.10.0:
> [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
> flink-1.11.0:
> [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
> The submit command looks like this:
> bin/flink run -d -m 192.168.39.246:8081 -c
> org.apache.flink.basic.operations.PerformanceTestJob
> /home/admin/flink-basic-operations_2.11-1.10-SNAPSHOT.jar --topologyName
> OneInput --LogicalAttributesofEdges Broadcast --ScheduleMode LazyFromSource
> --CheckpointMode ExactlyOnce --recordSize 10 --stateBackend rocksdb
>
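
For reference, the last column of the table in the quoted description is consistent with the relative change of the release-1.11 value with respect to the release-1.10 value. A minimal sketch of that arithmetic follows; `relative_change_percent` is a hypothetical helper, not part of the test suite, and the two rows are copied from the table.

```python
def relative_change_percent(baseline: float, candidate: float) -> float:
    """Relative change of candidate with respect to baseline, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (candidate - baseline) / baseline * 100.0


if __name__ == "__main__":
    # (scenario, release-1.10, release-1.11) taken from the quoted table.
    rows = [
        ("OneInput_Broadcast_LazyFromSource_ExactlyOnce_10_rocksdb",
         46.175, 43.81333333),
        ("TwoInputs_KeyBy_Eager_AtLeastOnce_100_heap",
         162.5116667, 134.375),
    ]
    for name, v110, v111 in rows:
        print(f"{name}: {relative_change_percent(v110, v111):.2f}%")
```

For these two rows the sketch gives -5.11% and -17.31%, matching the reported values. The sign convention only reads as a regression if higher values are better, which the ticket does not state explicitly.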