[jira] [Commented] (FLINK-18433) From the end-to-end performance test results, 1.11 has a regression

Arvid Heise (Jira) Sun, 28 Jun 2020 11:28:41 -0700


    [ https://issues.apache.org/jira/browse/FLINK-18433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17147409#comment-17147409 ]


Arvid Heise commented on FLINK-18433:
-------------------------------------

[~trohrmann], I used the local executor with an explicit Xmx setting, so I'm 
bypassing all of the TM/JM memory setup code. As a result, most values end up at 
their defaults, as the log shows:
{noformat}
2020-06-26 14:53:07,199 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils [] - The configuration option Key: 'taskmanager.cpu.cores' , default: null (fallback keys: []) required for local execution is not set, setting it to its default value 1.7976931348623157E308
2020-06-26 14:53:07,201 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils [] - The configuration option Key: 'taskmanager.memory.task.heap.size' , default: null (fallback keys: []) required for local execution is not set, setting it to its default value 9223372036854775807 bytes
2020-06-26 14:53:07,201 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils [] - The configuration option Key: 'taskmanager.memory.task.off-heap.size' , default: 0 bytes (fallback keys: []) required for local execution is not set, setting it to its default value 9223372036854775807 bytes
2020-06-26 14:53:07,202 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils [] - The configuration option Key: 'taskmanager.memory.network.min' , default: 64 mb (fallback keys: [{key=taskmanager.network.memory.min, isDeprecated=true}]) required for local execution is not set, setting it to its default value 64 mb
2020-06-26 14:53:07,202 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils [] - The configuration option Key: 'taskmanager.memory.network.max' , default: 1 gb (fallback keys: [{key=taskmanager.network.memory.max, isDeprecated=true}]) required for local execution is not set, setting it to its default value 64 mb
2020-06-26 14:53:07,202 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils [] - The configuration option Key: 'taskmanager.memory.managed.size' , default: null (fallback keys: [{key=taskmanager.memory.size, isDeprecated=true}]) required for local execution is not set, setting it to its default value 128 mb
{noformat}
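
Two of the logged defaults look odd at first glance: 1.7976931348623157E308 cores and 9223372036854775807 bytes are exactly Java's Double.MAX_VALUE and Long.MAX_VALUE, so they presumably act as "no limit" sentinels in local execution rather than real resource limits. A minimal JDK-only check of that reading (the class and method names are hypothetical, not part of Flink):

```java
public class LocalDefaults {
    // Value logged for 'taskmanager.cpu.cores'.
    static final String LOGGED_CPU_CORES = "1.7976931348623157E308";
    // Value logged (in bytes) for the task heap and task off-heap sizes.
    static final String LOGGED_TASK_MEMORY_BYTES = "9223372036854775807";

    // True if the logged CPU cores value is Double.MAX_VALUE.
    static boolean cpuCoresIsDoubleMax() {
        return Double.parseDouble(LOGGED_CPU_CORES) == Double.MAX_VALUE;
    }

    // True if the logged task memory value is Long.MAX_VALUE.
    static boolean taskMemoryIsLongMax() {
        return Long.parseLong(LOGGED_TASK_MEMORY_BYTES) == Long.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println("cpu.cores == Double.MAX_VALUE: " + cpuCoresIsDoubleMax());
        System.out.println("task memory == Long.MAX_VALUE: " + taskMemoryIsLongMax());
        // 2^63 - 1 bytes is about 8 EiB (1 EiB = 2^60 bytes); prints 8.0 after double rounding.
        System.out.println("task memory in EiB: " + (Long.MAX_VALUE / (double) (1L << 60)));
    }
}
```
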
Does anyone know how the TPS metric is calculated? Would a slower deployment affect 
it, or is it only the records/s over the last second? [~Aihua], could you publish 
the raw measurements? I'd like to see the spread, and the timeline may 
also help us.
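
The thread does not say how the test suite defines TPS, so the sketch below only contrasts two hypothetical definitions over a per-second timeline of record counts: an average over the whole run, which a slower deployment drags down because the first seconds emit nothing, and a rate over only the most recent seconds, which a slower deployment does not affect once the job is running. It also computes the sample standard deviation of repeated runs, which is what "the spread" would show. All names (ThroughputStats, averageOverRun, rateOverLastSeconds, sampleStdDev) are made up for illustration and are not part of Flink or the test suite.

```java
import java.util.Arrays;

public class ThroughputStats {
    // Hypothetical definition A: total records divided by total wall-clock seconds,
    // including any seconds spent deploying (which contribute 0 records).
    static double averageOverRun(long[] recordsPerSecond) {
        long total = Arrays.stream(recordsPerSecond).sum();
        return (double) total / recordsPerSecond.length;
    }

    // Hypothetical definition B: average rate over only the last `window` seconds.
    static double rateOverLastSeconds(long[] recordsPerSecond, int window) {
        if (window <= 0 || window > recordsPerSecond.length) {
            throw new IllegalArgumentException("window must be in [1, timeline length]");
        }
        long sum = 0;
        for (int i = recordsPerSecond.length - window; i < recordsPerSecond.length; i++) {
            sum += recordsPerSecond[i];
        }
        return (double) sum / window;
    }

    // Sample standard deviation (n - 1 denominator) of repeated per-run results.
    static double sampleStdDev(double[] runs) {
        if (runs.length < 2) {
            throw new IllegalArgumentException("need at least two runs");
        }
        double mean = Arrays.stream(runs).average().getAsDouble();
        double squares = 0;
        for (double r : runs) {
            squares += (r - mean) * (r - mean);
        }
        return Math.sqrt(squares / (runs.length - 1));
    }

    public static void main(String[] args) {
        // Made-up timeline: two seconds of deployment, then a steady 100 records/s.
        long[] timeline = {0, 0, 100, 100, 100, 100};
        System.out.println("average over run: " + averageOverRun(timeline));          // about 66.67
        System.out.println("rate over last 2s: " + rateOverLastSeconds(timeline, 2)); // 100.0
    }
}
```

Under definition A, a job that needs two more seconds to deploy reports a lower TPS even when its steady-state throughput is identical, which is why the definition matters when interpreting a gap of around 5%.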

We can rule out the odd cancellation behavior, though (it should still be 
investigated), as it seems [~Aihua] did not cancel the job before taking the 
metric.

> From the end-to-end performance test results, 1.11 has a regression
> -------------------------------------------------------------------
>
>                 Key: FLINK-18433
>                 URL: https://issues.apache.org/jira/browse/FLINK-18433
>             Project: Flink
>          Issue Type: Bug
>          Components: API / Core, API / DataStream
>    Affects Versions: 1.11.0
>         Environment: 3 machines
> [|https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
>            Reporter: Aihua Li
>            Priority: Major
>         Attachments: flink_11.log.gz
>
>
>  
> I ran end-to-end performance tests comparing Release-1.10 and Release-1.11. 
> The results were as follows:
> ||scenarioName||release-1.10||release-1.11||change (1.11 vs 1.10)||
> |OneInput_Broadcast_LazyFromSource_ExactlyOnce_10_rocksdb|46.175|43.81333333|-5.11%|
> |OneInput_Rescale_LazyFromSource_ExactlyOnce_100_heap|211.835|200.355|-5.42%|
> |OneInput_Rebalance_LazyFromSource_ExactlyOnce_1024_rocksdb|1721.041667|1618.323333|-5.97%|
> |OneInput_KeyBy_LazyFromSource_ExactlyOnce_10_heap|46|43.615|-5.18%|
> |OneInput_Broadcast_Eager_ExactlyOnce_100_rocksdb|212.105|199.6883333|-5.85%|
> |OneInput_Rescale_Eager_ExactlyOnce_1024_heap|1754.64|1600.123333|-8.81%|
> |OneInput_Rebalance_Eager_ExactlyOnce_10_rocksdb|45.91666667|43.09833333|-6.14%|
> |OneInput_KeyBy_Eager_ExactlyOnce_100_heap|212.0816667|200.7266667|-5.35%|
> |OneInput_Broadcast_LazyFromSource_AtLeastOnce_1024_rocksdb|1718.245|1614.381667|-6.04%|
> |OneInput_Rescale_LazyFromSource_AtLeastOnce_10_heap|46.12|43.55166667|-5.57%|
> |OneInput_Rebalance_LazyFromSource_AtLeastOnce_100_rocksdb|212.0383333|200.3883333|-5.49%|
> |OneInput_KeyBy_LazyFromSource_AtLeastOnce_1024_heap|1762.048333|1606.408333|-8.83%|
> |OneInput_Broadcast_Eager_AtLeastOnce_10_rocksdb|46.05833333|43.49666667|-5.56%|
> |OneInput_Rescale_Eager_AtLeastOnce_100_heap|212.2333333|201.1883333|-5.20%|
> |OneInput_Rebalance_Eager_AtLeastOnce_1024_rocksdb|1720.663333|1616.85|-6.03%|
> |OneInput_KeyBy_Eager_AtLeastOnce_10_heap|46.14|43.62333333|-5.45%|
> |TwoInputs_Broadcast_LazyFromSource_ExactlyOnce_100_rocksdb|156.9183333|152.9566667|-2.52%|
> |TwoInputs_Rescale_LazyFromSource_ExactlyOnce_1024_heap|1415.511667|1300.1|-8.15%|
> |TwoInputs_Rebalance_LazyFromSource_ExactlyOnce_10_rocksdb|34.29666667|34.16666667|-0.38%|
> |TwoInputs_KeyBy_LazyFromSource_ExactlyOnce_100_heap|158.3533333|151.8483333|-4.11%|
> |TwoInputs_Broadcast_Eager_ExactlyOnce_1024_rocksdb|1373.406667|1300.056667|-5.34%|
> |TwoInputs_Rescale_Eager_ExactlyOnce_10_heap|34.57166667|32.09666667|-7.16%|
> |TwoInputs_Rebalance_Eager_ExactlyOnce_100_rocksdb|158.655|147.44|-7.07%|
> |TwoInputs_KeyBy_Eager_ExactlyOnce_1024_heap|1356.611667|1292.386667|-4.73%|
> |TwoInputs_Broadcast_LazyFromSource_AtLeastOnce_10_rocksdb|34.01|33.205|-2.37%|
> |TwoInputs_Rescale_LazyFromSource_AtLeastOnce_100_heap|149.5883333|145.9966667|-2.40%|
> |TwoInputs_Rebalance_LazyFromSource_AtLeastOnce_1024_rocksdb|1359.74|1299.156667|-4.46%|
> |TwoInputs_KeyBy_LazyFromSource_AtLeastOnce_10_heap|34.025|29.68333333|-12.76%|
> |TwoInputs_Broadcast_Eager_AtLeastOnce_100_rocksdb|157.3033333|151.4616667|-3.71%|
> |TwoInputs_Rescale_Eager_AtLeastOnce_1024_heap|1368.74|1293.238333|-5.52%|
> |TwoInputs_Rebalance_Eager_AtLeastOnce_10_rocksdb|34.325|33.285|-3.03%|
> |TwoInputs_KeyBy_Eager_AtLeastOnce_100_heap|162.5116667|134.375|-17.31%|
> The results show a regression in 1.11 for every scenario, typically around 5%, 
> with the largest regression of 17.31% in TwoInputs_KeyBy_Eager_AtLeastOnce_100_heap. This needs to be investigated.
> The test code:
> flink-1.10.0: 
> [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
> flink-1.11.0: 
> [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
> The jobs were submitted with a command like this:
> {noformat}
> bin/flink run -d -m 192.168.39.246:8081 \
>   -c org.apache.flink.basic.operations.PerformanceTestJob \
>   /home/admin/flink-basic-operations_2.11-1.10-SNAPSHOT.jar \
>   --topologyName OneInput --LogicalAttributesofEdges Broadcast \
>   --ScheduleMode LazyFromSource --CheckpointMode ExactlyOnce \
>   --recordSize 10 --stateBackend rocksdb
> {noformat}
>  
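
Each scenario name in the results table appears to encode the job arguments in a fixed order (topology, edge partitioning, schedule mode, checkpoint mode, record size, state backend); the first row, OneInput_Broadcast_LazyFromSource_ExactlyOnce_10_rocksdb, matches the example command exactly. Assuming that mapping holds for every row, and that the job accepts the values (e.g. "heap") as they are spelled in the table, a small helper can rebuild the PerformanceTestJob arguments from a scenario name, which makes it easier to rerun one specific regressed case. The class ScenarioArgs is hypothetical.

```java
import java.util.List;

public class ScenarioArgs {
    // Flag names in the order their values appear in a scenario name
    // (flag spellings taken from the example submit command).
    static final List<String> FLAGS = List.of(
            "--topologyName", "--LogicalAttributesofEdges", "--ScheduleMode",
            "--CheckpointMode", "--recordSize", "--stateBackend");

    // Turns e.g. "OneInput_Broadcast_..._10_rocksdb" into "--topologyName OneInput ...".
    static String toArgs(String scenarioName) {
        String[] parts = scenarioName.split("_");
        if (parts.length != FLAGS.size()) {
            throw new IllegalArgumentException(
                    "expected " + FLAGS.size() + " fields in: " + scenarioName);
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(FLAGS.get(i)).append(' ').append(parts[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The scenario with the largest regression in the table.
        System.out.println(toArgs("TwoInputs_KeyBy_Eager_AtLeastOnce_100_heap"));
    }
}
```
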

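The last column of the results table is the relative change of release-1.11 against release-1.10, i.e. (1.11 result - 1.10 result) / 1.10 result * 100; for example, (43.81333333 - 46.175) / 46.175 * 100 is about -5.11. The sketch below recomputes that column for a handful of rows copied from the table and picks out the largest regression among them. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class RegressionCheck {
    // Relative change of the release-1.11 result against the release-1.10 baseline, in percent.
    static double percentChange(double release110, double release111) {
        return (release111 - release110) / release110 * 100.0;
    }

    // A subset of rows copied from the results table: {release-1.10, release-1.11}.
    static Map<String, double[]> sampleRows() {
        Map<String, double[]> rows = new LinkedHashMap<>();
        rows.put("OneInput_Broadcast_LazyFromSource_ExactlyOnce_10_rocksdb", new double[]{46.175, 43.81333333});
        rows.put("OneInput_Rescale_Eager_ExactlyOnce_1024_heap", new double[]{1754.64, 1600.123333});
        rows.put("TwoInputs_Rebalance_LazyFromSource_ExactlyOnce_10_rocksdb", new double[]{34.29666667, 34.16666667});
        rows.put("TwoInputs_KeyBy_LazyFromSource_AtLeastOnce_10_heap", new double[]{34.025, 29.68333333});
        rows.put("TwoInputs_KeyBy_Eager_AtLeastOnce_100_heap", new double[]{162.5116667, 134.375});
        return rows;
    }

    // Returns the scenario with the most negative percentage change.
    static String largestRegression(Map<String, double[]> rows) {
        String worst = null;
        double worstChange = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, double[]> e : rows.entrySet()) {
            double change = percentChange(e.getValue()[0], e.getValue()[1]);
            if (change < worstChange) {
                worstChange = change;
                worst = e.getKey();
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        Map<String, double[]> rows = sampleRows();
        for (Map.Entry<String, double[]> e : rows.entrySet()) {
            double[] v = e.getValue();
            // Locale.ROOT keeps '.' as the decimal separator regardless of the JVM locale.
            System.out.printf(Locale.ROOT, "%s: %.2f%%%n", e.getKey(), percentChange(v[0], v[1]));
        }
        System.out.println("largest regression in this subset: " + largestRegression(rows));
    }
}
```

Running this prints the same percentages as the table for these rows (-5.11%, -8.81%, -0.38%, -12.76%, -17.31%).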


--
This message was sent by Atlassian Jira
(v8.3.4#803005)
