[ https://issues.apache.org/jira/browse/FLINK-16001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17059375#comment-17059375 ]
Jiayi Liao commented on FLINK-16001:
------------------------------------

[~gjy] Thanks for the reminder. Yesterday I finished a [JMH benchmark|https://github.com/Jiayi-Liao/jmh-flink-test/blob/master/src/main/java/org/sample/MyBenchmark.java] that tests the inner logic of {{toPipelinedRegionsSet}} with two implementations: the current Java Streams implementation and one without Java Streams. The results are in the attachment: [^benchmark.csv]

According to the test, the performance degradation becomes more obvious as the cardinality of the distinct regions grows. I think this cardinality can be very large, especially in batch jobs that use {{BLOCKING}} result partitions (more than 10k in our production environment).

But you're right that this is not the main bottleneck in job submission. I'm just trying to improve the performance a little from the code-style side.

> Avoid using Java Streams in construction of ExecutionGraph
> ----------------------------------------------------------
>
>                 Key: FLINK-16001
>                 URL: https://issues.apache.org/jira/browse/FLINK-16001
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>    Affects Versions: 1.10.0
>            Reporter: Jiayi Liao
>            Priority: Major
>         Attachments: benchmark.csv
>
>
> I think we should avoid {{Java Streams}} in the construction of
> {{ExecutionGraph}}, for example in the function {{toPipelinedRegionsSet}} in
> {{PipelinedRegionComputeUtil}}, because job submission is definitely
> performance sensitive, especially when {{distinctRegions}} has a large
> cardinality.
> This also applies to some other places in the package
> {{org.apache.flink.runtime.executiongraph}}.
> cc [~trohrmann] [~gjy] [~zhuzh]
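
For readers who have not opened the linked benchmark, the comparison described in the comment (a Java Streams conversion of the distinct regions versus a plain loop) can be sketched roughly as below. This is only an illustration and not the Flink code or the benchmark itself: the class {{RegionSetSketch}}, the {{Region}} type, and the use of {{String}} vertex IDs are all hypothetical stand-ins, and the sketch makes no claim about the measured numbers.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

public class RegionSetSketch {

    // Hypothetical stand-in for a pipelined region: a wrapper around a set of vertex IDs.
    public static final class Region {
        private final Set<String> vertexIds;

        public Region(Set<String> vertexIds) {
            this.vertexIds = vertexIds;
        }

        public Set<String> getVertexIds() {
            return vertexIds;
        }
    }

    // Variant 1: build the result with Java Streams.
    public static Set<Region> withStreams(Set<Set<String>> distinctRegions) {
        return distinctRegions.stream()
            .map(ids -> new Region(new HashSet<>(ids)))
            .collect(Collectors.toSet());
    }

    // Variant 2: build the same result with a plain for-each loop.
    public static Set<Region> withLoop(Set<Set<String>> distinctRegions) {
        Set<Region> result = new HashSet<>();
        for (Set<String> ids : distinctRegions) {
            result.add(new Region(new HashSet<>(ids)));
        }
        return result;
    }

    // Helper for comparing results: extracts the vertex-ID sets of all regions.
    public static Set<Set<String>> vertexIdsOf(Set<Region> regions) {
        Set<Set<String>> ids = new HashSet<>();
        for (Region region : regions) {
            ids.add(region.getVertexIds());
        }
        return ids;
    }

    public static void main(String[] args) {
        Set<Set<String>> distinctRegions = new HashSet<>();
        distinctRegions.add(new HashSet<>(Arrays.asList("v1", "v2")));
        distinctRegions.add(new HashSet<>(Arrays.asList("v3")));

        Set<Region> streamed = withStreams(distinctRegions);
        Set<Region> looped = withLoop(distinctRegions);

        // Both variants should describe the same regions.
        System.out.println(vertexIdsOf(streamed).equals(vertexIdsOf(looped)));
    }
}
```

Both variants produce one region per input set, so they can be swapped without changing behavior. Any difference in cost would have to be measured with a harness such as JMH, as in the linked benchmark.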