[jira] [Created] (FLINK-30531) Reduce operator chain call stack depth

Dong Lin (Jira) Wed, 28 Dec 2022 17:31:11 -0800

Dong Lin created FLINK-30531:
--------------------------------

             Summary: Reduce operator chain call stack depth
                 Key: FLINK-30531
                 URL: https://issues.apache.org/jira/browse/FLINK-30531
             Project: Flink
          Issue Type: Improvement
          Components: Runtime / Task
            Reporter: Dong Lin



Benchmark results show that Flink time to execute simple programs is more than 
3X slower than Spark. For example, if we run the following program with object 
re-use enabled and with parallelism=1, it takes roughtly 120 sec on a macbook, 
whereas it takes Spark less than 40 sec to run the same logic on the same 
machine.
{code:java}
DataStream<Long> stream = env.fromSequence(1, 1000000000L)
.map(x -> x)
.map(x -> x)
.map(x -> x)
.map(x -> x)
.map(x -> x).addSink(new DiscardingSink<>());
{code}
 

It turns out that the operator chain overhead introduced by Flink is 
surprisingly high. For the above example program, Flink runtime goes through a 
call stack of 24 functions to produce 1 element. And each extra map(...) 
operation introduces 4 extra functions in the call stack.

Here are the 24 functions in the call stack:
{code:bash}
StreamTask#processInput
StreamOneInputProcessor#processInput
StreamTaskSourceInput#emitNext
SourceOperator#emitNext
IteratorSourceReaderBase#pollNext
SourceOutputWithWatermarks#collect
AsyncDataOutputToOutput#emitRecord
ChainingOutput#collect
StreamMap#processElement
CountingOutput#collect
ChainingOutput#collect
StreamMap#processElement
CountingOutput#collect
ChainingOutput#collect
StreamMap#processElement
CountingOutput#collect
ChainingOutput#collect
StreamMap#processElement
CountingOutput#collect
ChainingOutput#collect
StreamMap#processElement
CountingOutput#collect
ChainingOutput#collect
StreamSink#processElement
{code}
 

Given the evidence described above, we find the following explanations for why 
Flink is slow for programs with low computation overhead:
 * Operator chain currently uses pull-based loop, which has worse branch 
prediction than push-based loop.
 * Java's maximum inline level is less than 18 [2]. It is easy for operator 
chain call stack to exceeds this limit and prevent Java from inlining function 
calls, which further increases the function call overhead.
 * For function calls that are not inlined, it requires looking up a virtual 
table since most functions are virtual functions.

 

[1] [https://arxiv.org/pdf/1610.09166.pdf]

[2] [https://bugs.openjdk.org/browse/JDK-8234863]

 

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (FLINK-30531) Reduce operator chain call stack depth

Reply via email to