1996fanrui commented on code in PR #23589:
URL: https://github.com/apache/flink/pull/23589#discussion_r1377569251


##########
flink-streaming-java/src/main/java/org/apache/flink/streaming/api/graph/StreamConfig.java:
##########
@@ -795,6 +795,20 @@ public boolean isGraphContainingLoops() {
         return config.getBoolean(GRAPH_CONTAINING_LOOPS, false);
     }
 
+    /**
+     * In general, we don't clear any configuration. However, the {@link #SERIALIZED_UDF} may be
+     * very large when operator includes some large objects, the SERIALIZED_UDF is used to create a
+     * StreamOperator and usually only needs to be called once. {@link #CHAINED_TASK_CONFIG} may be
+     * large as well due to the StreamConfig of all non-head operators in OperatorChain will be
+     * serialized and stored in CHAINED_TASK_CONFIG. They can be cleared to reduce the memory after
+     * StreamTask is initialized. If so, TM will have more memory during running. See FLINK-33315
+     * and FLINK-33317 for more information.
+     */
+    public void clearInitialConfigs() {
+        config.removeKey(SERIALIZED_UDF);
+        config.removeKey(CHAINED_TASK_CONFIG);
+    }

Review Comment:
   Thanks @RocMarshal for the detailed analysis.
   
   The reason I don't want to run the extra code is that every job calls
   clearInitialConfigs(), but most jobs' UDFs aren't very large, so the less
   overhead here, the better. (Of course, the overhead of what you proposed
   is not huge.)
   
   For large jobs, I think the current strategy may be enough. If the problem
   you describe does occur, we can address it then. WDYT?
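   
   To make the overhead argument concrete, here is a minimal sketch of the
   current strategy. It assumes a plain map-backed config; the class
   ClearInitialConfigsSketch, its helper methods, and the key strings are
   hypothetical stand-ins, not Flink's actual StreamConfig. It only
   illustrates that the method does two key removals on every job,
   whether or not the entries are large.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a map-backed config; not Flink's StreamConfig. */
class ClearInitialConfigsSketch {
    // Illustrative key strings; the real constants live in StreamConfig.
    static final String SERIALIZED_UDF = "serialized-udf";
    static final String CHAINED_TASK_CONFIG = "chained-task-config";
    static final String GRAPH_CONTAINING_LOOPS = "graph-containing-loops";

    private final Map<String, byte[]> config = new HashMap<>();

    void setBytes(String key, byte[] value) {
        config.put(key, value);
    }

    boolean containsKey(String key) {
        return config.containsKey(key);
    }

    /**
     * Drops the two potentially large entries once they are no longer needed.
     * Each removal is a single map operation, so the cost does not depend on
     * how large the stored values are.
     */
    void clearInitialConfigs() {
        config.remove(SERIALIZED_UDF);
        config.remove(CHAINED_TASK_CONFIG);
    }

    public static void main(String[] args) {
        ClearInitialConfigsSketch cfg = new ClearInitialConfigsSketch();
        cfg.setBytes(SERIALIZED_UDF, new byte[1024 * 1024]); // pretend large UDF
        cfg.setBytes(CHAINED_TASK_CONFIG, new byte[64 * 1024]);
        cfg.setBytes(GRAPH_CONTAINING_LOOPS, new byte[] {0});

        // Called after initialization, when the large entries have been consumed.
        cfg.clearInitialConfigs();

        System.out.println(cfg.containsKey(SERIALIZED_UDF));
        System.out.println(cfg.containsKey(CHAINED_TASK_CONFIG));
        System.out.println(cfg.containsKey(GRAPH_CONTAINING_LOOPS));
    }
}
```

   Only the two named keys are dropped; every other entry stays in place,
   which is why the cost stays small for jobs with small UDFs.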



