[ https://issues.apache.org/jira/browse/FLINK-15032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16991225#comment-16991225 ]
Biao Liu commented on FLINK-15032: ---------------------------------- [~trohrmann], thanks for explanation! I think you have raised a critical question. Besides the message size checking, the serialization checking might be more sticky. We might avoid the message size checking somehow, see the discussion under FLINK-4399. But I don't think we could do the same thing on serialization checking. It seems that this synchronous pre-checking can not be avoided, because some rpc invocations are "fire and forget". It's hard to tell invoker that there is something wrong with the rpc message without a synchronous pre-checking. Here are some rough ideas of mine. I think maybe we could do this optimization on the invocation without user-defined fields. I believe we could guarantee that the message is small and serializable in some scenarios. > Remove the eager serialization from `RemoteRpcInvocation` > ---------------------------------------------------------- > > Key: FLINK-15032 > URL: https://issues.apache.org/jira/browse/FLINK-15032 > Project: Flink > Issue Type: Improvement > Components: Runtime / Coordination > Reporter: Guowei Ma > Priority: Major > > Currently, the constructor of `RemoteRpcInvocation` serializes the > `parameterTypes` and `arg` of an RPC call. This could lead to a problem: > Consider a job that has 1k parallelism and has a 1m union list state. When > deploying the 1k tasks, the eager serialization would use 1G memory > instantly(Some time the serialization amplifies the memory usage). However, > the serialized object is only used when the Akka sends the message. So we > could reduce the memory pressure if we only serialize the object when the > message would be sent by the Akka. > Akka would serialize the message at last and all the XXXGateway related class > could be visible by the RPC level. Because of that, I think the eager > serialization in the constructor of `RemoteRpcInvocation` could be avoided. I > also do a simple test and find this could reduce the time cost of the RPC > call. The 1k number of RPC calls with 1m `String` message: The current > version costs around 2700ms; the Nonserialization version cost about 37ms. > > In summary, this Jira proposes to remove the eager serialization at the > constructor of `RemoteRpcInvocation`. -- This message was sent by Atlassian Jira (v8.3.4#803005)