[ 
https://issues.apache.org/jira/browse/FLINK-15032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16986721#comment-16986721
 ] 

Biao Liu commented on FLINK-15032:
----------------------------------

Hi [~maguowei], thanks for the impressive proposal.

I have not fully understood the scenario. Here are some questions of mine.

{quote}Consider a job that has 1k parallelism and has a 1m union list state. 
When deploying the 1k tasks, the eager serialization would use 1G memory 
instantly(Some time the serialization amplifies the memory usage). However, the 
serialized object is only used when the Akka sends the message.  So we could 
reduce the memory pressure if we only serialize the object when the message 
would be sent by the Akka.{quote}
Do you mean we could postpone the serialization till 
{{RemoteRpcInvocation#writeObject}}?

{quote}Furthermore, Akka would serialize the message at last and all the 
XXXGateway related class could be visible by the RPC level. Because of that, I 
think the serialization in the constructor of `RemoteRpcInvocation` could be 
avoided.{quote}
Could you explain it a bit more? Do you mean there is no visible issue of 
postponing the serialization? Or we don't need serialization anymore?

{quote}In summary, this Jira proposes to remove the serialization at the 
constructor of `RemoteRpcInvocation`.{quote}
Do we still need serialization in {{wirteObject}} after removing serialization 
in constructor?


> Remove the eagerly serialization from `RemoteRpcInvocation` 
> ------------------------------------------------------------
>
>                 Key: FLINK-15032
>                 URL: https://issues.apache.org/jira/browse/FLINK-15032
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>            Reporter: Guowei Ma
>            Priority: Major
>
>     Currently, the constructor of `RemoteRpcInvocation` serializes the 
> `parameterTypes` and `arg` of an RPC call. This could lead to two problems:
>  # Consider a job that has 1k parallelism and has a 1m union list state. When 
> deploying the 1k tasks, the eager serialization would use 1G memory 
> instantly(Some time the serialization amplifies the memory usage). However, 
> the serialized object is only used when the Akka sends the message.  So we 
> could reduce the memory pressure if we only serialize the object when the 
> message would be sent by the Akka.
>  # Furthermore, Akka would serialize the message at last and all the 
> XXXGateway related class could be visible by the RPC level. Because of that, 
> I think the serialization in the constructor of `RemoteRpcInvocation` could 
> be avoided. I also do a simple test and find this could reduce the time cost 
> of the RPC call. The 1k number of  RPC calls with 1m `String` message:  The 
> current version costs around 2700ms; the Nonserialization version cost about 
> 37ms.
>  
> In summary, this Jira proposes to remove the serialization at the constructor 
> of `RemoteRpcInvocation`.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to