klion26 commented on issue #9602: [FLINK-13856][checkpoint] Reduce the delete file api when the checkpo…
URL: https://github.com/apache/flink/pull/9602#issuecomment-547390264

@chendonglin521 I think reducing the number of RPC calls is really important for HDFS; we have run into the same RPC pressure (for both keyed state and operator state). We solved it by letting different state handles share the same file. We have filed an issue [1] for this, and @StephanEwen is helping review the PR; that PR currently covers shared state only. We have also implemented this for exclusive state in our production environment. From what we observe there, it greatly reduces the RPC count (create RPCs, delete RPCs, create-block RPCs, and so on): create-file RPCs dropped by 1/3 across the whole cluster, and the request queue length fell from thousands to fewer than a hundred. I think the problem you encountered here could benefit from our proposal as well.

[1] https://issues.apache.org/jira/browse/FLINK-11937
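
To make the idea of several state handles sharing one file concrete, here is a rough toy model. It is not Flink code and does not reflect the actual PR: `FakeFileSystem`, `SegmentHandle`, `SharedFileWriter`, `SharedFileRegistry`, and the `chk-1/...` paths are all hypothetical names made up for illustration. It only counts the create and delete calls a file system would see under each layout.

```python
from dataclasses import dataclass, field


@dataclass
class FakeFileSystem:
    """In-memory stand-in for HDFS that counts create/delete calls (toy model)."""
    create_calls: int = 0
    delete_calls: int = 0
    files: dict = field(default_factory=dict)

    def create(self, path: str, data: bytes) -> None:
        self.create_calls += 1
        self.files[path] = data

    def delete(self, path: str) -> None:
        self.delete_calls += 1
        self.files.pop(path, None)


@dataclass(frozen=True)
class SegmentHandle:
    """Hypothetical state handle: a byte range inside a shared file."""
    path: str
    offset: int
    length: int


class SharedFileWriter:
    """Buffers many states and writes them with a single create call."""

    def __init__(self, fs: FakeFileSystem, path: str) -> None:
        self.fs = fs
        self.path = path
        self._buffer = bytearray()
        self._handles = []

    def add_state(self, data: bytes) -> SegmentHandle:
        handle = SegmentHandle(self.path, len(self._buffer), len(data))
        self._buffer += data
        self._handles.append(handle)
        return handle

    def close(self) -> list:
        self.fs.create(self.path, bytes(self._buffer))
        return list(self._handles)


class SharedFileRegistry:
    """Reference-counts files; deletes a file when its last handle is discarded."""

    def __init__(self, fs: FakeFileSystem) -> None:
        self.fs = fs
        self._refs = {}

    def register(self, handles) -> None:
        for handle in handles:
            self._refs[handle.path] = self._refs.get(handle.path, 0) + 1

    def discard(self, handle: SegmentHandle) -> None:
        remaining = self._refs[handle.path] - 1
        if remaining == 0:
            del self._refs[handle.path]
            self.fs.delete(handle.path)  # one delete call for the whole file
        else:
            self._refs[handle.path] = remaining


def read_state(fs: FakeFileSystem, handle: SegmentHandle) -> bytes:
    """Read one state back out of the shared file by offset and length."""
    data = fs.files[handle.path]
    return data[handle.offset:handle.offset + handle.length]


def compare(states):
    """Return ((creates, deletes) per-handle layout, (creates, deletes) shared layout)."""
    per_handle_fs = FakeFileSystem()
    for i, state in enumerate(states):
        per_handle_fs.create(f"chk-1/state-{i}", state)
    for i in range(len(states)):
        per_handle_fs.delete(f"chk-1/state-{i}")

    shared_fs = FakeFileSystem()
    writer = SharedFileWriter(shared_fs, "chk-1/shared")
    for state in states:
        writer.add_state(state)
    handles = writer.close()
    registry = SharedFileRegistry(shared_fs)
    registry.register(handles)
    for handle in handles:
        registry.discard(handle)

    return (
        (per_handle_fs.create_calls, per_handle_fs.delete_calls),
        (shared_fs.create_calls, shared_fs.delete_calls),
    )
```

In this sketch, `compare([b"a", b"bb", b"ccc"])` issues three creates and three deletes with one file per handle, but one create and one delete with the shared file. The real savings depend on how many handles end up in each file and on when checkpoints are discarded, which this model does not try to capture.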
