[ https://issues.apache.org/jira/browse/HADOOP-13436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiaobing Zhou updated HADOOP-13436: ----------------------------------- Summary: RPC connections are leaking due to not overriding hashCode and equals (was: RPC connections are leaking due to missing equals override in RetryUtils#getDefaultRetryPolicy) > RPC connections are leaking due to not overriding hashCode and equals > --------------------------------------------------------------------- > > Key: HADOOP-13436 > URL: https://issues.apache.org/jira/browse/HADOOP-13436 > Project: Hadoop Common > Issue Type: Bug > Components: ipc > Affects Versions: 2.7.1 > Reporter: Xiaobing Zhou > Assignee: Xiaobing Zhou > Attachments: repro.sh > > > We've noticed RPC connections are increasing dramatically in a Kerberized > HDFS cluster with {noformat}dfs.client.retry.policy.enabled{noformat} > enabled. Internally, Client#getConnection is doing lookup relying on > ConnectionId #hashCode and then #equals, which compose checking > Subclass-of-RetryPolicy #hashCode and #equals. If subclasses of RetryPolicy > neglect overriding #hashCode or #equals, every instance of RetryPolicy with > equivalent fields' values (e.g. MultipleLinearRandomRetry[6x10000ms, > 10x60000ms]) will lead to a brand new connection because the check will fall > back to Object#hashCode and Object#equals which is distinct and false for > distinct instances. > This is stack trace where the anonymous RetryPolicy implementation > (neglecting overriding hashCode and equals) in > RetryUtils#getDefaultRetryPolicy is called: > {noformat} > at > org.apache.hadoop.io.retry.RetryUtils.getDefaultRetryPolicy(RetryUtils.java:82) > at > org.apache.hadoop.hdfs.NameNodeProxies.createNNProxyWithClientProtocol(NameNodeProxies.java:409) > at > org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:315) > at > org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176) > at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678) > at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619) > at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:609) > at > org.apache.hadoop.hdfs.server.datanode.web.webhdfs.WebHdfsHandler.newDfsClient(WebHdfsHandler.java:272) > at > org.apache.hadoop.hdfs.server.datanode.web.webhdfs.WebHdfsHandler.onOpen(WebHdfsHandler.java:215) > at > org.apache.hadoop.hdfs.server.datanode.web.webhdfs.WebHdfsHandler.handle(WebHdfsHandler.java:135) > at > org.apache.hadoop.hdfs.server.datanode.web.webhdfs.WebHdfsHandler$1.run(WebHdfsHandler.java:117) > at > org.apache.hadoop.hdfs.server.datanode.web.webhdfs.WebHdfsHandler$1.run(WebHdfsHandler.java:114) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at > org.apache.hadoop.hdfs.server.datanode.web.webhdfs.WebHdfsHandler.channelRead0(WebHdfsHandler.java:114) > at > org.apache.hadoop.hdfs.server.datanode.web.URLDispatcher.channelRead0(URLDispatcher.java:52) > at > org.apache.hadoop.hdfs.server.datanode.web.URLDispatcher.channelRead0(URLDispatcher.java:32) > at > io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319) > at > io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319) > at > io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:163) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319) > at > io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787) > at > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130) > at > io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) > at > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116) > at > io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137) > at java.lang.Thread.run(Thread.java:745) > {noformat} > Three options to fix the problem: > 1. All subclasses of RetryPolicy must override equals and hashCode to deliver > less discriminating equivalence relation, i.e. they are equal if they have > meaningful equivalent fields' values (e.g. > MultipleLinearRandomRetry[6x10000ms, 10x60000ms]) > 2. Change ConnectionId#equals by removing RetryPolicy#equals compoment. > 3. Let WebHDFS reuse the DFSClient. -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org