[ https://issues.apache.org/jira/browse/HBASE-25287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17260183#comment-17260183 ]
Anoop Sam John commented on HBASE-25287:
----------------------------------------

You mean a separate jira for the backport to branch-1? This jira already has all the branch-2 based versions as fix versions.

> Forgetting to unbuffer streams results in many CLOSE_WAIT sockets when
> loading files
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-25287
>                 URL: https://issues.apache.org/jira/browse/HBASE-25287
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>            Reporter: Xiaolin Ha
>            Assignee: Xiaolin Ha
>            Priority: Major
>             Fix For: 3.0.0-alpha-1, 2.3.4, 2.5.0, 2.4.1
>
>         Attachments: 1605328358304-image.png, 1605328417888-image.png,
>                      1605504914256-image.png
>
>
> HBASE-9393 found that seek+read leaves many CLOSE_WAIT sockets unless the
> stream is unbuffered; unbuffer() frees the sockets and file descriptors held
> by the stream. In our cluster, on RSes with about one hundred thousand store
> files, we found that the number of CLOSE_WAIT sockets grows with the number
> of regions opened, and can reach the operating system open-files limit of
> 1,000,000.
>
> {code:java}
> 2020-11-12 20:19:02,452 WARN [1282990092@qtp-220038608-1 - Acceptor0 SelectChannelConnector@0.0.0.0:16030] mortbay.log: EXCEPTION
> java.io.IOException: Too many open files
>         at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>         at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
>         at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
>         at org.mortbay.jetty.nio.SelectChannelConnector$1.acceptChannel(SelectChannelConnector.java:75)
>         at org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:686)
>         at org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:192)
>         at org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:124)
>         at org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:708)
>         at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> {code}
>
> {code:java}
> [hbase@gha-data-hbase-cat0053 hbase]$ ulimit -SHn
> 1000000
> {code}
>
> The cause of the problem is that, when a store file is opened,
> {code:java}
> private void open() throws IOException {
>   fileInfo.initHDFSBlocksDistribution();
>   long readahead = fileInfo.isNoReadahead() ? 0L : -1L;
>   ReaderContext context = fileInfo.createReaderContext(false, readahead, ReaderType.PREAD);
>   fileInfo.initHFileInfo(context);
>   StoreFileReader reader = fileInfo.preStoreFileReaderOpen(context, cacheConf);
>   if (reader == null) {
>     reader = fileInfo.createReader(context, cacheConf);
>     fileInfo.getHFileInfo().initMetaAndIndex(reader.getHFileReader());
>   }
>   ....{code}
> only createReader() unbuffers the stream. initMetaAndIndex() also uses the
> stream to read blocks, so it needs to unbuffer() the socket, too.
> We can simply open a try block before fileInfo.initHFileInfo(context) and
> unbuffer() the stream in a finally at the end of the open() function.
> After fixing this on our cluster, the number of CLOSE_WAIT sockets dropped
> to about 0.
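The try/finally shape described in the quoted report can be sketched as follows. This is a minimal illustration of the pattern, not the actual HBase patch: `BufferedStream` and its fields are hypothetical stand-ins for the HDFS input stream wrapped by `ReaderContext`, and `open()` stands in for `HStoreFile.open()`.

```java
// Minimal sketch of the try/finally unbuffer pattern (hypothetical types,
// not the real HBase classes).
public class UnbufferSketch {

    // Stand-in for an HDFS input stream: after a seek+read it keeps a
    // socket buffered until unbuffer() is called.
    static class BufferedStream {
        boolean buffered = false;

        void read() { buffered = true; }       // leaves a socket in CLOSE_WAIT risk
        void unbuffer() { buffered = false; }  // releases the socket / fd
    }

    // The fix: every read path in open() is wrapped so the stream is
    // unbuffered on exit, whether or not an exception is thrown.
    static void open(BufferedStream stream) {
        try {
            stream.read();    // initHFileInfo / initMetaAndIndex read blocks here
        } finally {
            stream.unbuffer();  // always release the socket at the end of open()
        }
    }

    public static void main(String[] args) {
        BufferedStream s = new BufferedStream();
        open(s);
        System.out.println(s.buffered ? "socket still held" : "socket released");
    }
}
```

The point of the finally block is that it covers all read paths in `open()`, including the ones (like `initMetaAndIndex()`) that were previously missed because only `createReader()` unbuffered the stream.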
--
This message was sent by Atlassian Jira
(v8.3.4#803005)