[
https://issues.apache.org/jira/browse/HBASE-25287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17260183#comment-17260183
]
Anoop Sam John commented on HBASE-25287:
----------------------------------------
You mean a separate jira for the backport to branch-1? This jira already has
all the branch-2 based versions as fix versions.
> Forgetting to unbuffer streams results in many CLOSE_WAIT sockets when
> loading files
> ------------------------------------------------------------------------------------
>
> Key: HBASE-25287
> URL: https://issues.apache.org/jira/browse/HBASE-25287
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.4.0
> Reporter: Xiaolin Ha
> Assignee: Xiaolin Ha
> Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.4, 2.5.0, 2.4.1
>
> Attachments: 1605328358304-image.png, 1605328417888-image.png,
> 1605504914256-image.png
>
>
> HBASE-9393 found that seek+read leaves many CLOSE_WAIT sockets unless the
> stream is unbuffered; unbuffer() frees the sockets and file descriptors held
> by the stream.
> In our cluster, on RSes with about one hundred thousand store files, we found
> that the number of CLOSE_WAIT sockets grew with the number of regions opened,
> and could reach the operating system open files limit of 1000000.
>
> {code:java}
> 2020-11-12 20:19:02,452 WARN [1282990092@qtp-220038608-1 - Acceptor0 [email protected]:16030] mortbay.log: EXCEPTION
> java.io.IOException: Too many open files
>         at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>         at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
>         at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
>         at org.mortbay.jetty.nio.SelectChannelConnector$1.acceptChannel(SelectChannelConnector.java:75)
>         at org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:686)
>         at org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:192)
>         at org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:124)
>         at org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:708)
>         at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> {code}
>
> {code:java}
> [hbase@gha-data-hbase-cat0053 hbase]$ ulimit -SHn
> 1000000
> {code}
>
>
> The cause of the problem is that when a store file is opened,
> {code:java}
> private void open() throws IOException {
>   fileInfo.initHDFSBlocksDistribution();
>   long readahead = fileInfo.isNoReadahead() ? 0L : -1L;
>   ReaderContext context = fileInfo.createReaderContext(false, readahead, ReaderType.PREAD);
>   fileInfo.initHFileInfo(context);
>   StoreFileReader reader = fileInfo.preStoreFileReaderOpen(context, cacheConf);
>   if (reader == null) {
>     reader = fileInfo.createReader(context, cacheConf);
>     fileInfo.getHFileInfo().initMetaAndIndex(reader.getHFileReader());
>   }
>   ....{code}
> only createReader() unbuffers the stream. initMetaAndIndex() also uses the
> stream to read blocks, so it needs to unbuffer() the socket, too.
> We can simply open a try block before fileInfo.initHFileInfo(context); and
> unbuffer() the stream in a finally clause at the end of the open() function.
> After applying this fix on our cluster, the number of CLOSE_WAIT sockets
> dropped to about zero.
>
>
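The try/finally approach described above can be sketched as a small standalone example. The class and method names below are stand-ins for illustration, not the real HBase/HDFS API; the point is that unbuffer() runs on every exit path from open(), even when initialization throws:

```java
// Standalone sketch of the proposed try/finally fix. TrackingStream is a
// stand-in for the HDFS input stream; it is NOT the real HBase/HDFS API.
public class UnbufferSketch {
  // Simulates a stream that holds a socket open while buffered.
  static class TrackingStream {
    boolean buffered = true;
    void unbuffer() { buffered = false; } // releases the socket/FD in real HDFS
  }

  // Mirrors the shape of StoreFile.open(): the init steps may throw, but the
  // stream is unbuffered on every path because unbuffer() sits in finally.
  static void open(TrackingStream stream, boolean failDuringInit) {
    try {
      // corresponds to initHFileInfo()/initMetaAndIndex(), which read blocks
      if (failDuringInit) {
        throw new IllegalStateException("initMetaAndIndex failed");
      }
    } finally {
      stream.unbuffer();
    }
  }

  public static void main(String[] args) {
    TrackingStream ok = new TrackingStream();
    open(ok, false);
    if (ok.buffered) throw new AssertionError("stream left buffered");

    TrackingStream failed = new TrackingStream();
    try {
      open(failed, true);
    } catch (IllegalStateException expected) {
      // initialization failed, but the socket was still released
    }
    if (failed.buffered) throw new AssertionError("stream left buffered on failure");
    System.out.println("unbuffer() ran on both success and failure paths");
  }
}
```

Wrapping the whole init sequence this way, rather than unbuffering only inside createReader(), is what closes the leak for the initMetaAndIndex() reads.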
--
This message was sent by Atlassian Jira
(v8.3.4#803005)