[ https://issues.apache.org/jira/browse/HDFS-7145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yang Jiandan updated HDFS-7145:
-------------------------------
    Description: 
We found that DFSInputStream#read does not return when HBase handlers read files from HDFS; all of the handlers are stuck in org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(). The jstack output is as follows:

{noformat}
"RS_PARALLEL_SEEK-hadoop474:60020-9" prio=10 tid=0x00007f7350be0000 nid=0x1572 runnable [0x000000005a9de000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
        - locked <0x000000039ad6e730> (a sun.nio.ch.Util$2)
        - locked <0x000000039ad6e320> (a java.util.Collections$UnmodifiableSet)
        - locked <0x00000002bf480738> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
        at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
        at java.io.FilterInputStream.read(FilterInputStream.java:83)
        at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1986)
        at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:395)
        at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:786)
        at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:665)
        at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:325)
        at org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1023)
        at org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:966)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1293)
        at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:90)
        at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1223)
        at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1430)
        at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1312)
        at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:392)
        at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:253)
        at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:532)
        at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:553)
        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:237)
        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:152)
        at org.apache.hadoop.hbase.regionserver.handler.ParallelSeekHandler.process(ParallelSeekHandler.java:57)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
{noformat}

Reading the HDFS source code, we found the following:
1. NioInetPeer#in and NioInetPeer#out are created with a default timeout of 0 (a standalone demo of the resulting wait-forever behavior follows this list):

{code:java}
NioInetPeer(Socket socket) throws IOException {
  this.socket = socket;
  this.in = new SocketInputStream(socket.getChannel(), 0);
  this.out = new SocketOutputStream(socket.getChannel(), 0);
  this.isLocal = socket.getInetAddress().equals(socket.getLocalAddress());
}

public SocketInputStream(ReadableByteChannel channel, long timeout)
    throws IOException {
  SocketIOWithTimeout.checkChannelValidity(channel);
  reader = new Reader(channel, timeout);
}

Reader(ReadableByteChannel channel, long timeout) throws IOException {
  super((SelectableChannel) channel, timeout);
  this.channel = channel;
}

SocketIOWithTimeout(SelectableChannel channel, long timeout)
    throws IOException {
  checkChannelValidity(channel);

  this.channel = channel;
  this.timeout = timeout;
  // Set non-blocking
  channel.configureBlocking(false);
}
{code}

As a result, SocketIOWithTimeout#timeout ends up being 0.

2. BlockReaderPeer#peer never has a read timeout or a write timeout set on it, so org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select() runs with timeout=0 and never returns (a sketch of setting these timeouts on the Peer also follows this list).
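To make observation 1 concrete, here is a small, self-contained demo. This is not HDFS code; the class name, loopback server, and 3-second timeout are made up for illustration. It shows that a selector timeout of 0 means "block until the channel is ready", which is effectively forever when the remote side never sends the expected data:

{code:java}
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Standalone demo (not HDFS code): contrasts select(timeout > 0) with
// select(0). A local server accepts the connection but never writes,
// standing in for a datanode that stops responding.
public class SelectTimeoutDemo {
  public static void main(String[] args) throws Exception {
    ServerSocketChannel server =
        ServerSocketChannel.open().bind(new InetSocketAddress("127.0.0.1", 0));
    SocketChannel client = SocketChannel.open(server.getLocalAddress());
    client.configureBlocking(false);

    Selector selector = Selector.open();
    client.register(selector, SelectionKey.OP_READ);

    long start = System.currentTimeMillis();
    int ready = selector.select(3000);   // positive timeout: gives up after ~3s
    System.out.println("select(3000) returned " + ready
        + " ready channels after " + (System.currentTimeMillis() - start) + " ms");

    // selector.select(0) here would never return, which is what
    // SocketIOWithTimeout$SelectorPool.select() does when its timeout is 0.
    selector.close();
    client.close();
    server.close();
  }
}
{code}

With select(0) in place of select(3000), the demo would hang exactly like the RS_PARALLEL_SEEK threads in the jstack above.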
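For observation 2, here is a minimal sketch of the kind of change that would avoid the hang, assuming the Peer interface exposes setReadTimeout/setWriteTimeout (as the description above implies). The helper class, its name, and the timeout parameter are illustrative only, not an actual patch:

{code:java}
import java.io.IOException;

import org.apache.hadoop.hdfs.net.Peer;

// Illustrative sketch, not a committed fix: give a freshly connected Peer
// explicit read/write timeouts before handing it to the block reader, so the
// underlying SocketInputStream/SocketOutputStream no longer run with timeout 0.
final class PeerTimeouts {
  static Peer withTimeouts(Peer peer, int timeoutMs) throws IOException {
    peer.setReadTimeout(timeoutMs);   // bounds SelectorPool.select() on reads
    peer.setWriteTimeout(timeoutMs);  // bounds SelectorPool.select() on writes
    return peer;
  }

  private PeerTimeouts() {}
}
{code}

With a positive timeout, SocketIOWithTimeout.doIO() throws a SocketTimeoutException when the selector gives up, instead of blocking indefinitely, so DFSInputStream#read would get a chance to fail and retry rather than never returning.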
> DFSInputStream does not return when reading
> -------------------------------------------
>
>                 Key: HDFS-7145
>                 URL: https://issues.apache.org/jira/browse/HDFS-7145
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 2.5.0
>            Reporter: Yang Jiandan
>            Priority: Critical
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)