[ 
https://issues.apache.org/jira/browse/HDFS-7145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jiandan updated HDFS-7145:
-------------------------------
    Description: 
We found that DFSInputStream#read does not return when hbase handlers read 
files from hdfs, and all handlers are in the 
org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(). jstack is as 
follows:
"RS_PARALLEL_SEEK-hadoop474:60020-9" prio=10 tid=0x00007f7350be0000 nid=0x1572 
runnable [0x000000005a9de000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
        - locked <0x000000039ad6e730> (a sun.nio.ch.Util$2)
        - locked <0x000000039ad6e320> (a java.util.Collections$UnmodifiableSet)
        - locked <0x00000002bf480738> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
        at 
org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)
        at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
        at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
        at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
        at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
        at java.io.FilterInputStream.read(FilterInputStream.java:83)
        at 
org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1986)
        at 
org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:395)
        at 
org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:786)
        at 
org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:665)
        at 
org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:325)
        at 
org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1023)
        at 
org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:966)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1293)
        at 
org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:90)
        at 
org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1223)
        at 
org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1430)
        at 
org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1312)
        at 
org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:392)
        at 
org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:253)
        at 
org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:532)
        at 
org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:553)
        at 
org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:237)
        at 
org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:152)
        at 
org.apache.hadoop.hbase.regionserver.handler.ParallelSeekHandler.process(ParallelSeekHandler.java:57)
        at 
org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)

I read HDFS source code and discover:
1. NioInetPeer#in and NioInetPeer#out default timeout value is 0
{code:xml}
  NioInetPeer(Socket socket) throws IOException {
    this.socket = socket;
    this.in = new SocketInputStream(socket.getChannel(), 0);
    this.out = new SocketOutputStream(socket.getChannel(), 0);
    this.isLocal = socket.getInetAddress().equals(socket.getLocalAddress());
  }

  public SocketInputStream(ReadableByteChannel channel, long timeout)
                                                        throws IOException {
    SocketIOWithTimeout.checkChannelValidity(channel);
    reader = new Reader(channel, timeout);
  }

    Reader(ReadableByteChannel channel, long timeout) throws IOException {
      super((SelectableChannel)channel, timeout);
      this.channel = channel;
    }

  SocketIOWithTimeout(SelectableChannel channel, long timeout) 
                                                 throws IOException {
    checkChannelValidity(channel);
    
    this.channel = channel;
    this.timeout = timeout;
    // Set non-blocking
    channel.configureBlocking(false);
  }
{code}
and result in SocketIOWithTimeout#timeout=0
2. BlockReaderPeer#peer does not set ReadTimeout and WriteTimeout
which lead to 
org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(timeout=0) and 
does not return. 

  was:
We found that DFSInputStream#read does not return when hbase handlers read 
files from hdfs, and all handlers are in the 
org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(). jstack is as 
follows:
"RS_PARALLEL_SEEK-hadoop474:60020-9" prio=10 tid=0x00007f7350be0000 nid=0x1572 
runnable [0x000000005a9de000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
        - locked <0x000000039ad6e730> (a sun.nio.ch.Util$2)
        - locked <0x000000039ad6e320> (a java.util.Collections$UnmodifiableSet)
        - locked <0x00000002bf480738> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
        at 
org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)
        at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
        at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
        at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
        at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
        at java.io.FilterInputStream.read(FilterInputStream.java:83)
        at 
org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1986)
        at 
org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:395)
        at 
org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:786)
        at 
org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:665)
        at 
org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:325)
        at 
org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1023)
        at 
org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:966)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1293)
        at 
org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:90)
        at 
org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1223)
        at 
org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1430)
        at 
org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1312)
        at 
org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:392)
        at 
org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:253)
        at 
org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:532)
        at 
org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:553)
        at 
org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:237)
        at 
org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:152)
        at 
org.apache.hadoop.hbase.regionserver.handler.ParallelSeekHandler.process(ParallelSeekHandler.java:57)
        at 
org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)

I read HDFS source code and discover:
1. NioInetPeer#in and NioInetPeer#out default timeout value is 0

  NioInetPeer(Socket socket) throws IOException {
    this.socket = socket;
    this.in = new SocketInputStream(socket.getChannel(), 0);
    this.out = new SocketOutputStream(socket.getChannel(), 0);
    this.isLocal = socket.getInetAddress().equals(socket.getLocalAddress());
  }
and result in SocketIOWithTimeout#timeout=0
2. BlockReaderPeer#peer does not set ReadTimeout and WriteTimeout
which lead to 
org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(timeout=0) and 
does not return. 


> DFSInputStream does not return when reading
> -------------------------------------------
>
>                 Key: HDFS-7145
>                 URL: https://issues.apache.org/jira/browse/HDFS-7145
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 2.5.0
>            Reporter: Yang Jiandan
>            Priority: Critical
>
> We found that DFSInputStream#read does not return when hbase handlers read 
> files from hdfs, and all handlers are in the 
> org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(). jstack is as 
> follows:
> "RS_PARALLEL_SEEK-hadoop474:60020-9" prio=10 tid=0x00007f7350be0000 
> nid=0x1572 runnable [0x000000005a9de000]
>    java.lang.Thread.State: RUNNABLE
>         at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>         at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
>         at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
>         at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
>         - locked <0x000000039ad6e730> (a sun.nio.ch.Util$2)
>         - locked <0x000000039ad6e320> (a 
> java.util.Collections$UnmodifiableSet)
>         - locked <0x00000002bf480738> (a sun.nio.ch.EPollSelectorImpl)
>         at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
>         at 
> org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)
>         at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
>         at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
>         at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
>         at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
>         at java.io.FilterInputStream.read(FilterInputStream.java:83)
>         at 
> org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1986)
>         at 
> org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:395)
>         at 
> org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:786)
>         at 
> org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:665)
>         at 
> org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:325)
>         at 
> org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1023)
>         at 
> org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:966)
>         at 
> org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1293)
>         at 
> org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:90)
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1223)
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1430)
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1312)
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:392)
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:253)
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:532)
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:553)
>         at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:237)
>         at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:152)
>         at 
> org.apache.hadoop.hbase.regionserver.handler.ParallelSeekHandler.process(ParallelSeekHandler.java:57)
>         at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
> I read HDFS source code and discover:
> 1. NioInetPeer#in and NioInetPeer#out default timeout value is 0
> {code:xml}
>   NioInetPeer(Socket socket) throws IOException {
>     this.socket = socket;
>     this.in = new SocketInputStream(socket.getChannel(), 0);
>     this.out = new SocketOutputStream(socket.getChannel(), 0);
>     this.isLocal = socket.getInetAddress().equals(socket.getLocalAddress());
>   }
>   public SocketInputStream(ReadableByteChannel channel, long timeout)
>                                                         throws IOException {
>     SocketIOWithTimeout.checkChannelValidity(channel);
>     reader = new Reader(channel, timeout);
>   }
>     Reader(ReadableByteChannel channel, long timeout) throws IOException {
>       super((SelectableChannel)channel, timeout);
>       this.channel = channel;
>     }
>   SocketIOWithTimeout(SelectableChannel channel, long timeout) 
>                                                  throws IOException {
>     checkChannelValidity(channel);
>     
>     this.channel = channel;
>     this.timeout = timeout;
>     // Set non-blocking
>     channel.configureBlocking(false);
>   }
> {code}
> and result in SocketIOWithTimeout#timeout=0
> 2. BlockReaderPeer#peer does not set ReadTimeout and WriteTimeout
> which lead to 
> org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(timeout=0) and 
> does not return. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to