Andrey Elenskiy created HDFS-15413:
--------------------------------------

             Summary: DFSStripedInputStream throws exception when datanodes close idle connections
                 Key: HDFS-15413
                 URL: https://issues.apache.org/jira/browse/HDFS-15413
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: ec, erasure-coding, hdfs-client
    Affects Versions: 3.1.3
         Environment: - Hadoop 3.1.3
- erasure coding with ISA-L and RS-3-2-1024k scheme
- running in kubernetes
- dfs.client.socket-timeout = 10000
- dfs.datanode.socket.write.timeout = 10000
            Reporter: Andrey Elenskiy
         Attachments: out.log
We've run into an issue with compactions failing in HBase when erasure coding is enabled on a table directory. After digging further I was able to narrow it down to the seek + read logic and was able to reproduce the issue with the HDFS client alone:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FSDataInputStream;

public class ReaderRaw {
    public static void main(final String[] args) throws Exception {
        Path p = new Path(args[0]);
        int bufLen = Integer.parseInt(args[1]);
        int sleepDuration = Integer.parseInt(args[2]);
        int countBeforeSleep = Integer.parseInt(args[3]);
        int countAfterSleep = Integer.parseInt(args[4]);

        Configuration conf = new Configuration();
        FSDataInputStream istream = FileSystem.get(conf).open(p);
        byte[] buf = new byte[bufLen];
        int readTotal = 0;
        int count = 0;
        try {
            while (true) {
                // Seek to the end of what has been read so far, then fill the buffer.
                istream.seek(readTotal);
                int bytesRemaining = bufLen;
                int bufOffset = 0;
                while (bytesRemaining > 0) {
                    int nread = istream.read(buf, bufOffset, bytesRemaining);
                    if (nread < 0) {
                        throw new Exception("nread is less than zero");
                    }
                    readTotal += nread;
                    bufOffset += nread;
                    bytesRemaining -= nread;
                }

                count++;
                if (count == countBeforeSleep) {
                    // Pause long enough for the datanodes to drop the idle connections.
                    System.out.println("sleeping for " + sleepDuration + " milliseconds");
                    Thread.sleep(sleepDuration);
                    System.out.println("resuming");
                }
                if (count == countBeforeSleep + countAfterSleep) {
                    System.out.println("done");
                    break;
                }
            }
        } catch (Exception e) {
            System.out.println("exception on read " + count + " read total " + readTotal);
            throw e;
        }
    }
}
{code}

The issue appears to be that datanodes close the connection of an EC client if it doesn't fetch the next packet for longer than dfs.client.socket-timeout. The EC client doesn't retry and instead assumes that those datanodes went away, resulting in a "missing blocks" exception.

I was able to consistently reproduce the issue with the following arguments:

{noformat}
bufLen = 1000000 (just below 1MB which is the size of the stripe)
sleepDuration = (dfs.client.socket-timeout + 1) * 1000 (in our case 11000)
countBeforeSleep = 1
countAfterSleep = 7
{noformat}
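For reference, the snippet can be compiled and run along these lines; the classpath handling and the EC file path are environment-specific examples, not exact commands:

{noformat}
javac -cp "$(hadoop classpath)" ReaderRaw.java
java -cp ".:$(hadoop classpath)" ReaderRaw /path/to/ec-file 1000000 11000 1 7
{noformat}

The arguments map to the snippet's path, bufLen, sleepDuration, countBeforeSleep and countAfterSleep.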
I've attached the entire log output of running the snippet above against an erasure-coded file with the RS-3-2-1024k policy. And here are the logs from the datanodes disconnecting the client:

datanode 1:
{noformat}
2020-06-15 19:06:20,697 INFO datanode.DataNode: Likely the client has stopped reading, disconnecting it (datanode-v11-0-hadoop.hadoop:9866:DataXceiver error processing READ_BLOCK operation src: /10.128.23.40:53748 dst: /10.128.14.46:9866); java.net.SocketTimeoutException: 10000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.128.14.46:9866 remote=/10.128.23.40:53748]
{noformat}

datanode 2:
{noformat}
2020-06-15 19:06:20,341 INFO datanode.DataNode: Likely the client has stopped reading, disconnecting it (datanode-v11-1-hadoop.hadoop:9866:DataXceiver error processing READ_BLOCK operation src: /10.128.23.40:48772 dst: /10.128.9.42:9866); java.net.SocketTimeoutException: 10000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.128.9.42:9866 remote=/10.128.23.40:48772]
{noformat}

datanode 3:
{noformat}
2020-06-15 19:06:20,467 INFO datanode.DataNode: Likely the client has stopped reading, disconnecting it (datanode-v11-3-hadoop.hadoop:9866:DataXceiver error processing READ_BLOCK operation src: /10.128.23.40:57184 dst: /10.128.16.13:9866); java.net.SocketTimeoutException: 10000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.128.16.13:9866 remote=/10.128.23.40:57184]
{noformat}

I've tried running the same code against non-EC files with a replication factor of 3 and was not able to reproduce the issue with any parameters. Looking through the code, it's pretty clear that the non-EC DFSInputStream retries reads after an exception: https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java#L844

Let me know if you need any more information that can help you out with addressing this issue.
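For illustration only, here is a rough sketch of the kind of seek-and-retry pattern meant above. It is a made-up wrapper around FSDataInputStream with hypothetical names, not the actual DFSInputStream/DFSStripedInputStream code, and it is not a tested workaround:

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;

// Illustrative sketch only: shows the retry-after-IOException idea at the application level.
// The real fix presumably belongs inside DFSStripedInputStream, similar to what DFSInputStream does.
public class RetryingReader {
    private final FSDataInputStream in;
    private final int maxRetries;

    public RetryingReader(FSDataInputStream in, int maxRetries) {
        this.in = in;
        this.maxRetries = maxRetries;
    }

    // Read at an absolute position; on IOException re-seek and try again instead of
    // immediately treating the datanodes as gone.
    public int readAt(long pos, byte[] buf, int off, int len) throws IOException {
        IOException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                in.seek(pos);
                return in.read(buf, off, len);
            } catch (IOException e) {
                last = e; // e.g. the datanode closed the idle connection
            }
        }
        throw last;
    }
}
{code}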