Jing Zhao created HDFS-8797:
-------------------------------

             Summary: WebHdfsFileSystem creates too many connections for pread
                 Key: HDFS-8797
                 URL: https://issues.apache.org/jira/browse/HDFS-8797
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: webhdfs
            Reporter: Jing Zhao


While running a test we found that WebHdfsFileSystem can create several 
thousand connections when doing positional reads (pread) of a 200MB file. For 
each connection the client connects to the DataNode again, and the DataNode 
creates a new DFSClient instance to handle the read request. This also leads to 
several thousand {{getBlockLocations}} calls to the NameNode.

The cause of the issue is that in {{FSInputStream#read(long, byte[], int, 
int)}}, each time the input stream reads some data, it seeks back to the old 
position, which resets its state to SEEK. Thus the next read has to open a new 
connection.
{code}
  public int read(long position, byte[] buffer, int offset, int length)
    throws IOException {
    synchronized (this) {
      long oldPos = getPos();
      int nread = -1;
      try {
        seek(position);
        nread = read(buffer, offset, length);
      } finally {
        seek(oldPos);
      }
      return nread;
    }
  }
{code}
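
The effect can be reproduced with a small toy model. The sketch below is not the WebHDFS code: {{LazyStream}}, {{countPreadConnections}} and {{countSequentialConnections}} are hypothetical names, and the stream is assumed to open a connection lazily on the first read after any seek that changes the position. The {{pread}} method copies the seek/read/seek-back pattern quoted above.

```java
public class PreadConnectionDemo {

  /** Toy stand-in for a lazily connecting stream (hypothetical, not the real class). */
  static class LazyStream {
    private final byte[] data;
    private long pos = 0;
    private boolean connected = false; // false plays the role of the SEEK state
    int connectionsOpened = 0;

    LazyStream(byte[] data) { this.data = data; }

    long getPos() { return pos; }

    /** Assumption: a seek to a different position drops the current connection. */
    void seek(long newPos) {
      if (newPos != pos) {
        pos = newPos;
        connected = false;
      }
    }

    /** Sequential read; opens a "connection" if none is active. */
    int read(byte[] buf, int off, int len) {
      if (!connected) {
        connectionsOpened++;
        connected = true;
      }
      if (pos >= data.length) {
        return -1;
      }
      int n = (int) Math.min(len, data.length - pos);
      System.arraycopy(data, (int) pos, buf, off, n);
      pos += n;
      return n;
    }

    /** Same pattern as the quoted read(long, byte[], int, int). */
    int pread(long position, byte[] buf, int off, int len) {
      long oldPos = getPos();
      int nread = -1;
      try {
        seek(position);
        nread = read(buf, off, len);
      } finally {
        seek(oldPos); // moves the position back, so the connection is dropped again
      }
      return nread;
    }
  }

  /** Reads the whole "file" with positional reads and counts connections. */
  static int countPreadConnections(int fileSize, int chunk) {
    LazyStream in = new LazyStream(new byte[fileSize]);
    byte[] buf = new byte[chunk];
    long position = 0;
    while (position < fileSize) {
      position += in.pread(position, buf, 0, chunk);
    }
    return in.connectionsOpened;
  }

  /** Reads the whole "file" sequentially and counts connections. */
  static int countSequentialConnections(int fileSize, int chunk) {
    LazyStream in = new LazyStream(new byte[fileSize]);
    byte[] buf = new byte[chunk];
    while (in.read(buf, 0, chunk) > 0) {
      // keep reading until end of data
    }
    return in.connectionsOpened;
  }

  public static void main(String[] args) {
    System.out.println("pread connections: " + countPreadConnections(1000, 100));
    System.out.println("sequential connections: " + countSequentialConnections(1000, 100));
  }
}
```

In this model every positional read opens its own connection, while a sequential scan of the same data opens one. The read size used in the original test is not stated here, so the model only illustrates the growth pattern, not the exact count.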



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)