Jing Zhao created HDFS-8797:
-------------------------------
Summary: WebHdfsFileSystem creates too many connections for pread
Key: HDFS-8797
URL: https://issues.apache.org/jira/browse/HDFS-8797
Project: Hadoop HDFS
Issue Type: Bug
Components: webhdfs
Reporter: Jing Zhao
While running a test, we found that WebHdfsFileSystem can create several
thousand connections when doing a positional read (pread) of a 200MB file. For
each connection, the client connects to the DataNode again, and the DataNode
creates a new DFSClient instance to handle the read request. This also leads to
several thousand {{getBlockLocations}} calls to the NameNode.
The cause of the issue is that in {{FSInputStream#read(long, byte[], int, int)}},
each time the input stream reads some data, it seeks back to the old position
and resets its state to SEEK. Thus the next read has to open a new connection.
{code}
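public int read(long position, byte[] buffer, int offset, int length)
    throws IOException {
  synchronized (this) {
    long oldPos = getPos();
    int nread = -1;
    try {
      seek(position);
      // In the SEEK state, this read has to open a new connection.
      nread = read(buffer, offset, length);
    } finally {
      // Restoring the old offset resets the state to SEEK again,
      // so the connection just opened is not reused by the next pread.
      seek(oldPos);
    }
    return nread;
  }
}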
{code}
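To make the failure mode concrete, here is a hypothetical Java simulation (not Hadoop code). The names `PreadConnectionDemo`, `SimulatedStream`, `chunk`, and `preadDedicated`, and the 4096-byte chunk size, are assumptions made for illustration. The sketch assumes, as described above, that seeking to a different offset drops the connection (the SEEK state). It also assumes that a single `read()` returns only part of the requested range, so the caller loops over positional reads the way `FSInputStream#readFully(long, byte[], int, int)` does. `preadDedicated` sketches one possible remedy: it opens a single connection per positional read and leaves the sequential stream state alone. It is not necessarily the fix adopted for this issue.

```java
import java.util.Arrays;

// Hypothetical simulation, NOT Hadoop code: a WebHDFS-style stream whose seek()
// to a new offset drops the connection (SEEK state) and whose single read()
// returns at most `chunk` bytes, so readFully() must loop over pread().
public class PreadConnectionDemo {

    static final class SimulatedStream {
        private final byte[] data;        // stands in for the remote file contents
        private final int chunk;          // assumed max bytes returned per read()
        private long pos = 0;
        private boolean seekState = true; // true = no open connection (SEEK)
        int connections = 0;              // connections opened so far

        SimulatedStream(byte[] data, int chunk) {
            this.data = data;
            this.chunk = chunk;
        }

        long getPos() { return pos; }

        // Seeking to a different offset forces a reconnect on the next read().
        void seek(long newPos) {
            if (newPos != pos) {
                pos = newPos;
                seekState = true;
            }
        }

        // Sequential read: opens a connection if the stream is in the SEEK state.
        int read(byte[] b, int off, int len) {
            if (pos >= data.length) return -1;
            if (seekState) {
                connections++;
                seekState = false;
            }
            int n = (int) Math.min(Math.min(chunk, len), data.length - pos);
            System.arraycopy(data, (int) pos, b, off, n);
            pos += n;
            return n;
        }

        // Same shape as the FSInputStream#read(long, byte[], int, int) quoted above.
        synchronized int pread(long position, byte[] b, int off, int len) {
            long oldPos = getPos();
            int nread = -1;
            try {
                seek(position);
                nread = read(b, off, len);
            } finally {
                seek(oldPos); // moves the offset back, dropping the connection
            }
            return nread;
        }

        // Loops over pread() the way FSInputStream#readFully(long, ...) does.
        void readFully(long position, byte[] b, int off, int len) {
            int done = 0;
            while (done < len) {
                int n = pread(position + done, b, off + done, len - done);
                if (n < 0) throw new IllegalStateException("unexpected EOF");
                done += n;
            }
        }

        // Hypothetical alternative: one dedicated connection per positional read,
        // reading until `len` bytes or EOF without touching pos or seekState.
        int preadDedicated(long position, byte[] b, int off, int len) {
            if (position >= data.length) return -1;
            connections++;
            int total = 0;
            while (total < len && position + total < data.length) {
                int n = (int) Math.min(Math.min(chunk, len - total),
                                       data.length - (position + total));
                System.arraycopy(data, (int) (position + total), b, off + total, n);
                total += n;
            }
            return total;
        }
    }

    public static void main(String[] args) {
        byte[] file = new byte[200 * 1024]; // small stand-in for the 200MB file
        for (int i = 0; i < file.length; i++) file[i] = (byte) i;

        SimulatedStream naive = new SimulatedStream(file, 4096);
        byte[] out1 = new byte[file.length];
        naive.readFully(0, out1, 0, out1.length);

        SimulatedStream dedicated = new SimulatedStream(file, 4096);
        byte[] out2 = new byte[file.length];
        dedicated.preadDedicated(0, out2, 0, out2.length);

        System.out.println("seek-back pread connections: " + naive.connections);   // 50
        System.out.println("dedicated pread connections: " + dedicated.connections); // 1
        System.out.println("same bytes: " + (Arrays.equals(out1, file) && Arrays.equals(out2, file)));
    }
}
```

With a 200KB stand-in file and 4096-byte reads, the seek-back path opens one connection per chunk (50 in total), while the dedicated path opens a single one. The per-chunk count grows linearly with file size, which matches the pattern described in this issue.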
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)