[
https://issues.apache.org/jira/browse/HDFS-8797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Konstantin Shvachko updated HDFS-8797:
--------------------------------------
Fix Version/s: 2.7.5
               2.9.0
Pushed to branch-2.7. Only a minor conflict in TestWebHDFS.
Updated Fix versions.
> WebHdfsFileSystem creates too many connections for pread
> --------------------------------------------------------
>
> Key: HDFS-8797
> URL: https://issues.apache.org/jira/browse/HDFS-8797
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: webhdfs
> Reporter: Jing Zhao
> Assignee: Jing Zhao
> Fix For: 2.8.0, 2.9.0, 3.0.0-alpha1, 2.7.5
>
> Attachments: HDFS-8797.000.patch, HDFS-8797.001.patch,
> HDFS-8797.002.patch, HDFS-8797.003.patch
>
>
> While running a test we found that WebHdfsFileSystem can create several
> thousand connections when doing a position read of a 200MB file. For each
> read the client connects to the DataNode again, and the DataNode creates a
> new DFSClient instance to handle the read request. This also leads to several
> thousand {{getBlockLocations}} calls to the NameNode.
> The cause of the issue is that in {{FSInputStream#read(long, byte[], int,
> int)}}, each time the input stream reads some data, it seeks back to the old
> position and resets its state to SEEK. Thus the next read has to re-create the
> connection.
> {code}
> public int read(long position, byte[] buffer, int offset, int length)
>     throws IOException {
>   synchronized (this) {
>     long oldPos = getPos();
>     int nread = -1;
>     try {
>       seek(position);
>       nread = read(buffer, offset, length);
>     } finally {
>       seek(oldPos);
>     }
>     return nread;
>   }
> }
> {code}
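> For illustration only (this is a sketch, not the committed patch): a positional
> read against WebHDFS can in principle be served by a single OPEN request that
> passes the documented {{offset}} and {{length}} parameters, so there is no
> seek-back and only one connection per pread. The sketch below assumes security
> is disabled (no delegation token) and uses placeholder host and path names.
> {code}
> import java.io.IOException;
> import java.io.InputStream;
> import java.net.HttpURLConnection;
> import java.net.URL;
>
> public class WebHdfsPreadSketch {
>
>   /**
>    * Reads up to 'length' bytes at 'position' with a single WebHDFS request:
>    * GET <base>/webhdfs/v1/<path>?op=OPEN&offset=<position>&length=<length>
>    * The NameNode answers with a redirect to a DataNode, which
>    * HttpURLConnection follows by default, so one request serves the pread.
>    */
>   static int pread(String webhdfsBase, String path, long position,
>                    byte[] buffer, int offset, int length) throws IOException {
>     URL url = new URL(webhdfsBase + "/webhdfs/v1" + path
>         + "?op=OPEN&offset=" + position + "&length=" + length);
>     HttpURLConnection conn = (HttpURLConnection) url.openConnection();
>     try (InputStream in = conn.getInputStream()) {
>       int nread = 0;
>       while (nread < length) {
>         int n = in.read(buffer, offset + nread, length - nread);
>         if (n < 0) {
>           break;                  // end of file reached before 'length' bytes
>         }
>         nread += n;
>       }
>       return nread == 0 ? -1 : nread;
>     } finally {
>       conn.disconnect();
>     }
>   }
>
>   public static void main(String[] args) throws IOException {
>     byte[] buf = new byte[4096];
>     // Placeholder NameNode HTTP address and file path.
>     int n = pread("http://namenode.example.com:9870", "/user/test/200MB.bin",
>         1024L * 1024L, buf, 0, buf.length);
>     System.out.println("read " + n + " bytes");
>   }
> }
> {code}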