Xiaobing Zhou created HDFS-8696:
-----------------------------------
Summary: Small reads are blocked by large long running reads
Key: HDFS-8696
URL: https://issues.apache.org/jira/browse/HDFS-8696
Project: Hadoop HDFS
Issue Type: Bug
Components: webhdfs
Affects Versions: 2.6.0
Reporter: Xiaobing Zhou
Assignee: Xiaobing Zhou
Priority: Blocker
There is an issue that appears to be related to the webhdfs server. When two
requests are made concurrently, the DN will sometimes pause for extended
periods (I've seen 1-300 seconds), killing performance and dropping connections.
To reproduce:
1. Set up an HDFS cluster.
2. Upload a large file (I was using 10GB). Perform 1-byte reads in a loop,
appending the elapsed time of each read to /tmp/times.txt:
{noformat}
i=1
while true; do
  echo $i
  let i++
  # -a appends each elapsed time (seconds) to /tmp/times.txt
  /usr/bin/time -f %e -o /tmp/times.txt -a curl -s -L -o /dev/null \
    "http://<namenode>:50070/webhdfs/v1/tmp/bigfile?op=OPEN&user.name=root&length=1"
done
{noformat}
3. Watch for 1-byte requests that take one second or longer (lines that do not start with 0):
tail -F /tmp/times.txt | grep -E "^[^0]"
4. After it has had a chance to warm up, start doing large transfers from
another shell:
{noformat}
i=1
while true; do
  echo $i
  let i++
  # No length parameter, so each request reads the whole file
  /usr/bin/time -f %e curl -s -L -o /dev/null \
    "http://<namenode>:50070/webhdfs/v1/tmp/bigfile?op=OPEN&user.name=root"
done
{noformat}
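The two loops above differ only in the length=1 query parameter. As a rough sketch (not part of the original reproduction), the URL could be built by a small helper so both shells share one definition. The function name webhdfs_open_url and the host nn.example.com are hypothetical; port 50070, user.name=root and the path are copied from the commands above.

```shell
# Hypothetical helper: build the WebHDFS OPEN URL used by both loops.
# Arguments: namenode host, HDFS path, optional length in bytes.
webhdfs_open_url() {
  if [ -n "${3:-}" ]; then
    # Small read: limit the response to the given number of bytes
    printf 'http://%s:50070/webhdfs/v1%s?op=OPEN&user.name=root&length=%s\n' "$1" "$2" "$3"
  else
    # Large read: no length parameter, so the whole file is returned
    printf 'http://%s:50070/webhdfs/v1%s?op=OPEN&user.name=root\n' "$1" "$2"
  fi
}
```

For example, curl -s -L -o /dev/null "$(webhdfs_open_url nn.example.com /tmp/bigfile 1)" would issue the 1-byte read.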
After a minute or two, it is easy to see that small reads will sometimes
pause for 1-300 seconds. In some extreme cases, it appears that the
transfers time out and the DN drops the connection.
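To put numbers on the pauses, the timing file can be summarized after a run. The following is a minimal sketch, assuming the file holds one elapsed time in seconds per line, as the 1-byte read loop writes it. The function name summarize_times and the optional threshold argument are hypothetical. Lines that are not plain numbers are skipped.

```shell
# Hypothetical helper: summarize a file of per-request times.
# Arguments: file with one time in seconds per line, optional slow threshold (default 1).
summarize_times() {
  awk -v limit="${2:-1}" '
    /^[0-9.]+$/ {                        # ignore anything that is not a plain number
      n++
      if ($1 + 0 >= limit) slow++        # count requests at or above the threshold
      if ($1 + 0 > max) max = $1 + 0     # track the longest request
    }
    END { printf "total=%d slow=%d max=%.2f\n", n, slow, max }
  ' "$1"
}
```

Running summarize_times /tmp/times.txt prints the number of requests, how many took one second or longer, and the longest observed time.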