[ https://issues.apache.org/jira/browse/HDFS-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654152#comment-14654152 ]
Jun commented on HDFS-8696:
---------------------------
Hi Xiaobing,
In my tests, reading a larger file (18 GB) gave some unexpected results.
Case #1 - unpatched; hdfs-site.xml has the following parameters:
"dfs.webhdfs.server.worker.threads" = 100;
"dfs.webhdfs.server.max.connection.queue.length" = 1024;
"dfs.webhdfs.net.send.buf.size" = 65535;
"dfs.webhdfs.net.receive.buf.size" = 65535;
"dfs.webhdfs.channel.write.buf.low.watermark" = 65535;
"dfs.webhdfs.channel.write.buf.high.watermark" = 131070;
Large read test:
$ while (true); do \
    /usr/bin/time -f %e -o /tmp/times.txt -a curl -s -L -o /dev/null \
    "http://NN:50070/webhdfs/v1/tmp/catalog_sales_38_50.dat?op=OPEN&user.name=release"; \
  done
$ while (true); do \
    /usr/bin/time -f %e -o /tmp/times.txt -a curl -s -L -o /dev/null \
    "http://NN:50070/webhdfs/v1/tmp/catalog_sales_38_50.dat?op=OPEN&user.name=release&length=1"; \
  done
$ tail -F /tmp/times.txt | grep -E "^[^0]"
Result:
According to /tmp/times.txt, delays are in the range of 30-60s.
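The grep filter in the test keeps every line that does not start with "0", which also lets through any non-numeric line that ends up in the file (for example, the status lines GNU time can write when the command fails). A possible stricter filter is sketched below. It assumes one elapsed time in seconds per line; the function name slow_requests and the default threshold of 1 second are only illustrative.

```shell
# Print every recorded time that is at or above a threshold, in seconds.
# Usage: slow_requests FILE [THRESHOLD]   (THRESHOLD defaults to 1)
# slow_requests is a hypothetical helper, not part of Hadoop or the patch.
slow_requests() {
    # Only lines that are plain decimal numbers are considered;
    # "+ 0" forces a numeric comparison rather than a string one.
    awk -v t="${2:-1}" '/^[0-9]+(\.[0-9]+)?$/ && $1 + 0 >= t + 0 { print }' "$1"
}
```

To follow the file live, something like `tail -F /tmp/times.txt | slow_requests - 1` should work, since awk reads standard input when the file operand is "-".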
Case #2 - patched, with the required parameters also set in the config file,
hdfs-site.xml.
Same large read test as in case #1. Result:
Delays are in the range of 40-90s, with two extreme outliers at 155s and 174s.
I will follow up with some percentiles later.
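For reference, a minimal sketch of how percentiles could be pulled from /tmp/times.txt, assuming one elapsed time in seconds per line. The helper name percentile is hypothetical, P is assumed to be an integer from 1 to 100, and the nearest-rank definition is used; other percentile definitions would give slightly different values.

```shell
# Print the P-th percentile (nearest-rank method) of the times in FILE.
# Usage: percentile FILE P   (P is an integer from 1 to 100)
# percentile is a hypothetical helper, not part of Hadoop or the patch.
percentile() {
    # Keep only lines that are plain decimal numbers, then sort numerically.
    grep -E '^[0-9]+(\.[0-9]+)?$' "$1" | sort -n | awk -v p="$2" '
        { v[NR] = $1 }
        END {
            if (NR == 0) exit 1
            # Nearest rank: the smallest rank r with r >= p/100 * NR.
            r = int((p * NR + 99) / 100)
            if (r < 1) r = 1
            print v[r]
        }'
}

# Example (not executed here):
#   for p in 50 90 95 99; do
#       printf 'p%s: %s\n' "$p" "$(percentile /tmp/times.txt "$p")"
#   done
```

Running this once per case, on a separate times file for each, would allow the patched and unpatched runs to be compared at the same percentiles.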
Thanks
> Reduce the variances of latency of WebHDFS
> ------------------------------------------
>
> Key: HDFS-8696
> URL: https://issues.apache.org/jira/browse/HDFS-8696
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: webhdfs
> Affects Versions: 2.7.0
> Reporter: Xiaobing Zhou
> Assignee: Xiaobing Zhou
> Attachments: HDFS-8696.1.patch, HDFS-8696.2.patch, HDFS-8696.3.patch
>
>
> There is an issue that appears related to the webhdfs server. When making two
> concurrent requests, the DN will sometimes pause for extended periods (I've
> seen 1-300 seconds), killing performance and dropping connections.
> To reproduce:
> 1. Set up an HDFS cluster
> 2. Upload a large file (I was using 10GB). Perform 1-byte reads, writing
> the elapsed time to /tmp/times.txt
> {noformat}
> i=1
> while (true); do
> echo $i
> let i++
> /usr/bin/time -f %e -o /tmp/times.txt -a curl -s -L -o /dev/null \
> "http://<namenode>:50070/webhdfs/v1/tmp/bigfile?op=OPEN&user.name=root&length=1";
> done
> {noformat}
> 3. Watch for 1-byte requests that take more than one second:
> tail -F /tmp/times.txt | grep -E "^[^0]"
> 4. After it has had a chance to warm up, start doing large transfers from
> another shell:
> {noformat}
> i=1
> while (true); do
> echo $i
> let i++
> (/usr/bin/time -f %e curl -s -L -o /dev/null \
> "http://<namenode>:50070/webhdfs/v1/tmp/bigfile?op=OPEN&user.name=root");
> done
> {noformat}
> After a minute or two, it is easy to see that small reads will sometimes
> pause for 1-300 seconds. In some extreme cases, the transfers appear to
> time out and the DN drops the connection.