[
https://issues.apache.org/jira/browse/HADOOP-11354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235095#comment-14235095
]
Hadoop QA commented on HADOOP-11354:
------------------------------------
{color:green}+1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12685213/mapreduce-6180-001.patch
against trunk revision 7896815.
{color:green}+1 @author{color}. The patch does not contain any @author
tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new
or modified test file.
{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with
eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new
Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results:
https://builds.apache.org/job/PreCommit-HADOOP-Build/5152//testReport/
Console output:
https://builds.apache.org/job/PreCommit-HADOOP-Build/5152//console
This message is automatically generated.
> ThrottledInputStream doesn't perform effective throttling
> ---------------------------------------------------------
>
> Key: HADOOP-11354
> URL: https://issues.apache.org/jira/browse/HADOOP-11354
> Project: Hadoop Common
> Issue Type: Bug
> Reporter: Ted Yu
> Assignee: Ted Yu
> Attachments: mapreduce-6180-001.patch
>
>
> This was first reported in HBASE-12632 by [~Tobi]:
> I just transferred a ton of data using ExportSnapshot with bandwidth
> throttling from one Hadoop cluster to another Hadoop cluster, and discovered
> that ThrottledInputStream does not limit bandwidth.
> The problem is that ThrottledInputStream sleeps once, for a fixed time (50
> ms), at the start of each read call, disregarding the actual amount of data
> read.
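> For context, a sketch of the pre-patch throttle() described above: at most one
> fixed 50 ms sleep per read call, regardless of how many bytes that call
> returned (reconstructed by reverting the one-line fix quoted below; not copied
> verbatim from the source):
> {code:java}
> private void throttle() throws IOException {
>   // Sleeps at most ONCE per read call: after a single 50 ms nap the read
>   // proceeds even if the observed rate still exceeds maxBytesPerSec.
>   if (getBytesPerSec() > maxBytesPerSec) {
>     try {
>       Thread.sleep(SLEEP_DURATION_MS);
>       totalSleepTime += SLEEP_DURATION_MS;
>     } catch (InterruptedException e) {
>       throw new IOException("Thread aborted", e);
>     }
>   }
> }
> {code}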
> ExportSnapshot defaults to a buffer size as big as the block size of the
> outputFs:
> {code:java}
> // Use the default block size of the outputFs if bigger
> int defaultBlockSize = Math.max((int) outputFs.getDefaultBlockSize(), BUFFER_SIZE);
> bufferSize = conf.getInt(CONF_BUFFER_SIZE, defaultBlockSize);
> LOG.info("Using bufferSize=" + StringUtils.humanReadableInt(bufferSize));
> {code}
> In my case, this was 256MB.
> Hence, the ExportSnapshot mapper will attempt to read up to 256 MB at a time,
> each time sleeping only 50 ms. Thus, in the worst case where each call to read
> fills the 256 MB buffer in negligible time, the ThrottledInputStream cannot
> reduce the bandwidth to under (256 MB) / (50 ms) ≈ 5 GB/s.
> Even in a more realistic case where read returns about 1 MB per call, it
> still cannot throttle the bandwidth to under 20 MB/s.
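> As a back-of-the-envelope check of these bounds (a standalone sketch, not code
> from the patch; class and variable names are made up for illustration):
> {code:java}
> // Lowest rate a single 50 ms sleep per read can enforce: bytesPerRead / 0.05 s.
> public class ThrottleFloor {
>   public static void main(String[] args) {
>     double sleepSec = 0.050;                // SLEEP_DURATION_MS = 50
>     long worstCase = 256L * 1024 * 1024;    // read fills the 256 MB buffer
>     long typical = 1L * 1024 * 1024;        // read returns ~1 MB
>     System.out.printf("worst case: %.1f GB/s%n",
>         worstCase / sleepSec / (1024.0 * 1024 * 1024));  // ~5.0 GB/s
>     System.out.printf("typical:    %.1f MB/s%n",
>         typical / sleepSec / (1024.0 * 1024));           // 20.0 MB/s
>   }
> }
> {code}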
> The issue is exacerbated by the fact that you need to set a low limit because
> the total bandwidth per host depends on the number of mapper slots as well.
> A simple solution is to change the if in throttle() to a while, so that it
> keeps sleeping in 50 ms increments until the rate is finally low enough:
> {code:java}
> private void throttle() throws IOException {
>   while (getBytesPerSec() > maxBytesPerSec) {
>     try {
>       Thread.sleep(SLEEP_DURATION_MS);
>       totalSleepTime += SLEEP_DURATION_MS;
>     } catch (InterruptedException e) {
>       throw new IOException("Thread aborted", e);
>     }
>   }
> }
> {code}
> This issue affects the ThrottledInputStream in hadoop as well.
> Another way to see this is that, for big enough buffer sizes,
> ThrottledInputStream throttles only the number of read calls, to at most 20
> per second, disregarding the number of bytes each call returns.
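> To see the difference concretely, here is a minimal virtual-time simulation of
> both variants (a hypothetical harness for illustration, not test code from the
> patch): 1 MB reads against a 10 MB/s limit.
> {code:java}
> public class ThrottleSim {
>   // Simulate 1000 reads; `loop` selects the while-based (fixed) throttle,
>   // otherwise the if-based (broken) one that sleeps at most once per read.
>   static double simulate(boolean loop, long bytesPerRead, long maxBytesPerSec) {
>     double elapsedSec = 0.001;    // small epsilon to avoid divide-by-zero
>     long total = 0;
>     for (int i = 0; i < 1000; i++) {
>       while (total / elapsedSec > maxBytesPerSec) {
>         elapsedSec += 0.050;      // one 50 ms sleep
>         if (!loop) break;         // the "if" variant stops after one sleep
>       }
>       total += bytesPerRead;      // the read itself is treated as instantaneous
>     }
>     return total / elapsedSec;    // achieved bytes/sec
>   }
>   public static void main(String[] args) {
>     long oneMB = 1L << 20, limit = 10L << 20;
>     System.out.printf("if:    %.1f MB/s%n", simulate(false, oneMB, limit) / (1 << 20));
>     System.out.printf("while: %.1f MB/s%n", simulate(true, oneMB, limit) / (1 << 20));
>   }
> }
> {code}
> The if variant levels off at 20 MB/s (one sleep per 1 MB read) no matter the
> configured limit, while the while variant converges to the configured 10 MB/s.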
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)