[
https://issues.apache.org/jira/browse/HADOOP-11354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235095#comment-14235095
]
Hadoop QA commented on HADOOP-11354:
------------------------------------
{color:green}+1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12685213/mapreduce-6180-001.patch
against trunk revision 7896815.
{color:green}+1 @author{color}. The patch does not contain any @author
tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new
or modified test file.
{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with
eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new
Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results:
https://builds.apache.org/job/PreCommit-HADOOP-Build/5152//testReport/
Console output:
https://builds.apache.org/job/PreCommit-HADOOP-Build/5152//console
This message is automatically generated.
> ThrottledInputStream doesn't perform effective throttling
> ---------------------------------------------------------
>
> Key: HADOOP-11354
> URL: https://issues.apache.org/jira/browse/HADOOP-11354
> Project: Hadoop Common
> Issue Type: Bug
> Reporter: Ted Yu
> Assignee: Ted Yu
> Attachments: mapreduce-6180-001.patch
>
>
> This was first reported in HBASE-12632 by [~Tobi]:
> I just transferred a ton of data using ExportSnapshot with bandwidth
> throttling from one Hadoop cluster to another Hadoop cluster, and discovered
> that ThrottledInputStream does not limit bandwidth.
> The problem is that ThrottledInputStream sleeps once, for a fixed time (50
> ms), at the start of each read call, disregarding the actual amount of data
> read.
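> For context, a sketch of the pre-patch throttle() described above: at most one
> fixed 50 ms sleep per read call, regardless of how many bytes that call
> returned (reconstructed by reverting the one-line fix quoted below; not copied
> verbatim from the source):
> {code:java}
> private void throttle() throws IOException {
>   // Sleeps at most ONCE per read call: after a single 50 ms nap the read
>   // proceeds even if the observed rate still exceeds maxBytesPerSec.
>   if (getBytesPerSec() > maxBytesPerSec) {
>     try {
>       Thread.sleep(SLEEP_DURATION_MS);
>       totalSleepTime += SLEEP_DURATION_MS;
>     } catch (InterruptedException e) {
>       throw new IOException("Thread aborted", e);
>     }
>   }
> }
> {code}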
> ExportSnapshot defaults to a buffer size as big as the block size of the
> outputFs:
> {code:java}
> // Use the default block size of the outputFs if bigger
> int defaultBlockSize = Math.max((int) outputFs.getDefaultBlockSize(), BUFFER_SIZE);
> bufferSize = conf.getInt(CONF_BUFFER_SIZE, defaultBlockSize);
> LOG.info("Using bufferSize=" + StringUtils.humanReadableInt(bufferSize));
> {code}
> In my case, this was 256MB.
> Hence, the ExportSnapshot mapper will attempt to read up to 256 MB at a time,
> each time sleeping only 50 ms. Thus, in the worst case where each call to read
> fills the 256 MB buffer in negligible time, the ThrottledInputStream cannot
> reduce the bandwidth to under (256 MB) / (50 ms) ≈ 5 GB/s.
> Even in a more realistic case where read returns about 1 MB per call, it
> still cannot throttle the bandwidth to under 20 MB/s.
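> As a back-of-the-envelope check of these bounds (a standalone sketch, not code
> from the patch; class and variable names are made up for illustration):
> {code:java}
> // Lowest rate a single 50 ms sleep per read can enforce: bytesPerRead / 0.05 s.
> public class ThrottleFloor {
>   public static void main(String[] args) {
>     double sleepSec = 0.050;                // SLEEP_DURATION_MS = 50
>     long worstCase = 256L * 1024 * 1024;    // read fills the 256 MB buffer
>     long typical = 1L * 1024 * 1024;        // read returns ~1 MB
>     System.out.printf("worst case: %.1f GB/s%n",
>         worstCase / sleepSec / (1024.0 * 1024 * 1024));  // ~5.0 GB/s
>     System.out.printf("typical:    %.1f MB/s%n",
>         typical / sleepSec / (1024.0 * 1024));           // 20.0 MB/s
>   }
> }
> {code}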
> The issue is exacerbated by the fact that you need to set a low limit because
> the total bandwidth per host depends on the number of mapper slots as well.
> A simple solution is to change the if in throttle() to a while, so that it
> keeps sleeping in 50 ms increments until the rate is finally low enough:
> {code:java}
> private void throttle() throws IOException {
>   while (getBytesPerSec() > maxBytesPerSec) {
>     try {
>       Thread.sleep(SLEEP_DURATION_MS);
>       totalSleepTime += SLEEP_DURATION_MS;
>     } catch (InterruptedException e) {
>       throw new IOException("Thread aborted", e);
>     }
>   }
> }
> {code}
> This issue affects the ThrottledInputStream in hadoop as well.
> Another way to see this is that, for big enough buffer sizes,
> ThrottledInputStream throttles only the number of read calls, to at most 20
> per second, disregarding the number of bytes each call returns.
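> To see the difference concretely, here is a minimal virtual-time simulation of
> both variants (a hypothetical harness for illustration, not test code from the
> patch): 1 MB reads against a 10 MB/s limit.
> {code:java}
> public class ThrottleSim {
>   // Simulate 1000 reads; `loop` selects the while-based (fixed) throttle,
>   // otherwise the if-based (broken) one that sleeps at most once per read.
>   static double simulate(boolean loop, long bytesPerRead, long maxBytesPerSec) {
>     double elapsedSec = 0.001;    // small epsilon to avoid divide-by-zero
>     long total = 0;
>     for (int i = 0; i < 1000; i++) {
>       while (total / elapsedSec > maxBytesPerSec) {
>         elapsedSec += 0.050;      // one 50 ms sleep
>         if (!loop) break;         // the "if" variant stops after one sleep
>       }
>       total += bytesPerRead;      // the read itself is treated as instantaneous
>     }
>     return total / elapsedSec;    // achieved bytes/sec
>   }
>   public static void main(String[] args) {
>     long oneMB = 1L << 20, limit = 10L << 20;
>     System.out.printf("if:    %.1f MB/s%n", simulate(false, oneMB, limit) / (1 << 20));
>     System.out.printf("while: %.1f MB/s%n", simulate(true, oneMB, limit) / (1 << 20));
>   }
> }
> {code}
> The if variant levels off at 20 MB/s (one sleep per 1 MB read) no matter the
> configured limit, while the while variant converges to the configured 10 MB/s.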
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)