Raju Bairishetti created HADOOP-11205:
-----------------------------------------

             Summary: ThrottledInputStream should return the actual bandwidth 
(read rate)
                 Key: HADOOP-11205
                 URL: https://issues.apache.org/jira/browse/HADOOP-11205
             Project: Hadoop Common
          Issue Type: Bug
          Components: tools/distcp
            Reporter: Raju Bairishetti
            Assignee: Raju Bairishetti


Currently, ThrottledInputStream does not return the actual read rate. Because 
of this, it spends most of its time idle.

Behavior: Before reading a chunk of bytes (or a single byte) from the buffer, 
it first checks whether the current bandwidth (number of bytes per second) is 
more than maxBandwidth. If the read rate exceeds the max bandwidth, it sleeps 
for 50 ms and then resumes.
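
The check-then-sleep behavior described above can be sketched roughly as 
follows. This is only an illustration, not the actual ThrottledInputStream 
source: the class name ThrottleSketch, the LongSupplier standing in for 
getBytesPerSec(), and the sleep counter are all hypothetical, and the sleep is 
counted rather than performed so the example runs instantly.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch of the check-then-sleep step; not the real ThrottledInputStream.
final class ThrottleSketch {
    static final long SLEEP_DURATION_MS = 50; // sleep interval from the description

    private final LongSupplier bytesPerSec;   // stand-in for getBytesPerSec()
    private final long maxBytesPerSec;
    private long totalSleepMs = 0;

    ThrottleSketch(LongSupplier bytesPerSec, long maxBytesPerSec) {
        this.bytesPerSec = bytesPerSec;
        this.maxBytesPerSec = maxBytesPerSec;
    }

    // Called before each read. If the reported rate is over the limit, back off once.
    // Returns true if a sleep was (notionally) taken.
    boolean throttleOnce() {
        if (bytesPerSec.getAsLong() > maxBytesPerSec) {
            totalSleepMs += SLEEP_DURATION_MS; // a real stream would call Thread.sleep here
            return true;
        }
        return false;
    }

    long getTotalSleepMs() {
        return totalSleepMs;
    }
}
```

Because the decision depends entirely on the value reported by the rate 
source, any overestimate of the rate translates directly into extra sleeps.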

Ex: Assume maxBandwidth = 1 MB/s and the actual read rate is also 1 MB/s (i.e. 
1M bytes are read per second).

In this case, even when it reads 1.5 MB in 1.5 sec, which does not exceed the 
max bandwidth, it still goes to sleep, because it computes the read rate as 
1.5 MB/s (bytes read / elapsed time = 1.5/1, since the elapsed time 
1500 ms / 1000 is truncated to 1 by integer division) instead of 1 MB/s 
(i.e. 1.5/1.5).
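
The arithmetic above can be checked with a small standalone sketch. The class 
and method names are hypothetical, 1 MB is taken as 1,000,000 bytes for 
simplicity, and preciseRate is just one millisecond-precision variant used for 
comparison.

```java
// Worked arithmetic for the "1.5 MB in 1.5 s" case (hypothetical helper class).
final class RateArithmetic {
    static final long MILLISECONDS_IN_SEC = 1000;

    // Elapsed time is truncated to whole seconds before dividing.
    static long truncatedRate(long bytesRead, long elapsedMs) {
        long elapsedSec = elapsedMs / MILLISECONDS_IN_SEC; // 1500 / 1000 == 1
        return elapsedSec == 0 ? bytesRead : bytesRead / elapsedSec;
    }

    // Elapsed time is kept in milliseconds, so partial seconds are not lost.
    static long preciseRate(long bytesRead, long elapsedMs) {
        return elapsedMs == 0 ? bytesRead : bytesRead * MILLISECONDS_IN_SEC / elapsedMs;
    }

    public static void main(String[] args) {
        System.out.println(truncatedRate(1_500_000L, 1500L)); // prints 1500000
        System.out.println(preciseRate(1_500_000L, 1500L));   // prints 1000000
    }
}
```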

Example: 
It does not go to sleep during the first second, as the number of bytes read 
in that elapsed time is less than maxBandwidth.
When it reads the 1M + 1 byte/chunk, it checks the read rate against maxBandwidth. 
When it reads the 1M + 2 byte/chunk, it sleeps for 50 ms as the read rate is > 1 MB/s.
When it reads the 1M + 3 byte/chunk, it again sleeps for 50 ms as the read rate is > 1 MB/s.
...
Even when it has read 1.5 MB in 1.5 sec, it still goes to sleep, because the 
computed read rate is 1.5 MB/s (1.5/1, with 1500 ms / 1000 truncated to 1) 
instead of 1 MB/s (i.e. 1.5/1.5).

Cons: it alternates between reading for about a second and sleeping for almost a second.

The getBytesPerSec() method does not return the actual bandwidth.
Current code: {code}
public long getBytesPerSec() {
  // Integer division truncates the elapsed time to whole seconds (e.g. 1500 ms -> 1).
  long elapsed = (System.currentTimeMillis() - startTime) / 1000;
  if (elapsed == 0) {
    return bytesRead;
  } else {
    return bytesRead / elapsed;
  }
}
{code}
We should fix the getBytesPerSec() method:
{code}
public long getBytesPerSec() {
  // Keep the elapsed time in milliseconds so partial seconds are not truncated.
  long elapsedTimeInMilliSecs = System.currentTimeMillis() - startTime;
  if (elapsedTimeInMilliSecs <= MILLISECONDS_IN_SEC) {
    return bytesRead;
  } else {
    return (bytesRead * MILLISECONDS_IN_SEC) / elapsedTimeInMilliSecs;
  }
}
{code}
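
To get a rough feel for the effect of the two computations, the following 
sketch runs both against a simplified reader on a virtual clock. Everything 
here is an assumption made for illustration: the class ThrottleSimulation, the 
model of a reader that takes 1 virtual ms per 1000-byte chunk, 1 MB taken as 
1,000,000 bytes, and a single 50 ms sleep per over-limit check. It is not the 
real ThrottledInputStream and does not reproduce its exact timing.

```java
import java.util.function.LongBinaryOperator;

// Hypothetical simulation on a virtual clock; not the real ThrottledInputStream.
final class ThrottleSimulation {
    static final long MILLISECONDS_IN_SEC = 1000;
    static final long SLEEP_DURATION_MS = 50;

    // Rate as computed by the current code: elapsed time truncated to whole seconds.
    static long currentRate(long bytesRead, long elapsedMs) {
        long elapsedSec = elapsedMs / MILLISECONDS_IN_SEC;
        return elapsedSec == 0 ? bytesRead : bytesRead / elapsedSec;
    }

    // Rate as computed by the proposed fix: elapsed time kept in milliseconds.
    static long proposedRate(long bytesRead, long elapsedMs) {
        if (elapsedMs <= MILLISECONDS_IN_SEC) {
            return bytesRead;
        }
        return (bytesRead * MILLISECONDS_IN_SEC) / elapsedMs;
    }

    // Reads one 1000-byte chunk per virtual millisecond until durationMs of
    // virtual time has passed. Before each read, sleeps 50 virtual ms if the
    // reported rate is over the limit. Returns the total virtual sleep time.
    static long totalSleepMs(LongBinaryOperator rate, long maxBytesPerSec, long durationMs) {
        long nowMs = 0;
        long bytesRead = 0;
        long sleptMs = 0;
        while (nowMs < durationMs) {
            if (rate.applyAsLong(bytesRead, nowMs) > maxBytesPerSec) {
                nowMs += SLEEP_DURATION_MS;
                sleptMs += SLEEP_DURATION_MS;
            }
            bytesRead += 1000;
            nowMs += 1;
        }
        return sleptMs;
    }

    public static void main(String[] args) {
        long max = 1_000_000L;
        System.out.println("current:  "
            + totalSleepMs(ThrottleSimulation::currentRate, max, 10_000L) + " ms slept");
        System.out.println("proposed: "
            + totalSleepMs(ThrottleSimulation::proposedRate, max, 10_000L) + " ms slept");
    }
}
```

In this model the reader never exceeds 1 MB/s, so the proposed computation 
never triggers a sleep, while the current one starts sleeping shortly after 
the first second.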




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
