tienduc_dinh wrote:
Hi Konstantin,

thanks so much for your help. I was a little confused about why, after setting
mapred.map.tasks = 10 in hadoop-site.xml, Hadoop didn't map anything. So your
answer,

In case of TestDFSIO it will be overridden by "-nrFiles".

is the key. I now need your confirmation to check that I've understood it correctly.

That is correct.

+ If I want to write 2 GB with 1 map task, I should use the following
command:

hadoop-0.18.0/bin/hadoop jar testDFSIO.jar -write -fileSize 2048 -nrFiles 1

The throughput values are, e.g., 33.60 / 31.48 / 30.95.
+ If I want to write 2 GB with 4 map tasks, I should use the following
command:

hadoop-0.18.0/bin/hadoop jar testDFSIO.jar -write -fileSize 5012 -nrFiles 4

You are writing 20GB, not 2GB.
It should be 512 instead of 5012.

The throughput values are, e.g., 31.50 / 32.09 / 30.56.
Can you please explain why the values in case 2 are not much better? I have
1 master and 4 slaves, and if I calculate correctly, they should even be 4 times
higher, right?

Throughput is MB/sec per client.
It is great that you get the same numbers for 1 write and for 4 parallel writes.
This means that Hadoop scales well on your cluster! :-)
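The "per client" point above can be made concrete with a quick back-of-the-envelope sketch (plain Java, purely illustrative; the per-client figures are the ones quoted in this thread, and the aggregate numbers are just their product):

```java
// Back-of-the-envelope check of the "MB/sec per client" point.
// Illustrative sketch only -- not part of TestDFSIO.
public class AggregateThroughput {

    // Aggregate cluster write rate = per-client throughput * parallel clients.
    static double aggregate(double perClientMBps, int clients) {
        return perClientMBps * clients;
    }

    public static void main(String[] args) {
        // 1 map at ~33.6 MB/s vs. 4 maps at ~31.5 MB/s each
        System.out.printf("1 writer:  ~%.1f MB/s total%n", aggregate(33.6, 1));
        System.out.printf("4 writers: ~%.1f MB/s total%n", aggregate(31.5, 4));
    }
}
```

Because the per-client figure stays roughly constant as clients are added, the aggregate write rate grows almost linearly, which is the scaling Konstantin describes.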

Sorry for my poor English skills, and thanks very much for your help.

Tien Duc Dinh


Konstantin Shvachko wrote:
Hi tienduc_dinh,

Just a bit of a background, which should help to answer your questions.
TestDFSIO mappers perform one operation (read or write) each, measure
the time taken by the operation and output the following three values:
(I am intentionally omitting some other output stuff.)
- size(i)
- time(i)
- rate(i) = size(i) / time(i)
i is the index of the map task, 0 <= i < N, where N is the "-nrFiles" value,
which equals the number of maps.

Then the reduce sums those values and writes them into "part-00000".
That is, you get three fields in it:
size = size(0) + ... + size(N-1)
time = time(0) + ... + time(N-1)
rate = rate(0) + ... + rate(N-1)

Then we calculate
throughput = size / time
averageIORate = rate / N
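The aggregation described above can be sketched in plain Java (illustrative only, not the actual TestDFSIO source; the sample sizes and times in main are made up):

```java
// Sketch of how TestDFSIO aggregates per-map results, per the formulas
// in this thread. Illustrative only -- not the actual Hadoop source.
public class TestDFSIOAggregate {

    // throughput = (size(0)+...+size(N-1)) / (time(0)+...+time(N-1))
    static double throughput(double[] sizeMB, double[] timeSec) {
        double totalSize = 0, totalTime = 0;
        for (int i = 0; i < sizeMB.length; i++) {
            totalSize += sizeMB[i];
            totalTime += timeSec[i];
        }
        return totalSize / totalTime;
    }

    // averageIORate = (rate(0)+...+rate(N-1)) / N, with rate(i) = size(i)/time(i)
    static double averageIORate(double[] sizeMB, double[] timeSec) {
        double totalRate = 0;
        for (int i = 0; i < sizeMB.length; i++) {
            totalRate += sizeMB[i] / timeSec[i];
        }
        return totalRate / sizeMB.length;
    }

    public static void main(String[] args) {
        // Hypothetical run: 4 maps, 2048 MB each, slightly different times.
        double[] size = {2048, 2048, 2048, 2048};
        double[] time = {65.0, 63.5, 66.2, 64.8};
        System.out.printf("throughput    = %.2f MB/s%n", throughput(size, time));
        System.out.printf("averageIORate = %.2f MB/s%n", averageIORate(size, time));
    }
}
```

Note that the two numbers are close but not identical: throughput weights slower maps by their longer times, while averageIORate gives every map equal weight.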

So, answering your questions:
- There should be only one reduce task; otherwise you would have to
manually sum the corresponding values in "part-00000" and "part-00001".
- The value of the ":rate" field after the reduce equals the sum of the individual
rates of each operation. So if you want an average, you should
divide it by the number of tasks rather than multiply.

Now, in your case you create only one file "-nrFiles 1", which means
you run only one map task.
Setting "mapred.map.tasks" to 10 in hadoop-site.xml defines the default
number of tasks per job. See here
http://hadoop.apache.org/core/docs/current/hadoop-default.html#mapred.map.tasks
In case of TestDFSIO it will be overridden by "-nrFiles".

Hope this answers your questions.
Thanks,
--Konstantin



tienduc_dinh wrote:
Hello,

I'm now using hadoop-0.18.0 and testing it on a cluster with 1 master and 4
slaves. In hadoop-site.xml the value of "mapred.map.tasks" is 10. Because
the "throughput" and "average IO rate" values are similar, I'll just post the
"throughput" values from running the same command 3 times:

-> hadoop-0.18.0/bin/hadoop jar testDFSIO.jar -write -fileSize 2048 -nrFiles 1

+ with "dfs.replication = 1" => 33.60 / 31.48 / 30.95

+ with "dfs.replication = 2" => 26.40 / 20.99 / 21.70

I find something strange while reading the source code:

- The value of mapred.reduce.tasks is always set to 1:
job.setNumReduceTasks(1) in the function runIOTest(), and reduceFile =
new Path(WRITE_DIR, "part-00000") in analyzeResult().

So I think, if we properly had mapred.reduce.tasks = 2, we would have 2 paths
on the file system, "part-00000" and "part-00001", e.g.
/benchmarks/TestDFSIO/io_write/part-00000

- And I don't understand the line with "double med = rate / 1000 / tasks".
Should it not be "double med = rate * tasks / 1000"?

