[jira] Commented: (HADOOP-2000) Re-write NNBench to use MapReduce

Konstantin Shvachko (JIRA) Fri, 02 Nov 2007 03:10:26 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539565
 ]


Konstantin Shvachko commented on HADOOP-2000:
---------------------------------------------

# redundant imports
import java.text.DateFormat;
import org.apache.hadoop.mapred.Reducer;
# The variable name in NNBenchMapper.map() is never used.
# Typo: "dfined" should be "defined".
{code}
    // Set user-dfined parameters,
{code}
# The label says TPS but the value printed is calculated as TPmS. The two should match:
{code}
    "       RAW DATA: TPS Total : " + totalTimeTPmS,
{code}
# The name of double totalTimeTPS is confusing: according to the formula and 
the comments it is in fact TPS, not a time.
# I am not happy with the whole concept of transactions per second.
You measure the total execution time of each map (t_i) and then compute 
Number_of_files / Sum(t_i).
But Sum(t_i) is not the right time to divide by, because the maps run in 
parallel.
To obtain the true TPS you need to time the start and the end of +*all*+ maps 
rather than the start and the end of +*individual*+ maps.
Admittedly, it is hard to get the exact starting and ending times of the job's 
map stage.
Your proposed TPS measures the number of transactions per second of a single 
client under a certain load on the cluster.
This is not completely unreasonable, but it does not say much as a benchmark 
result, imo: it is quite clear that the more load the cluster bears, the 
slower the clients run.
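
To make the difference between the two calculations in the last point concrete, here is a minimal sketch. It is not code from the patch: MapTiming, perClientTps and aggregateTps are hypothetical names, and it assumes each map can report its start and end time in milliseconds from clocks that are comparable across nodes, which, as noted above, is the hard part in practice.

```java
public class TpsSketch {

    /** Hypothetical per-map measurement: start/end in ms and files processed. */
    static final class MapTiming {
        final long startMs;
        final long endMs;
        final long files;

        MapTiming(long startMs, long endMs, long files) {
            this.startMs = startMs;
            this.endMs = endMs;
            this.files = files;
        }
    }

    /** Number_of_files / Sum(t_i): the rate seen by a single client under load. */
    static double perClientTps(MapTiming[] maps) {
        long files = 0;
        long sumMs = 0;
        for (MapTiming m : maps) {
            files += m.files;
            sumMs += m.endMs - m.startMs;
        }
        // Multiply by 1000 to convert transactions per ms into transactions per second.
        return files * 1000.0 / sumMs;
    }

    /** Number_of_files / (latest end - earliest start): the rate over all maps. */
    static double aggregateTps(MapTiming[] maps) {
        long files = 0;
        long first = Long.MAX_VALUE;
        long last = Long.MIN_VALUE;
        for (MapTiming m : maps) {
            files += m.files;
            first = Math.min(first, m.startMs);
            last = Math.max(last, m.endMs);
        }
        return files * 1000.0 / (last - first);
    }

    public static void main(String[] args) {
        // Four maps running fully in parallel, each creating 1000 files in 10 s.
        MapTiming[] maps = {
            new MapTiming(0, 10_000, 1000),
            new MapTiming(0, 10_000, 1000),
            new MapTiming(0, 10_000, 1000),
            new MapTiming(0, 10_000, 1000),
        };
        System.out.println("per-client TPS: " + perClientTps(maps)); // 100.0
        System.out.println("aggregate TPS:  " + aggregateTps(maps)); // 400.0
    }
}
```

With four fully parallel maps the Sum(t_i) formula reports 100 TPS while the cluster as a whole served 400 TPS, so the two numbers differ by the degree of parallelism.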

> Re-write NNBench to use MapReduce
> ---------------------------------
>
>                 Key: HADOOP-2000
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2000
>             Project: Hadoop
>          Issue Type: Test
>          Components: test
>    Affects Versions: 0.15.0
>            Reporter: Mukund Madhugiri
>            Assignee: Mukund Madhugiri
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-2000.patch, HADOOP-2000.patch, HADOOP-2000.patch, 
> HADOOP-2000.patch, HADOOP-2000.patch
>
>
> The proposal is to re-write the NNBench benchmark/test to measure Namenode 
> operations using MapReduce. Two buckets of measurements will be done:
> 1. Transactions per second 
> 2. Average latency
> for these operations
> - Create and Close file
> - Open file
> - Rename file
> - Delete file

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
