Inline

On Wed, Feb 2, 2011 at 5:35 AM, Raj V <[email protected]> wrote:

> Patrick, all,
>
> Apologies. My brain must have just frozen. There are two problems here.
> The first is that my perl script already divides the time by 1000, so the
> times are in seconds, not milliseconds.
> The second is more fundamental: simply adding up all the task timings
> ignores the inherently parallel execution.
>
> Here is the updated table.
>
> Nodes  Map total (s)  Reduce total (s)  Job (s)  (Map+Red)/Job  Per node
> 256    405420         751376            3483     332.1263279    1.297368468
> 256    411574         711841            3363     334.0514422    1.304888446
> 256    491519         599081            2955     369.0693739    1.441677242
> 512    491034         1212229           2989     569.8437605    1.112976095
> 512    471421         947025            2305     615.3778742    1.201909911
> 512    841000         1932633           4473     620.0833892    1.21110037
>
> Column 1 is the number of nodes in the cluster.
> Column 2 is the total map time in seconds.
> Column 3 is the total reduce time in seconds.
> Column 4 is the job time in seconds.
> Column 5 is (Column 2 + Column 3) / Column 4.
> Column 6 is Column 5 / number of nodes.
>
> From this it appears that the infrastructure gives me around a 20-40%
> advantage over single-node serial execution. This does not include the
> other advantages such as failure tolerance and data availability.
>
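For anyone checking the arithmetic, here is a minimal Python sketch of how
columns 5 and 6 above fall out of the raw sums (the row values are copied
from the updated table; nothing else is measured):

# Column 5: effective parallelism; column 6: per-node "efficiency".
# Rows: (nodes, total map seconds, total reduce seconds, job wall-clock seconds)
rows = [
    (256, 405420, 751376, 3483),
    (256, 411574, 711841, 3363),
    (256, 491519, 599081, 2955),
    (512, 491034, 1212229, 2989),
    (512, 471421, 947025, 2305),
    (512, 841000, 1932633, 4473),
]
for nodes, map_s, reduce_s, job_s in rows:
    speedup = (map_s + reduce_s) / float(job_s)   # column 5
    per_node = speedup / nodes                    # column 6
    print("%d nodes: speedup %.1f, per-node %.3f" % (nodes, speedup, per_node))
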
> Questions
>
> 1. Is this the norm?
>
Possibly, for terasort. You are hitting a network bottleneck. Remember that
with terasort as much data goes through the shuffle/reduce phase as your
input data. Terasort isn't representative of most workloads, however, and is
more a test of your network throughput than disk I/O or CPU or anything
else.
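
To make the network point concrete, here is a back-of-envelope sketch. The
input size and NIC speed below are illustrative assumptions, not figures from
Raj's cluster:

# Lower bound on terasort shuffle time imposed by the network alone.
# All numbers here are assumptions chosen for illustration.
input_bytes = 1.0e12            # assume a 1 TB terasort
nodes = 256
nic_bytes_per_sec = 1.0e9 / 8   # assume 1 GbE per node, ignoring overhead

# Roughly the whole input crosses the network during the shuffle.
per_node_bytes = input_bytes / nodes
print("best-case shuffle transfer: %.0f s per node"
      % (per_node_bytes / nic_bytes_per_sec))

With oversubscribed top-of-rack uplinks the effective per-node bandwidth is
lower, so the real bound is usually several times higher, and adding nodes
does not help unless the fabric's bisection bandwidth grows with them.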


> 2. Can I tune the system to get better numbers?
>
Not enough info here.


> 3. Are there others who can shed some thoughts on these or share data?
> 4. Any idea about the loss of efficiency due to the infrastructure?
> 5. The 256 node cluster seems to provide an efficiency of 1.3, while at
> 512 nodes it decreases to around 1.2. Would this trend continue?
>
Yes, because of the network capacity.


>
> Raj
>
>
>  *From:* Patrick Angeles <[email protected]>
> *To:* [email protected]
> *Cc:* "[email protected]" <[email protected]>
> *Sent:* Tuesday, February 1, 2011 7:24 PM
> *Subject:* Re: Hadoop Framework questions.
>  Hi, Raj.
>
> Interesting analysis...
>
> These numbers appear to be off. For example, 405s for mappers + 751s for
> reducers = 1156s for all tasks. If you have 2000 map and reduce tasks, this
> means each task is spending roughly 500ms to do actual work. That is a very
> low number and seems impossible.
>
> - P
>
>  On Wed, Feb 2, 2011 at 3:07 AM, Raj V <[email protected]> wrote:
>
> Hi,
>
> I have been running some benchmarks with some hadoop jobs on different
> nodes and disk configurations to see what is a good configuration to get
> the optimum performance. Here are some results that I have. Using the
> hadoop job log, I added up the timings for each of the map and reduce
> tasks and converted them to seconds by dividing by 1000. The job is
> terasort as provided in the examples.jar.
>
> Column 1 is the number of nodes in the cluster.
> Column 2 is the sum of the times of all map tasks, i.e. the sum over all
> TaskIDs of (FINISH_TIME - START_TIME) where TASK_TYPE=MAP.
> Column 3 is the same calculation for all the reduce tasks. In all cases
> all the tasks completed successfully.
> Column 4 is FINISH_TIME - LAUNCH_TIME for the job. There is a SETUP_TASK
> and a CLEANUP_TASK, but they take insignificant time.
> Column 5 is the proportion of time taken by the tasks = (Col2+Col3)/Col4.
>
> I am assuming that Total_Time - (Map_Time + Reduce_Time) is basically the
> framework time. Looking at the data, the framework is taking ~38%-68% of
> the time.
>
> Here are my questions.
> 1. Is this a problem with my setup or is this the normal behaviour?
> 2. What can I do to reduce the time taken for everything else?
> 3. Does this mean that to get a reasonably efficient hadoop cluster
> (10%-20% time for the framework) I need to get to a 1000 node cluster?
> 4. What is the normal "number" for the framework time?
>
> I apologize for:
> 1. Cross posting. I am using CDH3B3, but I don't think my questions are
> specific to CDH3B3.
> 2. Not having provided all the details of the system, network, and disk
> configuration.
> 3. The different jobs having different disk configurations. Most of them
> run about 2000 map and reduce tasks.
>
> Nodes  Map total  Reduce total  Job (s)  (Col2+Col3)/Col4
> 256    405.42     751.376       3483     0.332126328
> 256    411.574    711.841       3363     0.334051442
> 256    491.519    599.081       2955     0.369069374
> 512    491.034    1212.229      2989     0.56984376
> 512    471.421    947.025       2305     0.615377874
> 512    841        1932.633      4473     0.620083389
>
>
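
For anyone who wants to repeat the measurement, below is a rough Python
equivalent of the calculation Raj describes: sum FINISH_TIME - START_TIME
per task, grouped by TASK_TYPE, and compare it to the job's LAUNCH_TIME to
FINISH_TIME window. It assumes the old space-separated KEY="VALUE"
job-history line format; treat the parsing as a sketch and adapt it to what
your history files actually contain.

import re
import sys
from collections import defaultdict

# Matches KEY="VALUE" pairs in a job-history line.
KV = re.compile(r'(\w+)="([^"]*)"')

tasks = defaultdict(dict)    # TASKID -> merged key/value pairs
job = {}

for line in open(sys.argv[1]):
    record = line.split(' ', 1)[0]
    fields = dict(KV.findall(line))
    if record == 'Task' and 'TASKID' in fields:
        tasks[fields['TASKID']].update(fields)
    elif record == 'Job':
        job.update(fields)

totals = defaultdict(float)  # seconds of task time per task type
for t in tasks.values():
    if 'START_TIME' in t and 'FINISH_TIME' in t:
        elapsed = (int(t['FINISH_TIME']) - int(t['START_TIME'])) / 1000.0
        totals[t.get('TASK_TYPE', 'OTHER')] += elapsed

job_secs = (int(job['FINISH_TIME']) - int(job['LAUNCH_TIME'])) / 1000.0
task_secs = totals.get('MAP', 0.0) + totals.get('REDUCE', 0.0)
print("map total:        %.0f s" % totals.get('MAP', 0.0))
print("reduce total:     %.0f s" % totals.get('REDUCE', 0.0))
print("job wall clock:   %.0f s" % job_secs)
print("(map+reduce)/job: %.2f" % (task_secs / job_secs))

Run it as: python job_times.py <job history file>  (the script name here is
just a placeholder).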
