Re: Map performance with custom binary format

Ted Dunning Tue, 28 Jul 2009 13:26:39 -0700

On Tue, Jul 28, 2009 at 12:15 PM, william kinney
<[email protected]> wrote:


>
> Also, from the job page (different job, same Map method, just more
> data...~40GB. 781 files):
> Map input records       629,738,080
> Map input bytes         41,538,992,880
>
> Anything else I can look into?


Yes.  Check how many of the maps were data-local and how many maps ran in
total.
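
As a rough sanity check on the counters quoted above, the arithmetic below
derives the average record size and average file size, and shows how a
data-local fraction could be computed. This is only a sketch: the function
names are made up for illustration, and the data-local and total map counts
are not given in this thread, so they are left as inputs.

```python
# Counters copied from the job page quoted above.
MAP_INPUT_RECORDS = 629_738_080
MAP_INPUT_BYTES = 41_538_992_880
INPUT_FILES = 781


def avg_record_bytes(total_bytes, records):
    """Average size of one input record, in bytes."""
    return total_bytes / records


def avg_file_mb(total_bytes, files):
    """Average input file size in MB (1 MB = 10**6 bytes)."""
    return total_bytes / files / 1e6


def data_local_fraction(data_local_maps, total_maps):
    """Share of map tasks that read their input from a local block.

    Both arguments would come from the job page; they are not given in
    this thread, so no values are filled in here.
    """
    if total_maps <= 0:
        raise ValueError("total_maps must be positive")
    return data_local_maps / total_maps


if __name__ == "__main__":
    print(f"{avg_record_bytes(MAP_INPUT_BYTES, MAP_INPUT_RECORDS):.1f} bytes/record")
    print(f"{avg_file_mb(MAP_INPUT_BYTES, INPUT_FILES):.1f} MB/file")
```

That works out to roughly 66 bytes per record and about 53 MB per file. A
data-local fraction well below 1.0 would suggest that many maps are pulling
their input over the network rather than from local disk.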


> Do my original numbers (only 2x performance) jump out at you as being
> way off? Or is it common to see that with a setup similar to mine?


It is way off.  My experience is that from 5 EC2 nodes, I can sustain
100-200 MB/s to the *network*.  These are lesser machines than you have and
you have twice as many.  Moreover, your test program is nicely designed to
avoid all of the overhead attendant on running a full program.  It is
reasonable to expect a significant slowdown due to startup and due to going
through HDFS, but for local blocks I would expect good performance.

Is it possible that the 50 MB/s on a single node was not a real number?  It
seems somewhat high but probably reasonable with modern hardware.  Was the
file already in memory (that is, in the OS page cache)?
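
One way to check whether caching inflated the single-node figure is to time
the same sequential read twice and compare. The sketch below is a
hypothetical helper, not the test program from this thread; it assumes a
plain local file, and the first pass is only a cold read if the file was
not already cached when the script started.

```python
import time


def read_throughput_mb_s(path, chunk_size=1 << 20):
    """Read the file sequentially and return throughput in MB/s (10**6 bytes)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very small files.
    return total / 1e6 / max(elapsed, 1e-9)


def first_vs_second_read(path):
    """Time two back-to-back reads of the same file.

    If the second pass is much faster than the first, the second one was
    probably served from memory rather than from disk.
    """
    first = read_throughput_mb_s(path)
    second = read_throughput_mb_s(path)
    return first, second
```

If both passes report about the same rate, either the file was already
cached before the first pass or it is too large to stay in memory, so the
comparison is only a hint and not a proof.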


-- 
Ted Dunning, CTO
DeepDyve
