Hi Kuro,

A 100MB file should take about a second to read, and MR jobs typically get 
scheduled on the order of seconds.  So it's unlikely you'll see any benefit.

You'll probably want to have a look at Amdahl's law:

http://en.wikipedia.org/wiki/Amdahl%27s_law
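
In short, Amdahl's law says that if only a fraction p of a job can be
parallelized across n workers, the best possible speedup is
1 / ((1 - p) + p/n). A minimal sketch (the numbers here are illustrative,
not measurements from your setup):

```python
def amdahl_speedup(parallel_fraction, workers):
    """Maximum speedup per Amdahl's law: 1 / ((1 - p) + p / n)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)

# If, say, half of a short job is fixed overhead (scheduling, startup),
# then even 2 workers only buy a ~1.33x speedup:
print(amdahl_speedup(0.5, 2))    # -> 1.333...

# And no matter how many workers you add, the speedup is capped
# at 1 / (1 - p) -- here, 2x:
print(amdahl_speedup(0.5, 1000))
```

The point for a 100MB job: when scheduling overhead is comparable to the
work itself, the serial fraction dominates and adding nodes barely helps.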

Brian

On Aug 31, 2011, at 3:48 AM, Teruhiko Kurosaka wrote:

> Hadoop newbie here.
> 
> I wrapped my company's entity extraction product in a Hadoop task,
> and gave it a large file, on the order of 100MB.
> I have 4 VMs running on a 24-core CPU server, and made two of
> them slave nodes, one the namenode, and another the job tracker.
> It turned out that processing the same data size takes longer
> using Hadoop than processing it in serial.
> 
> I am curious how I can experience the advantage of
> Hadoop.  Is having many physical machines essential?
> Would I need to process Terabytes of data? What would be
> the minimum set up where I can experience the advantage
> of Hadoop?
> ----
> T. "Kuro" Kurosaka
