Hi Kuro,

A 100MB file should take about one second to read, and MapReduce jobs typically take on the order of seconds just to get scheduled. So at that scale it's unlikely you'll see any benefit.
You'll probably want to have a look at Amdahl's law: http://en.wikipedia.org/wiki/Amdahl%27s_law

Brian

On Aug 31, 2011, at 3:48 AM, Teruhiko Kurosaka wrote:

> Hadoop newbie here.
>
> I wrapped my company's entity extraction product in a Hadoop task
> and gave it a large file, on the order of 100MB.
> I have 4 VMs running on a 24-core CPU server: two of them are
> slave nodes, one is the namenode, and one is the job tracker.
> It turned out that processing the same data takes longer
> using Hadoop than processing it serially.
>
> I am curious how I can experience the advantage of
> Hadoop. Is having many physical machines essential?
> Would I need to process terabytes of data? What would be
> the minimum setup where I can experience the advantage
> of Hadoop?
> ----
> T. "Kuro" Kurosaka
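P.S. To put rough numbers on Amdahl's law: with a parallelizable fraction p and n workers, speedup is 1 / ((1 - p) + p / n). A minimal sketch (the 95% figure is a made-up illustration, not a measurement of Kuro's job):

```python
def amdahl_speedup(p, n):
    """Ideal speedup for a job whose fraction p is parallelizable,
    run on n workers: 1 / ((1 - p) + p / n)."""
    return 1.0 / ((1.0 - p) + p / n)

# Hypothetical job that is 95% parallelizable:
print(round(amdahl_speedup(0.95, 4), 2))     # 4 workers -> 3.48
print(round(amdahl_speedup(0.95, 1000), 2))  # near the 1/(1-p) = 20x ceiling
```

Note this ignores fixed costs like job scheduling and HDFS setup, which is exactly why a 100MB input on a few VMs can end up slower than the serial run.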