On Tue, Jul 28, 2009 at 12:15 PM, william kinney <[email protected]> wrote:
> Also, from the job page (different job, same Map method, just more
> data...~40GB. 781 files):
> Map input records 629,738,080
> Map input bytes 41,538,992,880
>
> Anything else I can look into?

Yes. The number of data local maps and how many maps total.

> Do my original numbers (only 2x performance) jump out at you as being
> way off? Or it is common to see that a setup similar to mine?

It is way off. My experience is that from 5 EC2 nodes, I can sustain
100-200 MB/s to the *network*. These are lesser machines than you have
and you have twice as many. Moreover, your test program is nicely
designed to avoid all of the overhead attendant on running a full
program.

It is reasonable to expect significant slow down due to startup and due
to going through HDFS, but for local blocks I would expect good
performance.

Is it possible that the 50MB/s on a single node was not a real number?
It seems somewhat high but probably reasonable with modern hardware.
Was the file already in memory?

--
Ted Dunning, CTO
DeepDyve
