On Jul 15, 2010, at 11:40 AM, Syed Wasti wrote:
> Will it matter what the data block size is ?

Yes.

> It is recommended to have a block size of 64 MB, but if we want to have the
> data block size to 128 MB, should this effect the performance ?

Yes. FWIW, we run with 128MB.

> Does the size of the map jobs created on each datanodes in anyway depend the
> block size ?

Yes. Unless told otherwise, Hadoop will generally use the # of maps == # of blocks. So if you have fewer blocks to process, you'll have fewer maps to do more work. This is not necessarily a bad thing; it all depends upon your workload, size of grid, etc.
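
To put rough numbers on the maps-vs-blocks point, here is a back-of-the-envelope sketch. It assumes a single input file and exactly one map per block, and it ignores any split-size settings or input-format behavior that would change that. The function name and the 10 GB file size are made up for illustration; none of this is Hadoop API.

```python
MB = 1024 * 1024


def estimated_map_count(file_size_bytes, block_size_bytes):
    """Rough map count for one file, assuming one map per block.

    Hypothetical helper for illustration only; real jobs can differ
    because split sizes and input formats also affect the count.
    """
    if file_size_bytes <= 0:
        return 0
    # Integer ceiling division: a partially filled last block still gets a map.
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes


if __name__ == "__main__":
    file_size = 10 * 1024 * MB  # hypothetical 10 GB input file
    for block_mb in (64, 128):
        maps = estimated_map_count(file_size, block_mb * MB)
        print(f"{block_mb} MB blocks: about {maps} maps")
```

Under those assumptions, doubling the block size halves the number of maps, and each map reads roughly twice as much data. Whether that helps or hurts still depends on the workload and the size of the grid.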
