On 29 September 2011 18:39, lessonz <[email protected]> wrote:
> I'm new to Hadoop, and I'm trying to understand the implications of a 64M
> block size in the HDFS. Is there a good reference that enumerates the
> implications of this decision and its effects on files stored in the system
> as well as map-reduce jobs?
>
> Thanks.
>
Good explanation of HDFS here:
http://hadoop.apache.org/common/docs/current/hdfs_design.html

In a nutshell, MapReduce moves the computation to the node that hosts the
data (block). As there is an overhead in the startup/teardown of each task,
you want to make sure each task has a reasonable amount of data to process,
hence the default block size of 64 MB.

Quite a few users run with larger block sizes, either because it's more
efficient for their algorithms or to reduce the load on the NameNode:
more blocks means more metadata for the NameNode to hold in its in-memory
database.

Hope that helps.

Chris
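P.S. If you want to experiment, here is a rough, untested sketch of how a
block size can be overridden for a single file through the FileSystem API,
rather than cluster-wide via dfs.block.size. The 128 MB figure, the path and
the class name are just examples:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockSizeExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            long blockSize = 128L * 1024 * 1024; // 128 MB instead of the 64 MB default
            short replication = 3;               // typical default replication factor
            int bufferSize = 4096;               // write buffer size in bytes

            // The block size is fixed per file at creation time; this call
            // overrides the cluster default for just this one file.
            FSDataOutputStream out = fs.create(
                    new Path("/tmp/bigfile.dat"),
                    true, bufferSize, replication, blockSize);
            out.writeBytes("hello");
            out.close();
        }
    }

The trade-off is the one mentioned above: fewer, larger blocks mean fewer
map tasks (less startup/teardown overhead) and less metadata on the NameNode,
but also less parallelism for small files.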
