On 29 September 2011 18:39, lessonz <[email protected]> wrote:
> I'm new to Hadoop, and I'm trying to understand the implications of a 64M
> block size in the HDFS. Is there a good reference that enumerates the
> implications of this decision and its effects on files stored in the system
> as well as map-reduce jobs?
>
> Thanks.
>
Good explanation of HDFS here:
http://hadoop.apache.org/common/docs/current/hdfs_design.html

In a nutshell, MapReduce moves the computation to the node that hosts the
data (block). As there is an overhead in the startup/teardown of each task,
you want to make sure each task has a reasonable amount of data to process,
hence the default block size of 64 MB.

Quite a few users run with larger block sizes, either because it's more
efficient for their algorithms or to reduce the load on the NameNode:
more blocks means more metadata for the NameNode to hold in its in-memory
database.

Hope that helps.

Chris
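P.S. If you want to experiment, here is a rough, untested sketch of how a
block size can be overridden for a single file through the FileSystem API,
rather than cluster-wide via dfs.block.size. The 128 MB figure, the path and
the class name are just examples:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockSizeExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            long blockSize = 128L * 1024 * 1024; // 128 MB instead of the 64 MB default
            short replication = 3;               // typical default replication factor
            int bufferSize = 4096;               // write buffer size in bytes

            // The block size is fixed per file at creation time; this call
            // overrides the cluster default for just this one file.
            FSDataOutputStream out = fs.create(
                    new Path("/tmp/bigfile.dat"),
                    true, bufferSize, replication, blockSize);
            out.writeBytes("hello");
            out.close();
        }
    }

The trade-off is the one mentioned above: fewer, larger blocks mean fewer
map tasks (less startup/teardown overhead) and less metadata on the NameNode,
but also less parallelism for small files.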
