Hi Harsh,

Thanks for the info.  If the replication is set to 2, will there be
any difference in performance when running MR jobs?

On Wed, Feb 1, 2012 at 1:02 PM, Harsh J <ha...@cloudera.com> wrote:
> Usable space = (total configured space / replication factor). With
> your values, applied FS-wide: ((500 GB x 5) / 3 replicas) = (2.5 TB
> / 3) ≈ 833 GB.
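[Editor's note: the formula above can be sketched as a small helper; the function name and the rounding are illustrative, not from the thread.]

```python
def usable_space_gb(nodes: int, per_node_gb: float, replication: int) -> float:
    """Usable HDFS capacity: total raw capacity divided by the replication factor."""
    return nodes * per_node_gb / replication

# The cluster in the thread: 5 nodes x 500 GB, replication 3 -> ~833 GB.
print(round(usable_space_gb(5, 500, 3)))  # 833
# With replication 2 -> 1250 GB (1.25 TB).
print(round(usable_space_gb(5, 500, 2)))  # 1250
```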
>
> Note, however, that replication is a per-file property and you can
> control it granularly instead of keeping it constant FS-wide, if need
> be. Use the setrep utility:
> http://hadoop.apache.org/common/docs/current/file_system_shell.html#setrep.
> For instance, you can keep non-critical files with 1 (none) or 2
> replicas, and all important ones with 3. The calculation of usable
> space hence becomes a more complex function.
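[Editor's note: a minimal sketch of the per-file replication control described above, using the setrep utility from the linked docs; the paths are illustrative and require a running HDFS cluster.]

```shell
# Keep a non-critical scratch file at 1 replica; -w waits for the
# change to complete before returning.
hadoop fs -setrep -w 1 /tmp/scratch/intermediate.dat

# Keep important data at 3 replicas.
hadoop fs -setrep -w 3 /data/important/events.log

# Verify: the replication factor appears in the file listing.
hadoop fs -ls /data/important/events.log
```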
>
> Also, for 5 nodes, a replication factor of 2 may be okay too. That
> will let you tolerate one DN failure at a time, while 3 lets you
> tolerate two simultaneous DN failures (unsure if you'll need that,
> since a power or switch loss in your case would take the whole
> cluster down anyway). You can raise the replication factor once you
> grow larger, then rebalance the cluster to spread the replicas
> evenly again. With rep=2, you should have 1.25 TB of usable space.
>
> On Wed, Feb 1, 2012 at 9:06 AM, Michael Lok <fula...@gmail.com> wrote:
>> Hi folks,
>>
>> We're planning to set up a 5-node Hadoop cluster. I'm thinking of
>> just leaving dfs.replication at 3, which is the default. Each data
>> node will have 500 GB of local storage for DFS use.
>>
>> How do I calculate the amount of usable DFS space given the
>> replication setting and the number of nodes in this case? Is there
>> a formula I can use?
>>
>> Any help is greatly appreciated.
>>
>> Thanks
>
>
>
> --
> Harsh J
> Customer Ops. Engineer
> Cloudera | http://tiny.cloudera.com/about
