The page I'm referring to is:
http://wiki.apache.org/hadoop/NameNode
- Manish
Co-Founder Rapleaf.com
We're looking for a product manager, sys admin, and software
engineers...$10K referral award
On Aug 12, 2008, at 12:07 PM, lohit wrote:
Hi Manish,
- Why 15+ GB? Do we allocate all of the memory to the NameNode, or
just allocate some amount using -Xmx and leave the rest available so
the machine doesn't start swapping?
We allocated the memory using -Xmx. The NameNode keeps the HDFS
namespace in memory, so the bigger your namespace, the bigger your
heap needs to be. My guess is that you might need a system that big
if you have more than 15 million files with 20 million blocks. But
again, it's best to watch how your NameNode is performing and how
much memory it is actually consuming.
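To make the sizing argument concrete, here is a rough back-of-the-envelope sketch. It assumes a heap cost of about 150 bytes per namespace object (each file and each block), which is a commonly cited rule of thumb and not a number from this thread, plus a hypothetical 2x headroom factor so the heap is not run near full. estimate_heap_gib and fits_without_swapping are illustrative helpers, not part of Hadoop. The second one reflects the original question: -Xmx should leave some RAM for the OS so the machine does not swap.

```python
# Rough NameNode heap sizing sketch. All constants are assumptions.

# Assumed heap cost per namespace object (one file or one block). About 150
# bytes is a commonly cited rule of thumb; measure your own NameNode instead.
BYTES_PER_OBJECT = 150
GIB = 1024 ** 3


def estimate_heap_gib(num_files: int, num_blocks: int,
                      bytes_per_object: int = BYTES_PER_OBJECT,
                      headroom: float = 2.0) -> float:
    """Estimate an -Xmx value in GiB: objects * cost per object * headroom.

    headroom is a hypothetical safety factor so the heap is not run near
    full, which would make garbage collection expensive.
    """
    objects = num_files + num_blocks
    return objects * bytes_per_object * headroom / GIB


def fits_without_swapping(xmx_gib: float, ram_gib: float,
                          os_reserve_gib: float = 2.0) -> bool:
    """True if the heap still leaves os_reserve_gib of RAM for the OS and
    non-heap JVM memory, so the machine should not need to swap."""
    return xmx_gib + os_reserve_gib <= ram_gib


if __name__ == "__main__":
    heap = estimate_heap_gib(15_000_000, 20_000_000)
    print(f"suggested -Xmx: {heap:.1f} GiB")  # suggested -Xmx: 9.8 GiB
    print(fits_without_swapping(heap, ram_gib=16))  # True
```

Under these assumptions, 15 million files and 20 million blocks come to roughly 9.8 GiB of heap, which fits on a 16 GiB machine with room left for the OS. Treat it only as a starting point; the real answer comes from watching the NameNode's actual memory use.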
- why RAID5?
- If running RAID 5, why is this necessary?
It's not absolutely necessary. The NameNode index (its metadata) is a
critical piece of data: you cannot afford to lose or corrupt it.
That is why there is an option to specify multiple directories, so
that separate copies are written in parallel. You can point those
directories wherever you like: multiple drives, NFS, and so on.
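As a small illustration of the "separate copies" idea, the sketch below takes a comma-separated directory list (the form the dfs.name.dir value takes) and checks that each directory sits on a different filesystem. The helper names are hypothetical and the paths must already exist. Note that st_dev identifies a filesystem, not a physical disk, so two partitions of one drive would still pass.

```python
import os
from collections import defaultdict


def dirs_by_device(name_dir_value: str) -> dict:
    """Group a comma-separated directory list by the device ID (st_dev)
    of the filesystem each directory lives on."""
    groups = defaultdict(list)
    for entry in name_dir_value.split(","):
        path = entry.strip()
        if not path:
            continue  # tolerate stray commas and blanks
        groups[os.stat(path).st_dev].append(path)
    return dict(groups)


def copies_are_independent(name_dir_value: str) -> bool:
    """True only if at least two directories are listed and no two of them
    share a filesystem. st_dev distinguishes filesystems, not physical
    disks, so this is a necessary check rather than a sufficient one."""
    groups = dirs_by_device(name_dir_value)
    total = sum(len(paths) for paths in groups.values())
    return total >= 2 and all(len(paths) == 1 for paths in groups.values())
```

For example, with a local directory and an NFS mount (hypothetical paths such as /disk1/dfs/name,/mnt/nfs/dfs/name), copies_are_independent returns True only if the two really are on different mounts.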
- Configure the name node to store one set of transaction logs on a
separate disk from the index.
- why?
This feature is not yet supported, but it would be a good one to
have. Right now the transaction logs and the index (I assume this
means the image) live in the same directory and cannot be configured
to go into separate directories. We should correct the wiki.
- Configure the name node to store another set of transaction logs to
a network mounted disk.
- why?
As explained above, this is to keep multiple copies of your
metadata (dfs.name.dir in particular).
- Do not host DataNode, JobTracker or TaskTracker services on the
same system.
Typically, the DataNode and TaskTracker run on all nodes, while the
JobTracker runs on a dedicated node, like the NameNode (and the
SecondaryNameNode). Occasionally a TaskTracker can crash and bring
down a node, and you do not want your JobTracker or NameNode to be
on that system.
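The placement rule can be expressed as a simple check over a cluster inventory. In this sketch the host names and the inventory format are hypothetical; it flags any host where a master daemon (NameNode, SecondaryNameNode, JobTracker) shares the machine with a worker daemon (DataNode, TaskTracker), which is the situation described above.

```python
MASTER_ROLES = {"NameNode", "SecondaryNameNode", "JobTracker"}
WORKER_ROLES = {"DataNode", "TaskTracker"}


def risky_hosts(placement: dict) -> list:
    """Return hosts (sorted) that run both a master and a worker daemon.

    placement maps a host name to the set of daemons it runs.
    """
    return sorted(
        host for host, roles in placement.items()
        if roles & MASTER_ROLES and roles & WORKER_ROLES
    )


if __name__ == "__main__":
    cluster = {  # hypothetical inventory
        "master1": {"NameNode"},
        "master2": {"JobTracker", "SecondaryNameNode"},
        "worker1": {"DataNode", "TaskTracker"},
        "small1": {"JobTracker", "TaskTracker"},
    }
    print(risky_hosts(cluster))  # ['small1']
```

Hosts that run only master daemons, or only worker daemons, are not flagged.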
PS: Could you point me to the wiki page you are referring to? We
might need to make some corrections.
Thanks,
Lohit
----- Original Message ----
From: Manish Shah <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Tuesday, August 12, 2008 11:24:45 AM
Subject: NameNode hardware specs
Can someone explain in a little more detail the reasons behind the
NameNode hardware specs that were recently added to the wiki? I'm
interested in learning how people settled on these specs. Are they
based on observed behavior, or just recommendations from other
Hadoop users?
- Use a good server with lots (15GB+) of RAM.
- Why 15+ GB? Do we allocate all of the memory to the NameNode, or
just allocate some amount using -Xmx and leave the rest available so
the machine doesn't start swapping?
- Consider using fast RAID5 storage for keeping the index.
- why RAID5?
- List more than one name node directory in the configuration, so
that multiple copies of the indices will be stored. As long as the
directories are on separate disks, a single full disk will not
corrupt the index.
- If running RAID 5, why is this necessary?
- Configure the name node to store one set of transaction logs on a
separate disk from the index.
- why?
- Configure the name node to store another set of transaction logs to
a network mounted disk.
- why?
- Do not host DataNode, JobTracker or TaskTracker services on the
same system.
- How much memory would the JobTracker need? Does it use a
lot of CPU? In general, what are good specs for a JobTracker machine,
and can the machine be shared with other services?
Thanks so much for the help. I think it would be hugely helpful for
the community to start describing their Hadoop cluster setups in
more detail than just the DataNode config and cluster size. I think
we all want to be confident that we are spending money on the right
machines to grow our clusters the right way.
Most appreciated,
- Manish