Hi,
We are considering Hadoop as a prototype solution for a system we are
building. In the abstract, Hadoop and MapReduce are very well-suited
to our computational problem. However, the forwarded email exchange
below has caused us some concern that we are hoping the user
community might allay. We've searched JIRA for relevant issues but
didn't turn up anything. (We probably aren't as adept as we might be
at surfacing appropriate items, though.)
Here are the relevant numbers for the data we are using to prototype a
system using Hadoop 0.18.1:
We have 16,000,000 files of about 10K each, or roughly 160GB total.
We have 10 datanodes with the default replication factor of 3. Each
file will probably be stored as a single block, right? This means we
would be storing 48,000,000 block replicas across 10 datanodes, or
4,800,000 blocks per node.
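For reference, here is that arithmetic as a trivial standalone Java
snippet (nothing Hadoop-specific, just the figures above):

public class BlockCountEstimate {
    public static void main(String[] args) {
        long files = 16000000L;         // 16,000,000 small files
        long bytesPerFile = 10000L;     // ~10K each
        int replication = 3;            // default dfs.replication
        int datanodes = 10;

        long blocks = files;                          // one block per small file
        long replicas = blocks * replication;         // 48,000,000 block replicas
        long replicasPerNode = replicas / datanodes;  // 4,800,000 per datanode
        double totalGB = files * bytesPerFile / 1e9;  // ~160GB of raw data

        System.out.println("total data      ~ " + totalGB + " GB");
        System.out.println("block replicas  = " + replicas);
        System.out.println("per datanode    = " + replicasPerNode);
    }
}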
At 160GB, the total data is not particularly large. Unfortunately,
the attached email exchange suggests we could have a problem with the
large number of blocks per node. We have considered combining a
number of small files into larger files (say, concatenating sets of
100 files into single larger files, so we would have 48,000 blocks of
1MB each per node). This would not significantly affect our MapReduce
algorithm, but it could undesirably complicate other components of
the system that use this data.
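One way we might do the packing, sketched here only under the
assumption that a SequenceFile (original file name as key, raw bytes
as value) would be acceptable to the other components, is something
like the following (the class name and paths are made up, and we have
not tested this):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SmallFilePacker {
    // Pack a batch of small files into one SequenceFile:
    // key = original file name, value = the file's bytes.
    public static void pack(Configuration conf, Path[] smallFiles,
                            Path packedFile) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        SequenceFile.Writer writer = SequenceFile.createWriter(
                fs, conf, packedFile, Text.class, BytesWritable.class);
        try {
            for (Path p : smallFiles) {
                // Slurp one small file into memory (they are only ~10K).
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                InputStream in = fs.open(p);
                try {
                    byte[] chunk = new byte[4096];
                    int n;
                    while ((n = in.read(chunk)) != -1) {
                        buf.write(chunk, 0, n);
                    }
                } finally {
                    in.close();
                }
                writer.append(new Text(p.getName()),
                              new BytesWritable(buf.toByteArray()));
            }
        } finally {
            writer.close();
        }
    }
}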
Thanks in advance for any insights on the match between Hadoop (0.18.x
and later) and our particular system requirements.
RDH
Begin forwarded message:
From: Konstantin Shvachko <[EMAIL PROTECTED]>
Date: November 17, 2008 6:27:42 PM PST
To: [email protected]
Subject: Re: The Case of a Long Running Hadoop System
Reply-To: [email protected]
Bagri,
According to the numbers you posted, your cluster has 6,000,000 block
replicas and only 12 data-nodes. The blocks are small, about 78KB on
average according to fsck, so each node contains only about 40GB
worth of block data. But the number of blocks is really huge: about
500,000 per node. Is my math correct?
I haven't seen data-nodes that big yet.
The problem here is that a data-node keeps a map of all its blocks in
memory. The map is a HashMap, and with 500,000 entries you can get
long lookup times, I guess. Block reports can also take a long time.
So I believe restarting the name-node will not help you.
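(Purely as an illustrative aside, not Konstantin's code and not the
actual DataNode implementation: a standalone sketch of a
500,000-entry HashMap, to get a rough feel for the heap it holds and
how fast lookups are, might look like this.)

import java.util.HashMap;
import java.util.Map;

public class BlockMapSketch {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        long before = rt.totalMemory() - rt.freeMemory();

        // Hypothetical stand-in for a per-datanode block map:
        // block id -> a little per-block metadata.
        Map<Long, long[]> blockMap = new HashMap<Long, long[]>();
        for (long id = 0; id < 500000L; id++) {
            blockMap.put(id, new long[] { 10000L, id });
        }

        System.gc();
        long after = rt.totalMemory() - rt.freeMemory();
        System.out.println("approx heap held by the map: "
                + ((after - before) / (1024 * 1024)) + " MB");

        long start = System.currentTimeMillis();
        long hits = 0;
        for (long id = 0; id < 500000L; id++) {
            if (blockMap.containsKey(id)) {
                hits++;
            }
        }
        System.out.println(hits + " lookups took "
                + (System.currentTimeMillis() - start) + " ms");
    }
}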
You should somehow pack your small files into larger ones.
Alternatively, you can increase your cluster size, probably by 5 to
10 times.
I don't remember whether we have had any optimization patches related
to the data-node block map since 0.15. Please advise if anybody
remembers.
Thanks,
--Konstantin
Abhijit Bagri wrote:
We do not have a secondary namenode because 0.15.3 has a serious bug
which truncates the namenode image if there is a failure while the
namenode fetches the image from the secondary namenode. See
HADOOP-3069.
I have a patched version of 0.15.3 for this issue. From the patch for
HADOOP-3069, the changes are on the namenode _and_ the secondary
namenode, which means I can't just fire up a secondary namenode.
- Bagri
On Nov 15, 2008, at 11:36 PM, Billy Pearson wrote:
If I understand correctly, the secondary namenode merges the edits
log into the fsimage and reduces the edit log size.
That is likely the root of your problems: 8.5G seems large and is
likely putting a strain on your master server's memory and I/O
bandwidth.
Why do you not have a secondary namenode?
If you do not have the memory on the master, I would look into
stopping a datanode/tasktracker on one server and loading the
secondary namenode on it.
Let it run for a while and watch the log for the secondary namenode;
you should see your edit log get smaller.
I am not an expert, but that would be my first action.
Billy