Hi Moon,
The periodic block report is constructed entirely from info in memory, so
there is no complete scan of the filesystem for this purpose.  The periodic
block report defaults to only sending once per hour from each datanode, and
each DN calculates a random start time for the hourly cycle (after initial
startup block report), to spread those hourly reports somewhat evenly across
the entire hour.  It is part of Hadoop's fault tolerance that the namenode
and datanodes perform this hourly check to ensure that they both have the
same understanding of which replicas are available on each node.
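The scheduling behavior described above can be sketched roughly as follows. This is a minimal illustration with hypothetical class and method names, not Hadoop's actual code; the one-hour constant corresponds to the default of dfs.blockreport.intervalMsec mentioned later in the thread:

```java
import java.util.concurrent.ThreadLocalRandom;

// Minimal sketch of the randomized block-report schedule described above.
// Names here are hypothetical, not taken from Hadoop's source.
public class BlockReportScheduler {
    // One hour, matching the dfs.blockreport.intervalMsec default.
    static final long INTERVAL_MS = 60L * 60L * 1000L;

    // Each datanode picks a random offset within the first interval (after
    // its initial startup block report), so hourly reports from different
    // nodes land spread across the hour instead of arriving in a burst.
    static long firstReportDelayMs() {
        return ThreadLocalRandom.current().nextLong(INTERVAL_MS);
    }

    // Subsequent reports then fire one full interval after the last one.
    static long nextReportTimeMs(long lastReportTimeMs) {
        return lastReportTimeMs + INTERVAL_MS;
    }
}
```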

However, depending on your answers to the questions below, you may be having
memory management and/or garbage collection problems.  We may be able to
help diagnose it if you can provide more info:

First, please confirm that you said 50,000,000 blocks per datanode (not
50,000).  This is a lot.  The data centers I'm most familiar with run with
approx. 100,000 blocks per datanode, because they need a higher ratio of
compute power to data.

Second, please confirm whether it is the datanodes, or the namenode
services, that are being non-responsive for minutes at a time.  And when you
say "often", how often are you experiencing such non-responsiveness?  What
are you experiencing when it happens?

Regarding your environment:
* How many datanodes in the cluster?
* How many volumes (physical HDDs) per datanode?
* How much RAM per datanode?
* What OS on the datanodes, and is it 32-bit or 64-bit?  What max process
size is configured?
* Is the datanode service's JVM running as 32-bit or 64-bit?

Hopefully these answers will help figure out what's going on.
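For reference, the report interval you mentioned is controlled by
dfs.blockreport.intervalMsec in hdfs-site.xml. The value is in
milliseconds; 3600000 below is the one-hour default discussed above,
shown only as a sketch, not as a recommendation to change it:

```xml
<!-- hdfs-site.xml: block report interval in milliseconds.
     3600000 (one hour) is the default discussed in this thread. -->
<property>
  <name>dfs.blockreport.intervalMsec</name>
  <value>3600000</value>
</property>
```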
--Matt


On Fri, Jul 8, 2011 at 7:21 AM, Robert Evans <[email protected]> wrote:

> Moon Soo Lee
>
> The full block report is used in error cases.  Currently, when a datanode
> heartbeats into the namenode, the namenode can send back a list of tasks to
> be performed, mostly for deleting blocks.  The namenode just assumes that
> all of these tasks execute successfully.  If any of them fail, the namenode
> is unaware of it.  HDFS-395 adds an ack to address this.  Creation of new
> blocks is reported to the namenode as it happens, so that is not really an
> issue.  So if you set the period to 1 year, you will likely end up with
> several blocks in your cluster sitting around unused but taking up space.
> The full report also likely compensates for other error conditions, or even
> bugs in HDFS that I am unaware of, simply by its nature.
>
> --Bobby Evans
>
> On 7/7/11 9:02 PM, "moon soo Lee" <[email protected]> wrote:
>
> I have many blocks, around 50~90 million per datanode.
>
> They often do not respond for 1~3 minutes, and I think this is because of
> the full scan for the block report.
>
> So if I set dfs.blockreport.intervalMsec to a very large value (1 year or
> more?), I expect the problem to clear.
>
> But if I really do that, what happens?  Any side effects?
>
>
