Allen,

How are you doing? I heard you are finally moving away from Solaris to Linux :)
Hope things are going well for you!


I think I found the source of my problems. The issue is in Amazon EC2: when I
start my cluster (1 namenode, 16 datanodes), the datanodes are not able to talk
to the namenode at all (I tried telnet from a datanode to the namenode). It
gets fixed progressively, and seemingly by magic, over about 30-40 minutes
until all of them are able to talk, which is why safemode takes 40 minutes.
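
For anyone who wants to measure this rather than telnet by hand, here is a
rough sketch of the same check as a script run on a datanode. It only tests
whether a TCP connection to the namenode can be opened and reports how long it
took before the first success. The host name and port 8020 below are
placeholders I made up for illustration; use whatever host and port your
fs.default.name points at.

```python
import socket
import time


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def wait_until_reachable(host, port, interval=10.0, max_wait=3600.0,
                         probe=can_connect, sleep=time.sleep,
                         clock=time.monotonic):
    """Poll until host:port is reachable.

    Returns the elapsed seconds at the first successful probe, or None if
    max_wait seconds pass without success. probe, sleep, and clock are
    injectable so the loop can be exercised without a real network.
    """
    start = clock()
    while True:
        if probe(host, port):
            return clock() - start
        if clock() - start >= max_wait:
            return None
        sleep(interval)


# Example (hypothetical host and port, not taken from my cluster):
#   elapsed = wait_until_reachable("namenode.example.internal", 8020)
#   print("namenode reachable after", elapsed, "seconds")
```

Running it on every datanode right after cluster start should show whether
the delay is the same on all nodes or varies from node to node.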

We are running a secondary namenode and do regular scps to safeguard the
data.

Best
Bhupesh


On Fri, Jun 11, 2010 at 5:57 PM, Allen Wittenauer [via Lucene] wrote:

>
> (removing hadoop-u...@lucene)
>
> On Jun 11, 2010, at 5:08 PM, Bhupesh Bansal wrote:
>
> > I am also seeing similar issues. I am not clear how the secondary
> > namenode helps here.
> > AFAIK the secondary namenode checkpoints and saves namenode snapshots
> > periodically, and the namenode does not check with the secondary
> > namenode for any data inconsistencies.
>
>
> You can copy the checkpoint over to the primary.  This is better than no
> backup at all. :)
>

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/HDFS-safemode-recovery-take-more-than-an-hour-tp784779p889964.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.