I ran this, and the result is far less than eight minutes. Is something wrong with my setup? I just changed fs.default.name; the other settings are default values.
<name>fs.default.name</name> <value>hdfs://named:9000</value>
<name>mapred.job.tracker</name> <value>hdfs://named:9001</value>
<name>hadoop.tmp.dir</name> <value>/hadoop/tmp</value>
<name>dfs.name.dir</name> <value>/hadoop/name</value>
<name>dfs.data.dir</name> <value>/b,/c,/d,/e,/f,/g,/h,/i,/j,/k,/l</value>
<name>dfs.replication</name> <value>1</value>
<name>dfs.block.size</name> <value>2560000</value>

time find /b /c /d /e /f /g /h /i /j /k /l -type f -print > /dev/null
real    0m4.608s
user    0m1.489s
sys     0m3.068s

2009/7/17 jason hadoop <[email protected]>

> From the shell,
> time find /b /c /d /e /f /g /h /i /j /k /l -type f -print > /dev/null
>
> Also, unless your distribution is magic, white space is not ignored in
> value statements. Hopefully your actual value is
> <value>/b,/c,/d,/e,/f,/g,/h,/i,/j,/k,/l</value>
>
> On Fri, Jul 17, 2009 at 6:29 AM, mingyang <[email protected]> wrote:
>
> > Thank you for your reply. I am a novice; I do not quite understand how
> > to check whether dfs.data.dir takes more than eight minutes.
> > My dfs.data.dir setting is:
> > <property>
> >   <name>dfs.data.dir</name>
> >   <value>/b,/c,/d,/e,/f,/g,/h,/i,/j,/k,/l</value>
> > </property>
> >
> > Each directory is a separate disk mount.
> >
> > 2009/7/17 jason hadoop <[email protected]>
> >
> > > The most likely problem is that the block report is taking more than
> > > 10 minutes. Due to the placement of the sync blocks in the core
> > > Datanode code, the block report locks out the heartbeat.
> > > This can cause the namenode to think the datanode has vanished.
> > >
> > > A simple way to check is to run a find on the directory set specified
> > > for dfs.data.dir. If this find takes more than 8 minutes or so, you
> > > are in trouble.
> > > The only solutions are to add more datanodes, reduce the block count,
> > > or increase your system IO speed so that the block report can complete
> > > in time.
> > >
> > > On Fri, Jul 17, 2009 at 6:12 AM, mingyang <[email protected]> wrote:
> > >
> > > > I am using Hadoop to store my media files, but once the number of
> > > > files exceeds one million, Hadoop takes about 10-20 minutes to
> > > > start, and my datanode goes down by itself.
> > > > The namenode log shows a lost heartbeat, but the datanode looks
> > > > normal to me: port 50010 answers a telnet, and jps shows the
> > > > datanode still running. Yet at this point I can no longer put data
> > > > into Hadoop, so I guess the datanode service is dead. Does Hadoop
> > > > not support more than one million files? Which parameters should I
> > > > adjust? I have already raised the open-file limit to 65535.
> > > >
> > > > namenode log
> > > >
> > > > 2009-07-17 18:14:29,330 INFO
> > > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit:
> > > > ugi=root,root,bin,daemon,sys,adm,disk,wheel ip=/192.168.1.96
> > > > cmd=setPermission src=/hadoop/tmp/mapred/system/jobtracker.info
> > > > dst=null perm=root:supergroup:rw-------
> > > > 2009-07-17 18:14:29,336 INFO org.apache.hadoop.hdfs.StateChange:
> > > > BLOCK* NameSystem.allocateBlock:
> > > > /hadoop/tmp/mapred/system/jobtracker.info.
> > > > blk_-2148480138731090754_1403179
> > > > 2009-07-17 18:14:32,958 INFO org.apache.hadoop.hdfs.StateChange:
> > > > BLOCK* NameSystem.addStoredBlock: blockMap updated:
> > > > 192.168.1.97:50010 is added to blk_-2148480138731090754_1403179
> > > > size 4
> > > > 2009-07-17 18:14:33,340 INFO org.apache.hadoop.hdfs.StateChange:
> > > > DIR* NameSystem.completeFile: file
> > > > /hadoop/tmp/mapred/system/jobtracker.info is closed by
> > > > DFSClient_1037557306
> > > > 2009-07-17 18:16:21,349 INFO
> > > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log
> > > > from 192.168.1.96
> > > > 2009-07-17 18:16:21,349 INFO
> > > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of
> > > > transactions: 7 Total time for transactions(ms): 1 Number of
> > > > transactions batched in Syncs: 1 Number of syncs: 6 SyncTimes(ms): 9
> > > > 2009-07-17 18:17:12,171 INFO
> > > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll FSImage
> > > > from 192.168.1.96
> > > > 2009-07-17 18:17:12,171 INFO
> > > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of
> > > > transactions: 0 Total time for transactions(ms): 0 Number of
> > > > transactions batched in Syncs: 0 Number of syncs: 1 SyncTimes(ms): 0
> > > > 2009-07-17 18:51:00,566 INFO org.apache.hadoop.hdfs.StateChange:
> > > > BLOCK* NameSystem.heartbeatCheck: lost heartbeat from
> > > > 192.168.1.97:50010
> > > > 2009-07-17 18:51:25,383 INFO org.apache.hadoop.net.NetworkTopology:
> > > > Removing a node: /default-rack/192.168.1.97:50010
> > > > 2009-07-17 19:10:48,564 INFO
> > > > org.apache.hadoop.hdfs.server.namenode.LeaseManager: Lease [Lease.
> > > > Holder: DFSClient_-1624377199, pendingcreates: 69] has expired hard
> > > > limit
> > > > 2009-07-17 19:10:48,564 INFO
> > > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering
> > > > lease=[Lease. Holder: DFSClient_-1624377199, pendingcreates: 69],
> > > > src=/unp/01/video/B3/94/{B394EDB2-0302-34B9-5357-4904FFFEFF36}_100.unp
> > > >
> > > > datanode log
> > > >
> > > > 2009-07-17 18:52:40,719 INFO
> > > > org.apache.hadoop.hdfs.server.datanode.DataBlockScanner:
> > > > Verification succeeded for blk_6641053880336411514_601647
> > > > 2009-07-17 18:52:12,421 INFO
> > > > org.apache.hadoop.hdfs.server.datanode.DataBlockScanner:
> > > > Verification succeeded for blk_-1074796653589819594_1392025
> > > > 2009-07-17 18:51:44,074 INFO
> > > > org.apache.hadoop.hdfs.server.datanode.DataBlockScanner:
> > > > Verification succeeded for blk_-6350430159380231402_155334
> > > > 2009-07-17 18:51:12,760 INFO
> > > > org.apache.hadoop.hdfs.server.datanode.DataBlockScanner:
> > > > Verification succeeded for blk_4607299987751845359_395290
> > > > 2009-07-17 18:50:39,977 INFO
> > > > org.apache.hadoop.hdfs.server.datanode.DataBlockScanner:
> > > > Verification succeeded for blk_8029183549541139011_474989
> > > > 2009-07-17 18:50:11,707 INFO
> > > > org.apache.hadoop.hdfs.server.datanode.DataBlockScanner:
> > > > Verification succeeded for blk_8468656648119049754_1065465
> > > > 2009-07-17 18:49:39,421 INFO
> > > > org.apache.hadoop.hdfs.server.datanode.DataBlockScanner:
> > > > Verification succeeded for blk_4739535659946158302_532204
> > > > 2009-07-17 18:49:11,213 INFO
> > > > org.apache.hadoop.hdfs.server.datanode.DataBlockScanner:
> > > > Verification succeeded for blk_-1495085838793109024_354553
> > > >
> > > > 09/07/17 18:02:09 WARN hdfs.DFSClient: DFSOutputStream
> > > > ResponseProcessor exception for block
> > > > blk_-2536746364442878375_1403164 java.net.SocketTimeoutException:
> > > > 63000 millis timeout while waiting for channel to be ready for
> > > > read. ch : java.nio.channels.SocketChannel[connected
> > > > local=/192.168.1.94:54783 remote=/192.168.1.97:50010]
> > > > 09/07/17 18:02:12 INFO hdfs.DFSClient: Exception in
> > > > createBlockOutputStream java.net.SocketTimeoutException: 63000
> > > > millis timeout while waiting for channel to be ready for read. ch :
> > > > java.nio.channels.SocketChannel[connected
> > > > local=/192.168.1.94:54790 remote=/192.168.1.97:50010]
> > > > 09/07/17 18:02:12 INFO hdfs.DFSClient: Exception in
> > > > createBlockOutputStream java.net.SocketTimeoutException: 63000
> > > > millis timeout while waiting for channel to be ready for read. ch :
> > > > java.nio.channels.SocketChannel[connected
> > > > local=/192.168.1.94:54791 remote=/192.168.1.97:50010]
> > >
> > > --
> > > Pro Hadoop, a book to guide you from beginner to hadoop mastery,
> > > http://www.amazon.com/dp/1430219424?tag=jewlerymall
> > > www.prohadoopbook.com a community for Hadoop Professionals
> >
> > --
> > Regards,
> > 王明阳 (Wang Mingyang)
>
> --
> Pro Hadoop, a book to guide you from beginner to hadoop mastery,
> http://www.amazon.com/dp/1430219424?tag=jewlerymall
> www.prohadoopbook.com a community for Hadoop Professionals

--
Regards,
王明阳 (Wang Mingyang)
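A note on the check suggested above: the block report enumerates the block files on disk, so its cost scales with the number of blocks the datanode holds, not the number of HDFS files. A rough way to count those blocks per data directory is sketched below; it assumes the 0.18/0.20-era on-disk layout, where each dfs.data.dir stores its block files as blk_* under current/ (adjust the path if your version lays things out differently):

# Count block files per data directory. Assumes blocks live as blk_*
# under <data.dir>/current; *.meta checksum files are excluded.
for d in /b /c /d /e /f /g /h /i /j /k /l; do
    printf '%s: ' "$d"
    find "$d/current" -type f -name 'blk_*' ! -name '*.meta' | wc -l
done

If these totals run into the millions, a multi-minute block report is plausible even though a warm-cache find over the same directories finishes in seconds.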

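The dfs.block.size in the posted configuration is also worth a look: 2560000 bytes is roughly 2.4 MB, far below the 64 MB default, so every sizeable media file fans out into many blocks. A back-of-the-envelope sketch (the 100 MB average file size is an illustrative assumption, not a figure from the thread):

# dfs.block.size from the posted config
BLOCK=2560000
# assumed average media file size -- illustrative only
FILE=$((100 * 1024 * 1024))
# roughly the file count reported in the thread
FILES=1000000
# ceiling division: blocks needed per file
PER_FILE=$(( (FILE + BLOCK - 1) / BLOCK ))
echo "blocks per file: $PER_FILE"                 # prints 41
echo "total blocks:    $(( PER_FILE * FILES ))"   # prints 41000000

With dfs.replication at 1 there is only a single replica of each block, so the few datanodes holding these blocks must each report millions of entries, which is exactly the block-report pressure described above. Raising dfs.block.size toward the 64 MB default shrinks the block count proportionally.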