On Fri, Sep 24, 2010 at 9:06 PM, Ted Yu <[email protected]> wrote:
> I see this log following the previous snippet:
>
> 2010-09-24 11:21:43,799 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block null bad datanode[0] nodes == null
> 2010-09-24 11:21:43,799 WARN org.apache.hadoop.hdfs.DFSClient: Could not get
> block locations. Source file
> "/hbase/.logs/sjc9-flash-grid02.carrieriq.com,60020,1285347585107/hlog.dat.1285351187512"
> - Aborting...
> 2010-09-24 11:21:45,417 ERROR
> org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to close log in
> abort
So we were aborting, and the one thing we'll try to do on our way out when aborting is close the WAL log. Seems like that is what failed in the above. (This stuff is odd -- 'Recovery for block null bad datanode[0] nodes == null'... Is there anything in your datanode logs to explain this? And if you grep for the WAL log name in the namenode log, do you see anything interesting?)

> org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException:
> org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on
> /hbase/.logs/sjc9-flash-grid02.carrieriq.com,60020,1285347585107/hlog.dat.1285351187512
> File does not exist. Holder DFSClient_302121899 does not have any open

Hmm... it says the file does not exist. So, yeah, for sure, check out the namenode logs.

Hey Ted, are you fellas still running 0.20.x? If so, what would it take to get you fellas up on 0.89, say the RC J-D put up today?

> Would failure from hlog.close() lead to data loss ?
>

Are you not on 0.20 hbase still? If so, yes. If on 0.89 with a Hadoop 0.20 that has append support (the Apache -append branch or CDH3b2), then some small amount may have been lost.

St.Ack
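
[For reference, the namenode-log grep suggested above would look roughly like the following; the log path is only illustrative and depends on where your Hadoop logs actually live:

    grep 'hlog.dat.1285351187512' /path/to/hadoop-*-namenode-*.log

Lines mentioning lease recovery or deletion of that WAL file around 11:21 would be the interesting part.]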
