1. The test sequence is: start hadoop, start hbase, stop hbase, stop hadoop.
2. The full context is in the files attached previously on this thread. Is there anything else I am missing?
3. fsck indeed fails (below). However, there are no errors in the logs!

Exception in thread "main" java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
        at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:193)
        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
        at java.net.Socket.connect(Socket.java:519)
        at java.net.Socket.connect(Socket.java:469)
        at sun.net.NetworkClient.doConnect(NetworkClient.java:157)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:394)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:529)
        at sun.net.www.http.HttpClient.<init>(HttpClient.java:233)
        at sun.net.www.http.HttpClient.New(HttpClient.java:306)
        at sun.net.www.http.HttpClient.New(HttpClient.java:323)
        at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:788)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:729)
        at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:654)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:977)
        at org.apache.hadoop.dfs.DFSck.run(DFSck.java:116)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.dfs.DFSck.main(DFSck.java:137)
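The trace above shows DFSck.run failing inside HttpURLConnection, so fsck appears to reach the NameNode over HTTP, and "Connection refused" suggests that nothing was listening at the address it tried. Below is a minimal Python sketch for checking that from the machine where fsck was run. The helper name port_open, the host "localhost" and the port 50070 are assumptions for illustration only; the real address is whatever dfs.http.address resolves to in your configuration.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (including ConnectionRefusedError
        # and timeouts) when the connection cannot be established.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical values: replace with the NameNode host and the HTTP port
    # from your own configuration (50070 is only an assumed default).
    namenode_host, http_port = "localhost", 50070
    if port_open(namenode_host, http_port):
        print(f"{namenode_host}:{http_port} accepts connections")
    else:
        print(f"{namenode_host}:{http_port} is refusing connections or unreachable")
```

If the port is closed while the NameNode process is running, the fsck failure may be a configuration or binding issue rather than evidence of a damaged filesystem; if it is closed because HDFS was already stopped in the test sequence, fsck simply needs to be run while HDFS is up.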
stack-3 wrote:
>
> Your hdfs looks ill. It's complaining that a data file in the -ROOT- catalog
> table is 'missing'. What happens if you run '$HADOOP_HOME/bin/hadoop
> fsck HBASE_HOMDIR'? More context around the errors would help with
> analysis. You've tried restarting your HDFS?
>
> Thanks,
> St.Ack
>
>
> yoav.morag wrote:
>> unfortunately, this is not the case :-( . I have installed NTP on the
>> cluster, but the problem remains in exactly the same way. it is now clear
>> from the logs, however, that the problem occurs in the master first :
>>
>> 2008-09-28 11:04:48,412 ERROR org.apache.hadoop.dfs.LeaseManager:
>> /hbase/-ROOT-/70236052/info/mapfiles/2686380382008424762/data not found in
>> lease.paths (=[/hbase/-ROOT-/70236052/info/mapfiles
>>
>> and only then on the regionservers :
>>
>> 2008-09-28 11:05:02,848 FATAL
>> org.apache.hadoop.hbase.regionserver.HRegionServer: Unhandled exception.
>> Aborting...
>>
>> any more ideas will be greatly appreciated ...
>>
>>
>>
>> Jean-Daniel Cryans-2 wrote:
>>
>>> You may have just found your problem: the clocks are not synchronized. It is
>>> a requirement when using HBase to have synchronized clocks, see
>>> http://hadoop.apache.org/hbase/docs/r0.18.0/api/index.html
>>>
>>> Thx for looking at it,
>>>
>>> J-D
>>>
>>> On Sun, Sep 28, 2008 at 3:47 AM, yoav.morag <[EMAIL PROTECTED]> wrote:
>>>
>>>
>>>> debug didn't seem to give much, as far as I could tell.
>>>> I did however notice the following errors in the hadoop log on the name node.
>>>> I am attaching the full logs from the name node and one of the region servers
>>>> (there are 4, all with identical errors):
>>>>
>>>> http://www.nabble.com/file/p19709529/hadoop-pm_app-namenode-cl-t072-330cl.privatedns.com.log
>>>> hadoop-pm_app-namenode-cl-t072-330cl.privatedns.com.log
>>>>
>>>> http://www.nabble.com/file/p19709529/hbase-pm_app-regionserver-cl-t072-290cl.privatedns.com.log
>>>> hbase-pm_app-regionserver-cl-t072-290cl.privatedns.com.log
>>>>
>>>> note the clocks are not synchronized across the cluster, so the
>>>> times in the logs cannot be used to compare order between machines.
>>>>
>>>> suspicious errors :
>>>> 2008-09-28 03:15:47,316 ERROR org.apache.hadoop.dfs.LeaseManager:
>>>> /hbase/-ROOT-/70236052/info/mapfiles/7031159331294621371/data not found in lease.paths
>>>> (=[/hbase/-ROOT-/70236052/info/mapfiles/7031159331294621371/index,
>>>> /hbase/-ROOT-/70236052/log/hlog.dat.1222585931186,
>>>> /hbase/.META./1028785192/log/hlog.dat.1222585931303])
>>>> 2008-09-28 03:15:47,317 ERROR org.apache.hadoop.dfs.LeaseManager:
>>>> /hbase/-ROOT-/70236052/info/mapfiles/7031159331294621371/index not found in lease.paths
>>>> (=[/hbase/-ROOT-/70236052/log/hlog.dat.1222585931186,
>>>> /hbase/.META./1028785192/log/hlog.dat.1222585931303])
>>>> 2008-09-28 03:15:47,318 ERROR org.apache.hadoop.dfs.LeaseManager:
>>>> /hbase/-ROOT-/70236052/info/info/7031159331294621371 not found in lease.paths
>>>> (=[/hbase/-ROOT-/70236052/log/hlog.dat.1222585931186,
>>>> /hbase/.META./1028785192/log/hlog.dat.1222585931303])
>>>> 2008-09-28 03:15:47,318 ERROR org.apache.hadoop.dfs.LeaseManager:
>>>> /hbase/-ROOT-/70236052/log/hlog.dat.1222585931186 not found in lease.paths
>>>> (=[/hbase/.META./1028785192/log/hlog.dat.1222585931303])
>>>> 2008-09-28 03:15:47,324 ERROR org.apache.hadoop.dfs.LeaseManager:
>>>> /hbase/-ROOT-/70236052/info/mapfiles/8544907469765511915/data not found in lease.paths
>>>> (=[/hbase/-ROOT-/70236052/info/mapfiles/8544907469765511915/index,
>>>> /hbase/log_10.249.0.10_1222585657683_60020/hlog.dat.1222585658340])
>>>> 2008-09-28 03:15:47,325 ERROR org.apache.hadoop.dfs.LeaseManager:
>>>> /hbase/-ROOT-/70236052/info/mapfiles/8544907469765511915/index not found in lease.paths
>>>> (=[/hbase/log_10.249.0.10_1222585657683_60020/hlog.dat.1222585658340])
>>>> 2008-09-28 03:15:47,326 ERROR org.apache.hadoop.dfs.LeaseManager:
>>>> /hbase/-ROOT-/70236052/info/info/8544907469765511915 not found in lease.paths
>>>> (=[/hbase/log_10.249.0.10_1222585657683_60020/hlog.dat.1222585658340])
>>>> 2
>>>>
>>>>
>>>> Jean-Daniel Cryans-2 wrote:
>>>>
>>>>> Are there no other exceptions before that? Did you enable DEBUG? Can we see
>>>>> a whole start/stop log of your region server?
>>>>>
>>>>> Thx,
>>>>>
>>>>> J-D
>>>>>
>>>>> On Thu, Sep 25, 2008 at 11:09 AM, yoav.morag <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>
>>>>>> I am experiencing problems when restarting a cluster with hadoop/hbase
>>>>>> 0.18.0. hadoop restarts OK, however hbase regionservers all exit with the
>>>>>> message :
>>>>>> Exception in thread "regionserver/0:0:0:0:0:0:0:0:60020"
>>>>>> java.lang.NullPointerException
>>>>>>         at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:448)
>>>>>>         at java.lang.Thread.run(Thread.java:619)
>>>>>> strangely enough, the said line appears to indicate log is null, however a
>>>>>> log is created and messages are written into it...
>>>>>> the restart scenario is very simple, and it happens even with a clean
>>>>>> database, on a newly formatted FS. I have also checked no ghost processes
>>>>>> exist before start.
>>>>>>
>>>>>> "$INSTALLDIR/$HBASE/bin/stop-hbase.sh;$INSTALLDIR/$HADOOP/bin/stop-dfs.sh;"
>>>>>> "$INSTALLDIR/$HADOOP/bin/start-dfs.sh;$INSTALLDIR/$HBASE/bin/start-hbase.sh;"
>>>>>>
>>>>>> any ideas ?
>>>>>> --
>>>>>> View this message in context:
>>>>>> http://www.nabble.com/problem-restarting-0.18-tp19671584p19671584.html
>>>>>> Sent from the HBase User mailing list archive at Nabble.com.
>>>>>>