1. The test sequence is: start hadoop, start hbase, stop hbase, stop hadoop.
2. The full context is in the files attached previously on this thread. Is
there anything else I am missing?
3. fsck indeed fails (below). However, there are no errors in the logs!
Exception in thread "main" java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
        at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:193)
        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
        at java.net.Socket.connect(Socket.java:519)
        at java.net.Socket.connect(Socket.java:469)
        at sun.net.NetworkClient.doConnect(NetworkClient.java:157)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:394)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:529)
        at sun.net.www.http.HttpClient.<init>(HttpClient.java:233)
        at sun.net.www.http.HttpClient.New(HttpClient.java:306)
        at sun.net.www.http.HttpClient.New(HttpClient.java:323)
        at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:788)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:729)
        at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:654)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:977)
        at org.apache.hadoop.dfs.DFSck.run(DFSck.java:116)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.dfs.DFSck.main(DFSck.java:137)
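
The trace shows fsck failing while opening an HTTP connection (DFSck.run
calls HttpURLConnection.getInputStream), so the refusal comes from whatever
address fsck uses to reach the namenode's web interface, before any HDFS
check runs. A minimal sketch for testing whether that port accepts
connections, assuming Python is available on the client machine. The helper
name `port_is_open`, the host `namenode-host` and the port 50070 are
placeholders for illustration, not values taken from this cluster; substitute
the namenode HTTP address from your own configuration.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts and name-resolution failures.
        return False


# Example call (placeholder address, replace with the real namenode host/port):
# print(port_is_open("namenode-host", 50070))
```

If this returns False while the namenode process is running, the likely
candidates are a wrong address in the client configuration or the web
interface listening on a different host or port.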




stack-3 wrote:
> 
> Your hdfs looks ill.  It's complaining that a data file in the -ROOT- catalog 
> table is 'missing'.  What happens if you run '$HADOOP_HOME/bin/hadoop 
> fsck HBASE_HOMDIR'?  More context around the errors would help with 
> analysis.   Have you tried restarting your HDFS?
> 
> Thanks,
> St.Ack
> 
> 
> yoav.morag wrote:
>> Unfortunately, this is not the case :-( . I have installed NTP on the
>> cluster, but the problem remains exactly the same. It is now clear
>> from the logs, however, that the problem occurs in the master first:
>>
>> 2008-09-28 11:04:48,412 ERROR org.apache.hadoop.dfs.LeaseManager:
>> /hbase/-ROOT-/70236052/info/mapfiles/2686380382008424762/data not found in
>> lease.paths (=[/hbase/-ROOT-/70236052/info/mapfiles
>>
>> and only then on the regionservers : 
>>
>> 2008-09-28 11:05:02,848 FATAL
>> org.apache.hadoop.hbase.regionserver.HRegionServer: Unhandled exception.
>> Aborting...
>>
>> Any more ideas would be greatly appreciated ...
>>
>>
>>
>> Jean-Daniel Cryans-2 wrote:
>>   
>>> You may have just found your problem: the clocks are not synchronized. It
>>> is a requirement when using HBase to have synchronized clocks, see
>>> http://hadoop.apache.org/hbase/docs/r0.18.0/api/index.html
>>>
>>> Thx for looking at it,
>>>
>>> J-D
>>>
>>> On Sun, Sep 28, 2008 at 3:47 AM, yoav.morag <[EMAIL PROTECTED]> wrote:
>>>
>>>     
>>>> Debug didn't seem to give much, as far as I could tell. I did, however,
>>>> notice the following errors in the hadoop log on the name node.
>>>> I am attaching the full logs from the name node and from one region
>>>> server (there are 4, all with identical errors):
>>>>
>>>> http://www.nabble.com/file/p19709529/hadoop-pm_app-namenode-cl-t072-330cl.privatedns.com.log
>>>> http://www.nabble.com/file/p19709529/hbase-pm_app-regionserver-cl-t072-290cl.privatedns.com.log
>>>>
>>>> Note that the clocks are not synchronized across the cluster, so the
>>>> times in the logs cannot be used to compare order between machines.
>>>>
>>>> Suspicious errors:
>>>> 2008-09-28 03:15:47,316 ERROR org.apache.hadoop.dfs.LeaseManager:
>>>> /hbase/-ROOT-/70236052/info/mapfiles/7031159331294621371/data not found in lease.paths
>>>> (=[/hbase/-ROOT-/70236052/info/mapfiles/7031159331294621371/index,
>>>> /hbase/-ROOT-/70236052/log/hlog.dat.1222585931186,
>>>> /hbase/.META./1028785192/log/hlog.dat.1222585931303])
>>>> 2008-09-28 03:15:47,317 ERROR org.apache.hadoop.dfs.LeaseManager:
>>>> /hbase/-ROOT-/70236052/info/mapfiles/7031159331294621371/index not found in lease.paths
>>>> (=[/hbase/-ROOT-/70236052/log/hlog.dat.1222585931186,
>>>> /hbase/.META./1028785192/log/hlog.dat.1222585931303])
>>>> 2008-09-28 03:15:47,318 ERROR org.apache.hadoop.dfs.LeaseManager:
>>>> /hbase/-ROOT-/70236052/info/info/7031159331294621371 not found in lease.paths
>>>> (=[/hbase/-ROOT-/70236052/log/hlog.dat.1222585931186,
>>>> /hbase/.META./1028785192/log/hlog.dat.1222585931303])
>>>> 2008-09-28 03:15:47,318 ERROR org.apache.hadoop.dfs.LeaseManager:
>>>> /hbase/-ROOT-/70236052/log/hlog.dat.1222585931186 not found in lease.paths
>>>> (=[/hbase/.META./1028785192/log/hlog.dat.1222585931303])
>>>> 2008-09-28 03:15:47,324 ERROR org.apache.hadoop.dfs.LeaseManager:
>>>> /hbase/-ROOT-/70236052/info/mapfiles/8544907469765511915/data not found in lease.paths
>>>> (=[/hbase/-ROOT-/70236052/info/mapfiles/8544907469765511915/index,
>>>> /hbase/log_10.249.0.10_1222585657683_60020/hlog.dat.1222585658340])
>>>> 2008-09-28 03:15:47,325 ERROR org.apache.hadoop.dfs.LeaseManager:
>>>> /hbase/-ROOT-/70236052/info/mapfiles/8544907469765511915/index not found in lease.paths
>>>> (=[/hbase/log_10.249.0.10_1222585657683_60020/hlog.dat.1222585658340])
>>>> 2008-09-28 03:15:47,326 ERROR org.apache.hadoop.dfs.LeaseManager:
>>>> /hbase/-ROOT-/70236052/info/info/8544907469765511915 not found in lease.paths
>>>> (=[/hbase/log_10.249.0.10_1222585657683_60020/hlog.dat.1222585658340])
>>>> 2
>>>>
>>>>
>>>>
>>>>
>>>> Jean-Daniel Cryans-2 wrote:
>>>>       
>>>>> Are there no other exceptions before that? Did you enable DEBUG? Can we
>>>>> see a whole start/stop log of your region server?
>>>>>
>>>>> Thx,
>>>>>
>>>>> J-D
>>>>>
>>>>> On Thu, Sep 25, 2008 at 11:09 AM, yoav.morag <[EMAIL PROTECTED]>
>>>>> wrote:
>>>>>
>>>>>         
>>>>>> I am experiencing problems when restarting a cluster with hadoop/hbase
>>>>>> 0.18.0. hadoop restarts OK, however the hbase regionservers all exit
>>>>>> with the message:
>>>>>> Exception in thread "regionserver/0:0:0:0:0:0:0:0:60020"
>>>>>> java.lang.NullPointerException
>>>>>>        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:448)
>>>>>>        at java.lang.Thread.run(Thread.java:619)
>>>>>> Strangely enough, the said line appears to indicate that log is null,
>>>>>> however a log is created and messages are written into it...
>>>>>> The restart scenario is very simple, and it happens even with a clean
>>>>>> database, on a newly formatted FS. I have also checked that no ghost
>>>>>> processes exist before start.
>>>>>>
>>>>>> "$INSTALLDIR/$HBASE/bin/stop-hbase.sh;$INSTALLDIR/$HADOOP/bin/stop-dfs.sh;"
>>>>>> "$INSTALLDIR/$HADOOP/bin/start-dfs.sh;$INSTALLDIR/$HBASE/bin/start-hbase.sh;"
>>>>>>
>>>>>> Any ideas?
>>>>>> --
>>>>>> View this message in context:
>>>>>> http://www.nabble.com/problem-restarting-0.18-tp19671584p19671584.html
>>>>>> Sent from the HBase User mailing list archive at Nabble.com.
>>>>>>
>>>>>>
>>>>>>           
>>>>>         
>>>> --
>>>> View this message in context:
>>>> http://www.nabble.com/problem-restarting-0.18-tp19671584p19709529.html
>>>> Sent from the HBase User mailing list archive at Nabble.com.
>>>>
>>>>
>>>>       
>>>     
>>
>>   
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/problem-restarting-0.18-tp19671584p19712887.html
Sent from the HBase User mailing list archive at Nabble.com.
