[ https://issues.apache.org/jira/browse/HBASE-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12658614#action_12658614 ]

stack commented on HBASE-1080:
------------------------------

I see in the master logs that things seemed to be running fine, scanning root and
meta, and then it just stops scanning meta: the metaScanner pass that starts at
17:14:22 never logs a completion line.

{code}
2008-12-20 17:12:59,065 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cache hit in table locations for row <> and tableName .META.: location server  XX.XX.XX.216:60020, location region name .META.,,1
2008-12-20 17:13:22,458 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scanning meta region {regionname: .META.,,1, startKey: <>, server:  XX.XX.XX.216:60020}
2008-12-20 17:13:22,573 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scan of 378 row(s) of meta region {regionname: .META.,,1, startKey: <>, server:  XX.XX.XX.216:60020} complete
2008-12-20 17:13:22,573 INFO org.apache.hadoop.hbase.master.BaseScanner: All 1 .META. region(s) scanned
2008-12-20 17:13:27,781 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scanning meta region {regionname: -ROOT-,,0, startKey: <>, server:  XX.XX.XX.213:60020}
2008-12-20 17:13:27,789 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scan of 1 row(s) of meta region {regionname: -ROOT-,,0, startKey: <>, server:  XX.XX.XX.213:60020} complete
2008-12-20 17:13:29,417 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cache hit in table locations for row <> and tableName .META.: location server  XX.XX.XX.216:60020, location region name .META.,,1
2008-12-20 17:13:51,564 DEBUG org.apache.hadoop.hbase.master.ServerManager: Total Load: 382, Num Servers: 10, Avg Load: 39.0
2008-12-20 17:13:59,763 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cache hit in table locations for row <> and tableName .META.: location server  XX.XX.XX.216:60020, location region name .META.,,1
2008-12-20 17:14:22,458 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scanning meta region {regionname: .META.,,1, startKey: <>, server:  XX.XX.XX.216:60020}
2008-12-20 17:14:27,781 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scanning meta region {regionname: -ROOT-,,0, startKey: <>, server:  XX.XX.XX.213:60020}
2008-12-20 17:14:27,787 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scan of 1 row(s) of meta region {regionname: -ROOT-,,0, startKey: <>, server: XX.XX.XX.213:60020} complete
2008-12-20 17:14:51,591 DEBUG org.apache.hadoop.hbase.master.ServerManager: Total Load: 382, Num Servers: 10, Avg Load: 39.0
...
{code}

...


This then goes on forever, except that the meta scanner runs twice more, 30
minutes later, on schedule, and then stops again for the rest of the log.
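Spotting a scan that starts but never finishes is easy to miss by eye in a long log. A minimal sketch of a mechanical check, assuming one log entry per line in the `BaseScanner` message format shown in the excerpt: pair each `scanning meta region` line with a later `scan of N row(s) ... complete` line for the same scanner and region, and report whatever is left open. `ScanCheck` and `findUnfinishedScans` are hypothetical names, not HBase tooling.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ScanCheck {
    // "RegionManager.<scanner> scanning meta region {regionname: <region>, ..."
    static final Pattern START = Pattern.compile(
        "RegionManager\\.(\\w+) scanning meta region \\{regionname: (\\S+),");
    // "RegionManager.<scanner> scan of N row(s) of meta region {regionname: <region>, ...} complete"
    static final Pattern DONE = Pattern.compile(
        "RegionManager\\.(\\w+) scan of \\d+ row\\(s\\) of meta region \\{regionname: (\\S+),.*\\} complete");

    /** Returns the lines of scans that started but never logged a matching "complete" line. */
    static List<String> findUnfinishedScans(List<String> lines) {
        // Key: scanner name + region name. Value: the line that opened the scan.
        Map<String, String> open = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher start = START.matcher(line);
            if (start.find()) {
                open.put(start.group(1) + " " + start.group(2), line);
                continue;
            }
            Matcher done = DONE.matcher(line);
            if (done.find()) {
                open.remove(done.group(1) + " " + done.group(2));
            }
        }
        return new ArrayList<>(open.values());
    }

    public static void main(String[] args) {
        // BaseScanner lines taken from the master log excerpt.
        List<String> log = List.of(
            "2008-12-20 17:13:22,458 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scanning meta region {regionname: .META.,,1, startKey: <>, server:  XX.XX.XX.216:60020}",
            "2008-12-20 17:13:22,573 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scan of 378 row(s) of meta region {regionname: .META.,,1, startKey: <>, server:  XX.XX.XX.216:60020} complete",
            "2008-12-20 17:14:22,458 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scanning meta region {regionname: .META.,,1, startKey: <>, server:  XX.XX.XX.216:60020}",
            "2008-12-20 17:14:27,781 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scanning meta region {regionname: -ROOT-,,0, startKey: <>, server:  XX.XX.XX.213:60020}",
            "2008-12-20 17:14:27,787 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scan of 1 row(s) of meta region {regionname: -ROOT-,,0, startKey: <>, server: XX.XX.XX.213:60020} complete");
        for (String line : findUnfinishedScans(log)) {
            System.out.println("never completed: " + line);
        }
    }
}
```

Run over the excerpt, the only scan left open is the metaScanner pass that started at 17:14:22,458.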

Root and meta are on different servers, 213 and 216 respectively.

Looking at the logs on the server hosting meta, I see nothing untoward (though it
seems we're major compacting every 4 hours or so), and then scanner leases timing
out without explanation.

{code}
2008-12-20 16:34:24,021 INFO org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on region assigners,,1229364037757
2008-12-20 16:34:24,022 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Skipping major compaction because only one (major) compacted file only and elapsedTime 169999846 is < ttl=-1
2008-12-20 16:34:24,022 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region assigners,,1229364037757 in 0sec
2008-12-20 17:15:01,773 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -5917415078002522486 lease expired
2008-12-20 17:15:22,463 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 7764136277533139761 lease expired
2008-12-20 17:23:21,423 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 2005207420919280312 lease expired
2008-12-20 17:45:31,699 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 6223121878865388268 lease expired
2008-12-20 17:46:15,298 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 7574816364936285182 lease expired
2008-12-20 17:46:32,666 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 908654114835311904 lease expired
2008-12-20 19:21:03,933 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Major compaction triggered on store: 170018656/content. Time since last major compaction: 309999 seconds
...
{code}
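For reference, the gap between the master starting the metaScanner pass that never completes (17:14:22,458 in the master log) and the second scanner lease expiring on the .META. server (17:15:22,463) works out to almost exactly one minute. The sketch below only does that arithmetic. It assumes the master's and the region server's clocks are in sync, and nothing in either log ties scanner 7764136277533139761 to the master's metaScanner, so the coincidence is a lead, not a finding. `LeaseGap` and `gapMillis` are hypothetical names.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class LeaseGap {
    // Timestamp format used in both log excerpts, e.g. "2008-12-20 17:14:22,458".
    static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Milliseconds between two log timestamps (assumes both hosts' clocks agree). */
    static long gapMillis(String from, String to) {
        return Duration.between(LocalDateTime.parse(from, TS), LocalDateTime.parse(to, TS)).toMillis();
    }

    public static void main(String[] args) {
        String scanStart = "2008-12-20 17:14:22,458";    // master: metaScanner starts, never completes
        String leaseExpired = "2008-12-20 17:15:22,463"; // .META. server: scanner lease expired
        System.out.println(gapMillis(scanStart, leaseExpired) + " ms"); // prints "60005 ms"
    }
}
```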

> Deadlocked master; wants to assign root but can't because root is not assigned
> ------------------------------------------------------------------------------
>
>                 Key: HBASE-1080
>                 URL: https://issues.apache.org/jira/browse/HBASE-1080
>             Project: Hadoop HBase
>          Issue Type: Bug
>         Environment: 17:05 < jgray> I0.19.0-dev, r726565
> 17:05 < jgray> 12/15/08 i grabbed it
>            Reporter: stack
>         Attachments: master.dump.log
>
>
> The lock held while assigning regions looks too broad.
> {code}
> "IPC Server handler 6 on 60000" daemon prio=10 tid=0x00007ff2d00ab400 nid=0x645b in Object.wait() [0x000000004330b000..0x000000004330cd70]
>    java.lang.Thread.State: WAITING (on object monitor)
>       at java.lang.Object.wait(Native Method)
>       at java.lang.Object.wait(Object.java:502)
>       at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:695)
>       - locked <0x00007ff2e8e2b3b0> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call)
>       at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:321)
>       at $Proxy2.batchUpdates(Unknown Source)
>       at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$2.call(HConnectionManager.java:916)
>       at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$2.call(HConnectionManager.java:914)
>       at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerForWithoutRetries(HConnectionManager.java:872)
>       at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:913)
>       at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1270)
>       at org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1241)
>       - locked <0x00007ff2e8c01b90> (a org.apache.hadoop.hbase.client.HTable)
>       at org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1221)
>       - locked <0x00007ff2e8c01b90> (a org.apache.hadoop.hbase.client.HTable)
>       at org.apache.hadoop.hbase.RegionHistorian.add(RegionHistorian.java:239)
>       at org.apache.hadoop.hbase.RegionHistorian.add(RegionHistorian.java:218)
>       at org.apache.hadoop.hbase.RegionHistorian.addRegionAssignment(RegionHistorian.java:142)
>       at org.apache.hadoop.hbase.master.RegionManager.assignRegionsToMultipleServers(RegionManager.java:282)
>       at org.apache.hadoop.hbase.master.RegionManager.assignRegions(RegionManager.java:220)
>       - locked <0x00007ff2e895d3f8> (a java.util.Collections$SynchronizedSortedMap)
>       at org.apache.hadoop.hbase.master.ServerManager.processMsgs(ServerManager.java:382)
>       at org.apache.hadoop.hbase.master.ServerManager.processRegionServerAllsWell(ServerManager.java:324)
>       at org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:240)
>       at org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:570)
>       at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:616)
>       at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
>       at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:892)
> ....
> {code}
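In the dump, the handler thread holds the `java.util.Collections$SynchronizedSortedMap` monitor taken in `RegionManager.assignRegions` while `RegionHistorian.addRegionAssignment` sits in a blocking `batchUpdates` RPC, so any other thread that needs that map waits on that RPC too. One general way to narrow such a lock is to make the assignment decisions under the lock and do the slow write after releasing it. The sketch below shows only that pattern; `NarrowLock`, `recordAssignment`, and the map's contents are hypothetical stand-ins, not HBase's actual `RegionManager` code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

class NarrowLock {
    // Stand-in for the master's shared region map (a synchronized sorted map, as in the dump).
    private final SortedMap<String, String> unassigned =
        Collections.synchronizedSortedMap(new TreeMap<>());
    // Hypothetical record of what would be blocking remote writes; only touched by the caller's thread here.
    final List<String> historyWrites = new ArrayList<>();

    void addUnassigned(String region) {
        unassigned.put(region, "unassigned");
    }

    /** Picks up to max regions for a server under the map lock, then does the slow write after releasing it. */
    List<String> assignRegions(String server, int max) {
        List<String> picked = new ArrayList<>();
        synchronized (unassigned) { // the synchronized wrapper's own monitor, required when iterating
            for (String region : unassigned.keySet()) {
                if (picked.size() == max) break;
                picked.add(region);
            }
            for (String region : picked) {
                unassigned.put(region, "pending:" + server);
            }
        }
        // Lock released: a slow or hung write here no longer blocks other users of the map.
        for (String region : picked) {
            recordAssignment(region, server);
        }
        return picked;
    }

    // Hypothetical placeholder for a blocking remote write such as a region-history update.
    private void recordAssignment(String region, String server) {
        historyWrites.add(region + " -> " + server);
    }

    public static void main(String[] args) {
        NarrowLock rm = new NarrowLock();
        rm.addUnassigned("-ROOT-,,0");
        rm.addUnassigned(".META.,,1");
        rm.addUnassigned("assigners,,1229364037757");
        System.out.println(rm.assignRegions("XX.XX.XX.213:60020", 2)); // prints "[-ROOT-,,0, .META.,,1]"
    }
}
```

Here `assignRegions("XX.XX.XX.213:60020", 2)` picks the first two regions in key order, `-ROOT-,,0` and `.META.,,1`, and records them only after the map lock has been released.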
> It's messing up assigning root, it seems.
> We are stuck in here. It doesn't look like we'll break out, though maybe we
> time out?
> {code}
>   public Writable call(Writable param, InetSocketAddress addr, 
>                        UserGroupInformation ticket)  
>                        throws InterruptedException, IOException {
>     Call call = new Call(param);
>     Connection connection = getConnection(addr, ticket, call);
>     connection.sendParam(call);                 // send the parameter
>     synchronized (call) {
>       while (!call.done) {
>         try {
>           call.wait();                           // wait for the result
>         } catch (InterruptedException ignored) {}
>       }
> ...
> {code}
> ... down in HBaseClient. 
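For comparison, a bounded version of that wait loop might look like the sketch below. It is not the fix that went into HBase; `BoundedCall`, `await`, and the timeout values are hypothetical, and a real client would also have to clean up the abandoned call and its connection. It differs from the quoted loop in two ways: it gives up after a deadline, and it lets `InterruptedException` propagate instead of swallowing it.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.concurrent.TimeUnit;

class BoundedCall {
    // Minimal stand-in for HBaseClient$Call: a result slot plus a done flag, guarded by the call's monitor.
    static final class Call {
        boolean done;
        Object value;

        synchronized void complete(Object v) {
            value = v;
            done = true;
            notifyAll(); // wake the waiting caller
        }
    }

    /** Waits at most timeoutMillis for the call to finish instead of waiting forever. */
    static Object await(Call call, long timeoutMillis) throws IOException, InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        synchronized (call) {
            while (!call.done) { // loop also absorbs spurious wakeups
                long remaining = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remaining <= 0) { // checked first so wait(0), which never times out, is never called
                    throw new SocketTimeoutException("call timed out after " + timeoutMillis + " ms");
                }
                call.wait(remaining); // an interrupt surfaces as InterruptedException to the caller
            }
            return call.value;
        }
    }

    public static void main(String[] args) throws Exception {
        Call answered = new Call();
        new Thread(() -> answered.complete("ok")).start();
        System.out.println(await(answered, 1000));

        Call lost = new Call(); // nobody ever completes this one
        try {
            await(lost, 100);
        } catch (SocketTimeoutException e) {
            System.out.println("timed out: " + e.getMessage());
        }
    }
}
```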

-- 
This message is automatically generated by JIRA.