[
https://issues.apache.org/jira/browse/HBASE-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12658614#action_12658614
]
stack commented on HBASE-1080:
------------------------------
The master logs show it scanning root and meta normally, and then it just
stops scanning meta.
{code}
2008-12-20 17:12:59,065 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cache hit in table locations for row <> and tableName .META.: location server XX.XX.XX.216:60020, location region name .META.,,1
2008-12-20 17:13:22,458 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scanning meta region {regionname: .META.,,1, startKey: <>, server: XX.XX.XX.216:60020}
2008-12-20 17:13:22,573 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scan of 378 row(s) of meta region {regionname: .META.,,1, startKey: <>, server: XX.XX.XX.216:60020} complete
2008-12-20 17:13:22,573 INFO org.apache.hadoop.hbase.master.BaseScanner: All 1 .META. region(s) scanned
2008-12-20 17:13:27,781 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scanning meta region {regionname: -ROOT-,,0, startKey: <>, server: XX.XX.XX.213:60020}
2008-12-20 17:13:27,789 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scan of 1 row(s) of meta region {regionname: -ROOT-,,0, startKey: <>, server: XX.XX.XX.213:60020} complete
2008-12-20 17:13:29,417 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cache hit in table locations for row <> and tableName .META.: location server XX.XX.XX.216:60020, location region name .META.,,1
2008-12-20 17:13:51,564 DEBUG org.apache.hadoop.hbase.master.ServerManager: Total Load: 382, Num Servers: 10, Avg Load: 39.0
2008-12-20 17:13:59,763 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cache hit in table locations for row <> and tableName .META.: location server XX.XX.XX.216:60020, location region name .META.,,1
2008-12-20 17:14:22,458 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scanning meta region {regionname: .META.,,1, startKey: <>, server: XX.XX.XX.216:60020}
2008-12-20 17:14:27,781 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scanning meta region {regionname: -ROOT-,,0, startKey: <>, server: XX.XX.XX.213:60020}
2008-12-20 17:14:27,787 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scan of 1 row(s) of meta region {regionname: -ROOT-,,0, startKey: <>, server: XX.XX.XX.213:60020} complete
2008-12-20 17:14:51,591 DEBUG org.apache.hadoop.hbase.master.ServerManager: Total Load: 382, Num Servers: 10, Avg Load: 39.0
...
{code}
...
This goes on forever, except that the meta scanner runs twice more 30 minutes
later, on schedule, and then stops again for the rest of the log.
Root and meta are on different servers, 213 and 216 respectively.
Looking at the logs on the server hosting meta, I see nothing untoward (though
it seems we're major compacting every 4 hours or so), and then leases timing
out without explanation.
{code}
2008-12-20 16:34:24,021 INFO org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on region assigners,,1229364037757
2008-12-20 16:34:24,022 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Skipping major compaction because only one (major) compacted file only and elapsedTime 169999846 is < ttl=-1
2008-12-20 16:34:24,022 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region assigners,,1229364037757 in 0sec
2008-12-20 17:15:01,773 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -5917415078002522486 lease expired
2008-12-20 17:15:22,463 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 7764136277533139761 lease expired
2008-12-20 17:23:21,423 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 2005207420919280312 lease expired
2008-12-20 17:45:31,699 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 6223121878865388268 lease expired
2008-12-20 17:46:15,298 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 7574816364936285182 lease expired
2008-12-20 17:46:32,666 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 908654114835311904 lease expired
2008-12-20 19:21:03,933 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Major compaction triggered on store: 170018656/content. Time since last major compaction: 309999 seconds
...
{code}
> Deadlocked master; wants to assign root but can't because root is not assigned
> ------------------------------------------------------------------------------
>
> Key: HBASE-1080
> URL: https://issues.apache.org/jira/browse/HBASE-1080
> Project: Hadoop HBase
> Issue Type: Bug
> Environment: 17:05 < jgray> I0.19.0-dev, r726565
> 17:05 < jgray> 12/15/08 i grabbed it
> Reporter: stack
> Attachments: master.dump.log
>
>
> This lock assigning regions looks broad.
> {code}
> "IPC Server handler 6 on 60000" daemon prio=10 tid=0x00007ff2d00ab400 nid=0x645b in Object.wait() [0x000000004330b000..0x000000004330cd70]
> java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at java.lang.Object.wait(Object.java:502)
> at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:695)
> - locked <0x00007ff2e8e2b3b0> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call)
> at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:321)
> at $Proxy2.batchUpdates(Unknown Source)
> at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$2.call(HConnectionManager.java:916)
> at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$2.call(HConnectionManager.java:914)
> at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerForWithoutRetries(HConnectionManager.java:872)
> at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:913)
> at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1270)
> at org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1241)
> - locked <0x00007ff2e8c01b90> (a org.apache.hadoop.hbase.client.HTable)
> at org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1221)
> - locked <0x00007ff2e8c01b90> (a org.apache.hadoop.hbase.client.HTable)
> at org.apache.hadoop.hbase.RegionHistorian.add(RegionHistorian.java:239)
> at org.apache.hadoop.hbase.RegionHistorian.add(RegionHistorian.java:218)
> at org.apache.hadoop.hbase.RegionHistorian.addRegionAssignment(RegionHistorian.java:142)
> at org.apache.hadoop.hbase.master.RegionManager.assignRegionsToMultipleServers(RegionManager.java:282)
> at org.apache.hadoop.hbase.master.RegionManager.assignRegions(RegionManager.java:220)
> - locked <0x00007ff2e895d3f8> (a java.util.Collections$SynchronizedSortedMap)
> at org.apache.hadoop.hbase.master.ServerManager.processMsgs(ServerManager.java:382)
> at org.apache.hadoop.hbase.master.ServerManager.processRegionServerAllsWell(ServerManager.java:324)
> at org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:240)
> at org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:570)
> at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:616)
> at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
> at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:892)
> ....
> {code}
> It's messing up assigning root, it seems.
> We are stuck in here. It doesn't look like we'll break out, though maybe we
> time out?
> {code}
> public Writable call(Writable param, InetSocketAddress addr,
>     UserGroupInformation ticket)
>     throws InterruptedException, IOException {
>   Call call = new Call(param);
>   Connection connection = getConnection(addr, ticket, call);
>   connection.sendParam(call); // send the parameter
>   synchronized (call) {
>     while (!call.done) {
>       try {
>         call.wait(); // wait for the result
>       } catch (InterruptedException ignored) {}
>     }
>     ...
> {code}
> ... down in HBaseClient.
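
On the "maybe we time out?" question: in the snippet as quoted, call.wait() is
invoked with no timeout and InterruptedException is swallowed, so the loop only
exits once call.done is set. For comparison, here is a minimal sketch of a
bounded wait. Everything in it (TimedCallSketch, the nested Call class, await,
timeoutMs) is a hypothetical stand-in for illustration, not HBaseClient's actual
API and not a proposed patch.

```java
import java.io.IOException;

/** Illustration only: a hypothetical stand-in for the quoted Call, not HBase code. */
final class TimedCallSketch {
    static class Call {
        private boolean done;
        private Object value;

        /** Called by the (hypothetical) response reader thread when a reply arrives. */
        synchronized void complete(Object v) {
            value = v;
            done = true;
            notifyAll();
        }

        /**
         * Waits until complete() is called or timeoutMs elapses.
         * Unlike the quoted loop, interrupts propagate to the caller.
         */
        synchronized Object await(long timeoutMs)
                throws IOException, InterruptedException {
            long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
            while (!done) {
                long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMs <= 0) {
                    // Never call wait(0): that would block indefinitely.
                    throw new IOException("call timed out after " + timeoutMs + " ms");
                }
                // Bounded wait; the loop re-checks done to handle spurious wakeups.
                wait(remainingMs);
            }
            return value;
        }
    }
}
```

With something along these lines, a handler blocked on a region server that
never answers would get an IOException back after timeoutMs instead of holding
whatever locks it took higher up the stack for the life of the process.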