[
https://issues.apache.org/jira/browse/HBASE-12958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303902#comment-14303902
]
stack commented on HBASE-12958:
-------------------------------
Odd. We go from acknowledging that the host with meta is down (c2022) to of a
sudden forgetting about it:
{code}
2015-02-02 22:32:15,574 INFO [main-EventThread] zookeeper.RegionServerTracker:
RegionServer ephemeral node deleted, processing expiration
[c2022.halxg.cloudera.com,16020,1422944918568]
2015-02-02 22:32:15,575 DEBUG [main-EventThread] master.AssignmentManager:
based on AM, current region=hbase:meta,,1.1588230740 is on
server=c2022.halxg.cloudera.com,16020,1422944918568 server being checked:
c2022.halxg.cloudera.com,16020,1422944918568
2015-02-02 22:32:15,575 DEBUG [main-EventThread] master.ServerManager:
Added=c2022.halxg.cloudera.com,16020,1422944918568 to dead servers, submitted
shutdown handler to be executed meta=true
2015-02-02 22:32:15,576 INFO [MASTER_META_SERVER_OPERATIONS-c2020:16020-1]
handler.MetaServerShutdownHandler: Splitting hbase:meta logs for
c2022.halxg.cloudera.com,16020,1422944918568
2015-02-02 22:32:15,577 DEBUG [main-EventThread] zookeeper.RegionServerTracker:
Added tracking of RS /hbase/rs/c2023.halxg.cloudera.com,16020,1422945128068
2015-02-02 22:32:15,578 DEBUG [main-EventThread] zookeeper.RegionServerTracker:
Added tracking of RS /hbase/rs/c2025.halxg.cloudera.com,16020,1422935795768
2015-02-02 22:32:15,578 DEBUG [main-EventThread] zookeeper.RegionServerTracker:
Added tracking of RS /hbase/rs/c2024.halxg.cloudera.com,16020,1422944894206
2015-02-02 22:32:15,585 DEBUG [MASTER_META_SERVER_OPERATIONS-c2020:16020-1]
master.MasterFileSystem: Renamed region directory:
hdfs://c2020.halxg.cloudera.com:8020/hbase/WALs/c2022.halxg.cloudera.com,16020,1422944918568-splitting
2015-02-02 22:32:15,585 INFO [MASTER_META_SERVER_OPERATIONS-c2020:16020-1]
master.SplitLogManager: dead splitlog workers
[c2022.halxg.cloudera.com,16020,1422944918568]
2015-02-02 22:32:15,587 DEBUG [MASTER_META_SERVER_OPERATIONS-c2020:16020-1]
master.SplitLogManager: Scheduling batch of logs to split
2015-02-02 22:32:15,587 INFO [MASTER_META_SERVER_OPERATIONS-c2020:16020-1]
master.SplitLogManager: started splitting 1 logs in
[hdfs://c2020.halxg.cloudera.com:8020/hbase/WALs/c2022.halxg.cloudera.com,16020,1422944918568-splitting]
for [c2022.halxg.cloudera.com,16020,1422944918568]
2015-02-02 22:32:15,591 DEBUG [main-EventThread]
coordination.SplitLogManagerCoordination: put up splitlog task at znode
/hbase/splitWAL/WALs%2Fc2022.halxg.cloudera.com%2C16020%2C1422944918568-splitting%2Fc2022.halxg.cloudera.com%252C16020%252C1422944918568..meta.1422945128892.meta
2015-02-02 22:32:15,591 DEBUG [main-EventThread]
coordination.SplitLogManagerCoordination: task not yet acquired
/hbase/splitWAL/WALs%2Fc2022.halxg.cloudera.com%2C16020%2C1422944918568-splitting%2Fc2022.halxg.cloudera.com%252C16020%252C1422944918568..meta.1422945128892.meta
ver = 0
2015-02-02 22:32:15,607 INFO [main-EventThread]
coordination.SplitLogManagerCoordination: task
/hbase/splitWAL/WALs%2Fc2022.halxg.cloudera.com%2C16020%2C1422944918568-splitting%2Fc2022.halxg.cloudera.com%252C16020%252C1422944918568..meta.1422945128892.meta
acquired by c2025.halxg.cloudera.com,16020,1422935795768
2015-02-02 22:32:15,929 INFO
[c2020.halxg.cloudera.com,16020,1422944946802.splitLogManagerTimeoutMonitor]
coordination.SplitLogManagerCoordination: resubmitting task
/hbase/splitWAL/WALs%2Fc2021.halxg.cloudera.com%2C16020%2C1422944889403-splitting%2Fc2021.halxg.cloudera.com%252C16020%252C1422944889403.default.1422945068674
2015-02-02 22:32:15,941 INFO
[c2020.halxg.cloudera.com,16020,1422944946802.splitLogManagerTimeoutMonitor]
master.SplitLogManager: resubmitted 1 out of 3 tasks
2015-02-02 22:32:15,941 DEBUG [main-EventThread]
coordination.SplitLogManagerCoordination: task not yet acquired
/hbase/splitWAL/WALs%2Fc2021.halxg.cloudera.com%2C16020%2C1422944889403-splitting%2Fc2021.halxg.cloudera.com%252C16020%252C1422944889403.default.1422945068674
ver = 3
2015-02-02 22:32:15,949 INFO [main-EventThread]
coordination.SplitLogManagerCoordination: task /hbase/splitWAL/RESCAN0000004442
entered state: DONE c2020.halxg.cloudera.com,16020,1422944946802
2015-02-02 22:32:15,957 DEBUG [main-EventThread]
coordination.ZKSplitLogManagerCoordination$DeleteAsyncCallback: deleted
/hbase/splitWAL/RESCAN0000004442
2015-02-02 22:32:15,957 DEBUG [main-EventThread]
coordination.SplitLogManagerCoordination: deleted task without in memory state
/hbase/splitWAL/RESCAN0000004442
2015-02-02 22:32:16,007 INFO [main-EventThread]
coordination.SplitLogManagerCoordination: task
/hbase/splitWAL/WALs%2Fc2021.halxg.cloudera.com%2C16020%2C1422944889403-splitting%2Fc2021.halxg.cloudera.com%252C16020%252C1422944889403.default.1422945068674
acquired by c2024.halxg.cloudera.com,16020,1422944894206
2015-02-02 22:32:16,208 INFO [main-EventThread]
coordination.SplitLogManagerCoordination: task
/hbase/splitWAL/WALs%2Fc2021.halxg.cloudera.com%2C16020%2C1422944889403-splitting%2Fc2021.halxg.cloudera.com%252C16020%252C1422944889403.default.1422945090502
entered state: DONE c2023.halxg.cloudera.com,16020,1422945128068
2015-02-02 22:32:16,219 INFO [main-EventThread] wal.WALSplitter: Archived
processed log
hdfs://c2020.halxg.cloudera.com:8020/hbase/WALs/c2021.halxg.cloudera.com,16020,1422944889403-splitting/c2021.halxg.cloudera.com%2C16020%2C1422944889403.default.1422945090502
to
hdfs://c2020.halxg.cloudera.com:8020/hbase/oldWALs/c2021.halxg.cloudera.com%2C16020%2C1422944889403.default.1422945090502
2015-02-02 22:32:16,220 INFO [main-EventThread]
coordination.SplitLogManagerCoordination: Done splitting
/hbase/splitWAL/WALs%2Fc2021.halxg.cloudera.com%2C16020%2C1422944889403-splitting%2Fc2021.halxg.cloudera.com%252C16020%252C1422944889403.default.1422945090502
2015-02-02 22:32:16,224 DEBUG [main-EventThread]
coordination.ZKSplitLogManagerCoordination$DeleteAsyncCallback: deleted
/hbase/splitWAL/WALs%2Fc2021.halxg.cloudera.com%2C16020%2C1422944889403-splitting%2Fc2021.halxg.cloudera.com%252C16020%252C1422944889403.default.1422945090502
2015-02-02 22:32:17,225 INFO [main-EventThread]
coordination.SplitLogManagerCoordination: task
/hbase/splitWAL/WALs%2Fc2021.halxg.cloudera.com%2C16020%2C1422944889403-splitting%2Fc2021.halxg.cloudera.com%252C16020%252C1422944889403.default.1422945068674
entered state: DONE c2024.halxg.cloudera.com,16020,1422944894206
2015-02-02 22:32:17,235 INFO [main-EventThread] wal.WALSplitter: Archived
processed log
hdfs://c2020.halxg.cloudera.com:8020/hbase/WALs/c2021.halxg.cloudera.com,16020,1422944889403-splitting/c2021.halxg.cloudera.com%2C16020%2C1422944889403.default.1422945068674
to
hdfs://c2020.halxg.cloudera.com:8020/hbase/oldWALs/c2021.halxg.cloudera.com%2C16020%2C1422944889403.default.1422945068674
2015-02-02 22:32:17,236 INFO [main-EventThread]
coordination.SplitLogManagerCoordination: Done splitting
/hbase/splitWAL/WALs%2Fc2021.halxg.cloudera.com%2C16020%2C1422944889403-splitting%2Fc2021.halxg.cloudera.com%252C16020%252C1422944889403.default.1422945068674
2015-02-02 22:32:17,241 DEBUG [main-EventThread]
coordination.ZKSplitLogManagerCoordination$DeleteAsyncCallback: deleted
/hbase/splitWAL/WALs%2Fc2021.halxg.cloudera.com%2C16020%2C1422944889403-splitting%2Fc2021.halxg.cloudera.com%252C16020%252C1422944889403.default.1422945068674
2015-02-02 22:32:17,244 INFO [MASTER_SERVER_OPERATIONS-c2020:16020-2]
master.SplitLogManager: finished splitting (more than or equal to) 1955009811
bytes in 16 log files in
[hdfs://c2020.halxg.cloudera.com:8020/hbase/WALs/c2021.halxg.cloudera.com,16020,1422944889403-splitting]
in 7355ms
2015-02-02 22:32:17,244 DEBUG [MASTER_SERVER_OPERATIONS-c2020:16020-2]
master.RegionStates: Adding to processed servers
c2021.halxg.cloudera.com,16020,1422944889403
2015-02-02 22:32:17,244 DEBUG [MASTER_SERVER_OPERATIONS-c2020:16020-2]
master.RegionStates: Offline splitting/merging region
{7e0c50a2296895f66bee85df0eaba533 state=SPLITTING, ts=1422945115260,
server=c2021.halxg.cloudera.com,16020,1422944889403}
2015-02-02 22:32:17,553 DEBUG [MASTER_SERVER_OPERATIONS-c2020:16020-2]
ipc.AbstractRpcClient: Use SIMPLE authentication for service ClientService,
sasl=false
2015-02-02 22:32:17,553 DEBUG [MASTER_SERVER_OPERATIONS-c2020:16020-2]
ipc.AbstractRpcClient: Connecting to c2022.halxg.cloudera.com/10.20.84.28:16020
2015-02-02 22:32:17,554 INFO [MASTER_SERVER_OPERATIONS-c2020:16020-2]
client.RpcRetryingCaller: Call exception, tries=1, retries=350, started=305 ms
ago, cancelled=false, msg=row '54d2e8d62ae8d9cae3abc33f1b58e32c' on table
'hbase:meta' at region=hbase:meta,,1.1588230740,
hostname=c2022.halxg.cloudera.com,16020,1422944918568, seqNum=0
2015-02-02 22:32:18,055 DEBUG [MASTER_SERVER_OPERATIONS-c2020:16020-2]
ipc.AbstractRpcClient: Use SIMPLE authentication for service ClientService,
sasl=false
2015-02-02 22:32:18,055 DEBUG [MASTER_SERVER_OPERATIONS-c2020:16020-2]
ipc.AbstractRpcClient: Not trying to connect to
c2022.halxg.cloudera.com/10.20.84.28:16020 this server is in the failed servers
list
2015-02-02 22:32:18,056 INFO [MASTER_SERVER_OPERATIONS-c2020:16020-2]
client.RpcRetryingCaller: Call exception, tries=2, retries=350, started=807 ms
ago, cancelled=false, msg=row '54d2e8d62ae8d9cae3abc33f1b58e32c' on table
'hbase:meta' at region=hbase:meta,,1.1588230740,
hostname=c2022.halxg.cloudera.com,16020,1422944918568, seqNum=0
2015-02-02 22:32:18,930 INFO
[c2020.halxg.cloudera.com,16020,1422944946802.splitLogManagerTimeoutMonitor]
master.SplitLogManager: total tasks = 1 unassigned = 0
tasks={/hbase/splitWAL/WALs%2Fc2022.halxg.cloudera.com%2C16020%2C1422944918568-splitting%2Fc2022.halxg.cloudera.com%252C16020%252C1422944918568..meta.1422945128892.meta=last_update
= 1422945135625 last_version = 2 cur_worker_name =
c2025.halxg.cloudera.com,16020,1422935795768 status = in_progress incarnation =
0 resubmits = 0 batch = installed = 1 done = 0 error = 0}
2015-02-02 22:32:19,059 DEBUG [MASTER_SERVER_OPERATIONS-c2020:16020-2]
ipc.AbstractRpcClient: Use SIMPLE authentication for service ClientService,
sasl=false
2015-02-02 22:32:19,059 DEBUG [MASTER_SERVER_OPERATIONS-c2020:16020-2]
ipc.AbstractRpcClient: Not trying to connect to
c2022.halxg.cloudera.com/10.20.84.28:16020 this server is in the failed servers
list
2015-02-02 22:32:19,059 INFO [MASTER_SERVER_OPERATIONS-c2020:16020-2]
client.RpcRetryingCaller: Call exception, tries=3, retries=350, started=1810 ms
ago, cancelled=false, msg=row '54d2e8d62ae8d9cae3abc33f1b58e32c' on table
'hbase:meta' at region=hbase:meta,,1.1588230740,
hostname=c2022.halxg.cloudera.com,16020,1422944918568, seqNum=0
2015-02-02 22:32:20,258 INFO [main-EventThread]
coordination.SplitLogManagerCoordination: task
/hbase/splitWAL/WALs%2Fc2022.halxg.cloudera.com%2C16020%2C1422944918568-splitting%2Fc2022.halxg.cloudera.com%252C16020%252C1422944918568..meta.1422945128892.meta
entered state: DONE c2025.halxg.cloudera.com,16020,1422935795768
2015-02-02 22:32:20,269 INFO [main-EventThread] wal.WALSplitter: Archived
processed log
hdfs://c2020.halxg.cloudera.com:8020/hbase/WALs/c2022.halxg.cloudera.com,16020,1422944918568-splitting/c2022.halxg.cloudera.com%2C16020%2C1422944918568..meta.1422945128892.meta
to
hdfs://c2020.halxg.cloudera.com:8020/hbase/oldWALs/c2022.halxg.cloudera.com%2C16020%2C1422944918568..meta.1422945128892.meta
2015-02-02 22:32:20,269 INFO [main-EventThread]
coordination.SplitLogManagerCoordination: Done splitting
/hbase/splitWAL/WALs%2Fc2022.halxg.cloudera.com%2C16020%2C1422944918568-splitting%2Fc2022.halxg.cloudera.com%252C16020%252C1422944918568..meta.1422945128892.meta
2015-02-02 22:32:20,273 WARN [MASTER_META_SERVER_OPERATIONS-c2020:16020-1]
master.SplitLogManager: returning success without actually splitting and
deleting all the log files in path
hdfs://c2020.halxg.cloudera.com:8020/hbase/WALs/c2022.halxg.cloudera.com,16020,1422944918568-splitting
2015-02-02 22:32:20,273 INFO [MASTER_META_SERVER_OPERATIONS-c2020:16020-1]
master.SplitLogManager: finished splitting (more than or equal to) 83 bytes in
1 log files in
[hdfs://c2020.halxg.cloudera.com:8020/hbase/WALs/c2022.halxg.cloudera.com,16020,1422944918568-splitting]
in 4686ms
2015-02-02 22:32:20,274 DEBUG [main-EventThread]
coordination.ZKSplitLogManagerCoordination$DeleteAsyncCallback: deleted
/hbase/splitWAL/WALs%2Fc2022.halxg.cloudera.com%2C16020%2C1422944918568-splitting%2Fc2022.halxg.cloudera.com%252C16020%252C1422944918568..meta.1422945128892.meta
2015-02-02 22:32:21,051 DEBUG [main-EventThread] zookeeper.RegionServerTracker:
Added tracking of RS /hbase/rs/c2021.halxg.cloudera.com,16020,1422945139174
2015-02-02 22:32:21,052 DEBUG [main-EventThread] zookeeper.RegionServerTracker:
Added tracking of RS /hbase/rs/c2023.halxg.cloudera.com,16020,1422945128068
2015-02-02 22:32:21,052 DEBUG [main-EventThread] zookeeper.RegionServerTracker:
Added tracking of RS /hbase/rs/c2025.halxg.cloudera.com,16020,1422935795768
2015-02-02 22:32:21,052 DEBUG [main-EventThread] zookeeper.RegionServerTracker:
Added tracking of RS /hbase/rs/c2024.halxg.cloudera.com,16020,1422944894206
2015-02-02 22:32:21,071 DEBUG [MASTER_SERVER_OPERATIONS-c2020:16020-2]
ipc.AbstractRpcClient: Use SIMPLE authentication for service ClientService,
sasl=false
2015-02-02 22:32:21,071 DEBUG [MASTER_SERVER_OPERATIONS-c2020:16020-2]
ipc.AbstractRpcClient: Connecting to c2022.halxg.cloudera.com/10.20.84.28:16020
2015-02-02 22:32:21,072 INFO [MASTER_SERVER_OPERATIONS-c2020:16020-2]
client.RpcRetryingCaller: Call exception, tries=4, retries=350, started=3823 ms
ago, cancelled=false, msg=row '54d2e8d62ae8d9cae3abc33f1b58e32c' on table
'hbase:meta' at region=hbase:meta,,1.1588230740,
hostname=c2022.halxg.cloudera.com,16020,1422944918568, seqNum=0
2015-02-02 22:32:21,111 DEBUG [RpcServer.listener,port=16020] ipc.RpcServer:
RpcServer.listener,port=16020: connection from 10.20.84.27:35695; # active
connections: 6
{code}
[~jxiang] Any input boss? Why would we split the meta log and then not assign
hbase:meta? Thanks sir.
> SSH doing hbase:meta get but hbase:meta not assigned
> ----------------------------------------------------
>
> Key: HBASE-12958
> URL: https://issues.apache.org/jira/browse/HBASE-12958
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.0.0
> Reporter: stack
> Assignee: stack
>
> All master threads are blocked waiting on this call to return:
> {code}
> "MASTER_SERVER_OPERATIONS-c2020:16020-2" #189 prio=5 os_prio=0
> tid=0x00007f4b0408b000 nid=0x7821 in Object.wait() [0x00007f4ada24d000]
> java.lang.Thread.State: TIMED_WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:168)
> - locked <0x000000041c374f50> (a
> java.util.concurrent.atomic.AtomicBoolean)
> at org.apache.hadoop.hbase.client.HTable.get(HTable.java:881)
> at
> org.apache.hadoop.hbase.MetaTableAccessor.get(MetaTableAccessor.java:208)
> at
> org.apache.hadoop.hbase.MetaTableAccessor.getRegionLocation(MetaTableAccessor.java:250)
> at
> org.apache.hadoop.hbase.MetaTableAccessor.getRegion(MetaTableAccessor.java:225)
> at
> org.apache.hadoop.hbase.master.RegionStates.serverOffline(RegionStates.java:634)
> - locked <0x000000041c1f0d80> (a
> org.apache.hadoop.hbase.master.RegionStates)
> at
> org.apache.hadoop.hbase.master.AssignmentManager.processServerShutdown(AssignmentManager.java:3298)
> at
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:226)
> at
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Master is stuck trying to find hbase:meta on the server that just crashed and
> that we just recovered:
> Mon Feb 02 23:00:02 PST 2015, null, java.net.SocketTimeoutException:
> callTimeout=60000, callDuration=68181: row '' on table 'hbase:meta' at
> region=hbase:meta,,1.1588230740,
> hostname=c2022.halxg.cloudera.com,16020,1422944918568, seqNum=0
> Will add more detail in a sec.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)