[
https://issues.apache.org/jira/browse/HBASE-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679325#action_12679325
]
stack commented on HBASE-1169:
------------------------------
Should we move to 0.19.2 and 0.20.0 and out of 0.19.1?
> During shutdown, odd handling of regionserver hosting META
> ----------------------------------------------------------
>
> Key: HBASE-1169
> URL: https://issues.apache.org/jira/browse/HBASE-1169
> Project: Hadoop HBase
> Issue Type: Bug
> Affects Versions: 0.19.0, 0.20.0
> Reporter: Jonathan Gray
> Priority: Critical
> Fix For: 0.19.1, 0.20.0
>
>
> During shutdown of cluster, half way through quiescing servers there is a
> META scan in the master. The regions from servers whose leases are already
> canceled show up as invalid. (72.34.249.208 is hosting META)
> {code}
> 2009-01-31 10:25:42,571 INFO org.apache.hadoop.hbase.master.HMaster: Cluster
> shutdown requested. Starting to quiesce servers
> 2009-01-31 10:25:45,868 INFO org.apache.hadoop.hbase.master.ServerManager:
> Cancelling lease for 72.34.249.211:60020
> 2009-01-31 10:25:45,868 INFO org.apache.hadoop.hbase.master.ServerManager:
> Region server 72.34.249.211:60020: MSG_REPORT_EXITING -- lease cancelled
> 2009-01-31 10:25:47,480 INFO org.apache.hadoop.hbase.master.ServerManager:
> Cancelling lease for 72.34.249.216:60020
> 2009-01-31 10:25:47,480 INFO org.apache.hadoop.hbase.master.ServerManager:
> Region server 72.34.249.216:60020: MSG_REPORT_EXITING -- lease cancelled
> 2009-01-31 10:25:47,840 INFO org.apache.hadoop.hbase.master.ServerManager:
> Region server 72.34.249.210:60020 quiesced
> 2009-01-31 10:25:47,944 INFO org.apache.hadoop.hbase.master.ServerManager:
> Cancelling lease for 72.34.249.215:60020
> 2009-01-31 10:25:47,944 INFO org.apache.hadoop.hbase.master.ServerManager:
> Region server 72.34.249.215:60020: MSG_REPORT_EXITING -- lease cancelled
> 2009-01-31 10:25:48,403 INFO org.apache.hadoop.hbase.master.ServerManager:
> Cancelling lease for 72.34.249.213:60020
> 2009-01-31 10:25:48,403 INFO org.apache.hadoop.hbase.master.ServerManager:
> Region server 72.34.249.213:60020: MSG_REPORT_EXITING -- lease cancelled
> 2009-01-31 10:25:49,378 INFO org.apache.hadoop.hbase.master.ServerManager:
> Region server 72.34.249.218:60020 quiesced
> 2009-01-31 10:25:50,465 INFO org.apache.hadoop.hbase.master.ServerManager:
> Cancelling lease for 72.34.249.214:60020
> 2009-01-31 10:25:50,465 INFO org.apache.hadoop.hbase.master.ServerManager:
> Region server 72.34.249.214:60020: MSG_REPORT_EXITING -- lease cancelled
> 2009-01-31 10:25:59,531 INFO org.apache.hadoop.hbase.master.BaseScanner:
> RegionManager.metaScanner scanning meta region {regionname: .META.,,1,
> startKey: <>, server: 72.34.249.218:60020}
> 2009-01-31 10:25:59,544 DEBUG org.apache.hadoop.hbase.master.BaseScanner:
> Current assignment of activitydupehash,,1229364212541 is not valid; Server
> '72.34.249.214:60020' unknown.
> 2009-01-31 10:25:59,545 DEBUG org.apache.hadoop.hbase.master.BaseScanner:
> Current assignment of api,,1229364235220 is not valid; Server
> '72.34.249.216:60020' unknown.
> 2009-01-31 10:25:59,552 DEBUG org.apache.hadoop.hbase.master.BaseScanner:
> Current assignment of apps,,1229364222879 is not valid; Server
> '72.34.249.215:60020' unknown.
> 2009-01-31 10:25:59,552 DEBUG org.apache.hadoop.hbase.master.BaseScanner:
> Current assignment of assigners,,1229364037757 is not valid; Server
> '72.34.249.214:60020' unknown.
> 2009-01-31 10:25:59,554 DEBUG org.apache.hadoop.hbase.master.BaseScanner:
> Current assignment of canoncache,,1229364041955 is not valid; Server
> '72.34.249.215:60020' unknown.
> 2009-01-31 10:25:59,555 DEBUG org.apache.hadoop.hbase.master.BaseScanner:
> Current assignment of chunks,,1229390225893 is not valid; Server
> '72.34.249.211:60020' unknown.
> {code}
> Shutdown then continues as the last servers are quiesced, but at the same
> time the Master expires the lease on the regionserver that was hosting META
> and that it just scanned. It then starts to replay the logs for that
> regionserver in the middle of the shutdown.
> {code}
> 2009-01-31 10:25:59,799 INFO org.apache.hadoop.hbase.master.BaseScanner:
> RegionManager.metaScanner scan of 512 row(s) of meta region {regionname:
> .META.,,1, startKey: <>, server: 72.34.249.218:60020}
> complete
> 2009-01-31 10:25:59,799 INFO org.apache.hadoop.hbase.master.BaseScanner: All
> 1 .META. region(s) scanned
> 2009-01-31 10:26:59,530 INFO org.apache.hadoop.hbase.master.BaseScanner:
> RegionManager.metaScanner scanning meta region {regionname: .META.,,1,
> startKey: <>, server: 72.34.249.218:60020}
> 2009-01-31 10:26:59,720 INFO org.apache.hadoop.hbase.master.BaseScanner:
> RegionManager.metaScanner scan of 512 row(s) of meta region {regionname:
> .META.,,1, startKey: <>, server: 72.34.249.218:60020}
> complete
> 2009-01-31 10:26:59,720 INFO org.apache.hadoop.hbase.master.BaseScanner: All
> 1 .META. region(s) scanned
> 2009-01-31 10:27:40,374 INFO org.apache.hadoop.hbase.master.ServerManager:
> 72.34.249.218:60020 lease expired
> 2009-01-31 10:27:40,375 DEBUG org.apache.hadoop.hbase.master.HMaster:
> Processing todo: ProcessServerShutdown of 72.34.249.218:60020
> 2009-01-31 10:27:40,375 INFO
> org.apache.hadoop.hbase.master.RegionServerOperation: process shutdown of
> server 72.34.249.218:60020: logSplit: false, rootRescanned: false,
> numberOfMetaRegions: 1, onlin
> eMetaRegions.size(): 1
> 2009-01-31 10:27:40,387 INFO org.apache.hadoop.hbase.regionserver.HLog:
> Splitting 44 log(s) in
> hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020
> 2009-01-31 10:27:40,387 DEBUG org.apache.hadoop.hbase.regionserver.HLog:
> Splitting 1 of 44:
> hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1232996040603
> 2009-01-31 10:27:40,443 DEBUG org.apache.hadoop.hbase.regionserver.HLog:
> Creating new log file writer for path
> hdfs://mb0:9000/hbase/.META./1028785192/oldlogfile.log and region .META.,,1
> 2009-01-31 10:27:40,575 DEBUG org.apache.hadoop.hbase.regionserver.HLog:
> Creating new log file writer for path
> hdfs://mb0:9000/hbase/sources/671225115/oldlogfile.log and region
> sources,,1229364117966
> 2009-01-31 10:27:41,171 DEBUG org.apache.hadoop.hbase.regionserver.HLog:
> Applied 100003 total edits from
> hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1232996040603
> 2009-01-31 10:27:41,173 DEBUG org.apache.hadoop.hbase.regionserver.HLog:
> Splitting 2 of 44:
> hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233093726382
> 2009-01-31 10:27:41,429 DEBUG org.apache.hadoop.hbase.regionserver.HLog:
> Creating new log file writer for path
> hdfs://mb0:9000/hbase/dupehash/1607532582/oldlogfile.log and region
> dupehash,O��<L;h�,12
> 31779694744
> 2009-01-31 10:27:41,462 INFO org.apache.hadoop.hbase.master.ServerManager:
> 72.34.249.217:60020 lease expired
> 2009-01-31 10:27:41,499 INFO org.apache.hadoop.hbase.master.ServerManager:
> All user tables quiesced. Proceeding with shutdown
> 2009-01-31 10:27:41,499 DEBUG org.apache.hadoop.hbase.master.RegionManager:
> telling root scanner to stop
> 2009-01-31 10:27:41,499 DEBUG org.apache.hadoop.hbase.master.RegionManager:
> telling meta scanner to stop
> 2009-01-31 10:27:41,499 DEBUG org.apache.hadoop.hbase.master.RegionManager:
> meta and root scanners notified
> 2009-01-31 10:27:41,499 INFO org.apache.hadoop.hbase.master.RootScanner:
> RegionManager.rootScanner exiting
> 2009-01-31 10:27:41,499 INFO org.apache.hadoop.hbase.master.MetaScanner:
> RegionManager.metaScanner exiting
> 2009-01-31 10:27:41,780 DEBUG org.apache.hadoop.hbase.regionserver.HLog:
> Applied 100001 total edits from
> hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233093726382
> 2009-01-31 10:27:41,781 DEBUG org.apache.hadoop.hbase.regionserver.HLog:
> Splitting 3 of 44:
> hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233101015153
> 2009-01-31 10:27:41,838 INFO org.apache.hadoop.hbase.master.ServerManager:
> 72.34.249.210:60020 lease expired
> 2009-01-31 10:27:41,866 INFO org.apache.hadoop.hbase.master.ServerManager:
> All user tables quiesced. Proceeding with shutdown
> 2009-01-31 10:27:41,866 DEBUG org.apache.hadoop.hbase.master.RegionManager:
> telling root scanner to stop
> 2009-01-31 10:27:41,866 DEBUG org.apache.hadoop.hbase.master.RegionManager:
> telling meta scanner to stop
> 2009-01-31 10:27:41,866 DEBUG org.apache.hadoop.hbase.master.RegionManager:
> meta and root scanners notified
> 2009-01-31 10:27:42,557 INFO org.apache.hadoop.hbase.master.ServerManager:
> 72.34.249.212:60020 lease expired
> 2009-01-31 10:27:42,581 INFO org.apache.hadoop.hbase.master.ServerManager:
> All user tables quiesced. Proceeding with shutdown
> 2009-01-31 10:27:42,581 DEBUG org.apache.hadoop.hbase.master.RegionManager:
> telling root scanner to stop
> 2009-01-31 10:27:42,581 DEBUG org.apache.hadoop.hbase.master.RegionManager:
> telling meta scanner to stop
> 2009-01-31 10:27:42,581 DEBUG org.apache.hadoop.hbase.master.RegionManager:
> meta and root scanners notified
> 2009-01-31 10:27:42,615 DEBUG org.apache.hadoop.hbase.regionserver.HLog:
> Applied 100002 total edits from
> hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233101015153
> 2009-01-31 10:27:42,618 DEBUG org.apache.hadoop.hbase.regionserver.HLog:
> Splitting 4 of 44:
> hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233111791302
> 2009-01-31 10:27:43,356 DEBUG org.apache.hadoop.hbase.regionserver.HLog:
> Applied 100001 total edits from
> hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233111791302
> 2009-01-31 10:27:43,359 DEBUG org.apache.hadoop.hbase.regionserver.HLog:
> Splitting 5 of 44:
> hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233122447841
> 2009-01-31 10:27:43,404 INFO org.apache.hadoop.hbase.master.ServerManager:
> All user tables quiesced. Proceeding with shutdown
> 2009-01-31 10:27:43,404 DEBUG org.apache.hadoop.hbase.master.RegionManager:
> telling root scanner to stop
> 2009-01-31 10:27:43,404 DEBUG org.apache.hadoop.hbase.master.RegionManager:
> telling meta scanner to stop
> 2009-01-31 10:27:43,404 DEBUG org.apache.hadoop.hbase.master.RegionManager:
> meta and root scanners notified
> 2009-01-31 10:27:43,991 DEBUG org.apache.hadoop.hbase.regionserver.HLog:
> Applied 100001 total edits from
> hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233122447841
> {code}
> During the log replay, a log file was missing from HDFS. Not sure why, there
> was a Datanode crash that could be related. More importantly, once it trips
> on the missing file it stops the replay (even though there's another 37 logs).
> {code}
> 2009-01-31 10:27:43,992 DEBUG org.apache.hadoop.hbase.regionserver.HLog:
> Splitting 6 of 44:
> hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233132556827
> 2009-01-31 10:27:44,022 WARN org.apache.hadoop.hbase.master.HMaster:
> Processing pending operations: ProcessServerShutdown of 72.34.249.218:60020
> java.io.FileNotFoundException: File does not exist:
> hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233132556827
> at
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:394)
> at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:679)
> at
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
> at
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
> at org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:742)
> at org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:705)
> at
> org.apache.hadoop.hbase.master.ProcessServerShutdown.process(ProcessServerShutdown.java:249)
> at
> org.apache.hadoop.hbase.master.HMaster.processToDoQueue(HMaster.java:427)
> at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:360)
> 2009-01-31 10:27:44,022 DEBUG org.apache.hadoop.hbase.master.RegionManager:
> telling root scanner to stop
> 2009-01-31 10:27:44,022 DEBUG org.apache.hadoop.hbase.master.RegionManager:
> telling meta scanner to stop
> 2009-01-31 10:27:44,022 DEBUG org.apache.hadoop.hbase.master.RegionManager:
> meta and root scanners notified
> 2009-01-31 10:27:44,023 DEBUG org.apache.hadoop.hbase.RegionHistorian:
> Offlined
> 2009-01-31 10:27:44,023 INFO org.apache.hadoop.hbase.master.HMaster: Stopping
> infoServer
> 2009-01-31 10:27:44,023 INFO org.mortbay.util.ThreadedServer: Stopping
> Acceptor ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=60010]
> 2009-01-31 10:27:44,026 INFO org.mortbay.http.SocketListener: Stopped
> SocketListener on 0.0.0.0:60010
> {code}
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.