[ 
https://issues.apache.org/jira/browse/HBASE-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-1169:
---------------------------------

    Fix Version/s:     (was: 0.19.1)
                   0.19.2

+1

> During shutdown, odd handling of regionserver hosting META
> ----------------------------------------------------------
>
>                 Key: HBASE-1169
>                 URL: https://issues.apache.org/jira/browse/HBASE-1169
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.19.0, 0.20.0
>            Reporter: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.19.2, 0.20.0
>
>
> During shutdown of cluster, half way through quiescing servers there is a 
> META scan in the master.  The regions from servers whose leases are already 
> canceled show up as invalid.  (72.34.249.208 is hosting META)
> {code}
> 2009-01-31 10:25:42,571 INFO org.apache.hadoop.hbase.master.HMaster: Cluster 
> shutdown requested. Starting to quiesce servers
> 2009-01-31 10:25:45,868 INFO org.apache.hadoop.hbase.master.ServerManager: 
> Cancelling lease for 72.34.249.211:60020
> 2009-01-31 10:25:45,868 INFO org.apache.hadoop.hbase.master.ServerManager: 
> Region server 72.34.249.211:60020: MSG_REPORT_EXITING -- lease cancelled
> 2009-01-31 10:25:47,480 INFO org.apache.hadoop.hbase.master.ServerManager: 
> Cancelling lease for 72.34.249.216:60020
> 2009-01-31 10:25:47,480 INFO org.apache.hadoop.hbase.master.ServerManager: 
> Region server 72.34.249.216:60020: MSG_REPORT_EXITING -- lease cancelled
> 2009-01-31 10:25:47,840 INFO org.apache.hadoop.hbase.master.ServerManager: 
> Region server 72.34.249.210:60020 quiesced
> 2009-01-31 10:25:47,944 INFO org.apache.hadoop.hbase.master.ServerManager: 
> Cancelling lease for 72.34.249.215:60020
> 2009-01-31 10:25:47,944 INFO org.apache.hadoop.hbase.master.ServerManager: 
> Region server 72.34.249.215:60020: MSG_REPORT_EXITING -- lease cancelled
> 2009-01-31 10:25:48,403 INFO org.apache.hadoop.hbase.master.ServerManager: 
> Cancelling lease for 72.34.249.213:60020
> 2009-01-31 10:25:48,403 INFO org.apache.hadoop.hbase.master.ServerManager: 
> Region server 72.34.249.213:60020: MSG_REPORT_EXITING -- lease cancelled
> 2009-01-31 10:25:49,378 INFO org.apache.hadoop.hbase.master.ServerManager: 
> Region server 72.34.249.218:60020 quiesced
> 2009-01-31 10:25:50,465 INFO org.apache.hadoop.hbase.master.ServerManager: 
> Cancelling lease for 72.34.249.214:60020
> 2009-01-31 10:25:50,465 INFO org.apache.hadoop.hbase.master.ServerManager: 
> Region server 72.34.249.214:60020: MSG_REPORT_EXITING -- lease cancelled
> 2009-01-31 10:25:59,531 INFO org.apache.hadoop.hbase.master.BaseScanner: 
> RegionManager.metaScanner scanning meta region {regionname: .META.,,1, 
> startKey: <>, server: 72.34.249.218:60020}
> 2009-01-31 10:25:59,544 DEBUG org.apache.hadoop.hbase.master.BaseScanner: 
> Current assignment of activitydupehash,,1229364212541 is not valid;  Server 
> '72.34.249.214:60020' unknown.
> 2009-01-31 10:25:59,545 DEBUG org.apache.hadoop.hbase.master.BaseScanner: 
> Current assignment of api,,1229364235220 is not valid;  Server 
> '72.34.249.216:60020' unknown.
> 2009-01-31 10:25:59,552 DEBUG org.apache.hadoop.hbase.master.BaseScanner: 
> Current assignment of apps,,1229364222879 is not valid;  Server 
> '72.34.249.215:60020' unknown.
> 2009-01-31 10:25:59,552 DEBUG org.apache.hadoop.hbase.master.BaseScanner: 
> Current assignment of assigners,,1229364037757 is not valid;  Server 
> '72.34.249.214:60020' unknown.
> 2009-01-31 10:25:59,554 DEBUG org.apache.hadoop.hbase.master.BaseScanner: 
> Current assignment of canoncache,,1229364041955 is not valid;  Server 
> '72.34.249.215:60020' unknown.
> 2009-01-31 10:25:59,555 DEBUG org.apache.hadoop.hbase.master.BaseScanner: 
> Current assignment of chunks,,1229390225893 is not valid;  Server 
> '72.34.249.211:60020' unknown.
> {code}
> Shutdown then continues as the last servers are quiesced, but at the same 
> time the Master expires the lease on the regionserver that was hosting META 
> and that it just scanned.  It then starts to replay the logs for that 
> regionserver in the middle of the shutdown.
> {code}
> 2009-01-31 10:25:59,799 INFO org.apache.hadoop.hbase.master.BaseScanner: 
> RegionManager.metaScanner scan of 512 row(s) of meta region {regionname: 
> .META.,,1, startKey: <>, server: 72.34.249.218:60020}
>  complete
> 2009-01-31 10:25:59,799 INFO org.apache.hadoop.hbase.master.BaseScanner: All 
> 1 .META. region(s) scanned
> 2009-01-31 10:26:59,530 INFO org.apache.hadoop.hbase.master.BaseScanner: 
> RegionManager.metaScanner scanning meta region {regionname: .META.,,1, 
> startKey: <>, server: 72.34.249.218:60020}
> 2009-01-31 10:26:59,720 INFO org.apache.hadoop.hbase.master.BaseScanner: 
> RegionManager.metaScanner scan of 512 row(s) of meta region {regionname: 
> .META.,,1, startKey: <>, server: 72.34.249.218:60020}
>  complete
> 2009-01-31 10:26:59,720 INFO org.apache.hadoop.hbase.master.BaseScanner: All 
> 1 .META. region(s) scanned
> 2009-01-31 10:27:40,374 INFO org.apache.hadoop.hbase.master.ServerManager: 
> 72.34.249.218:60020 lease expired
> 2009-01-31 10:27:40,375 DEBUG org.apache.hadoop.hbase.master.HMaster: 
> Processing todo: ProcessServerShutdown of 72.34.249.218:60020
> 2009-01-31 10:27:40,375 INFO 
> org.apache.hadoop.hbase.master.RegionServerOperation: process shutdown of 
> server 72.34.249.218:60020: logSplit: false, rootRescanned: false, 
> numberOfMetaRegions: 1, onlin
> eMetaRegions.size(): 1
> 2009-01-31 10:27:40,387 INFO org.apache.hadoop.hbase.regionserver.HLog: 
> Splitting 44 log(s) in 
> hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020
> 2009-01-31 10:27:40,387 DEBUG org.apache.hadoop.hbase.regionserver.HLog: 
> Splitting 1 of 44: 
> hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1232996040603
> 2009-01-31 10:27:40,443 DEBUG org.apache.hadoop.hbase.regionserver.HLog: 
> Creating new log file writer for path 
> hdfs://mb0:9000/hbase/.META./1028785192/oldlogfile.log and region .META.,,1
> 2009-01-31 10:27:40,575 DEBUG org.apache.hadoop.hbase.regionserver.HLog: 
> Creating new log file writer for path 
> hdfs://mb0:9000/hbase/sources/671225115/oldlogfile.log and region 
> sources,,1229364117966
> 2009-01-31 10:27:41,171 DEBUG org.apache.hadoop.hbase.regionserver.HLog: 
> Applied 100003 total edits from 
> hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1232996040603
> 2009-01-31 10:27:41,173 DEBUG org.apache.hadoop.hbase.regionserver.HLog: 
> Splitting 2 of 44: 
> hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233093726382
> 2009-01-31 10:27:41,429 DEBUG org.apache.hadoop.hbase.regionserver.HLog: 
> Creating new log file writer for path 
> hdfs://mb0:9000/hbase/dupehash/1607532582/oldlogfile.log and region 
> dupehash,O��<L;h�,12
> 31779694744
> 2009-01-31 10:27:41,462 INFO org.apache.hadoop.hbase.master.ServerManager: 
> 72.34.249.217:60020 lease expired
> 2009-01-31 10:27:41,499 INFO org.apache.hadoop.hbase.master.ServerManager: 
> All user tables quiesced. Proceeding with shutdown
> 2009-01-31 10:27:41,499 DEBUG org.apache.hadoop.hbase.master.RegionManager: 
> telling root scanner to stop
> 2009-01-31 10:27:41,499 DEBUG org.apache.hadoop.hbase.master.RegionManager: 
> telling meta scanner to stop
> 2009-01-31 10:27:41,499 DEBUG org.apache.hadoop.hbase.master.RegionManager: 
> meta and root scanners notified
> 2009-01-31 10:27:41,499 INFO org.apache.hadoop.hbase.master.RootScanner: 
> RegionManager.rootScanner exiting
> 2009-01-31 10:27:41,499 INFO org.apache.hadoop.hbase.master.MetaScanner: 
> RegionManager.metaScanner exiting
> 2009-01-31 10:27:41,780 DEBUG org.apache.hadoop.hbase.regionserver.HLog: 
> Applied 100001 total edits from 
> hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233093726382
> 2009-01-31 10:27:41,781 DEBUG org.apache.hadoop.hbase.regionserver.HLog: 
> Splitting 3 of 44: 
> hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233101015153
> 2009-01-31 10:27:41,838 INFO org.apache.hadoop.hbase.master.ServerManager: 
> 72.34.249.210:60020 lease expired
> 2009-01-31 10:27:41,866 INFO org.apache.hadoop.hbase.master.ServerManager: 
> All user tables quiesced. Proceeding with shutdown
> 2009-01-31 10:27:41,866 DEBUG org.apache.hadoop.hbase.master.RegionManager: 
> telling root scanner to stop
> 2009-01-31 10:27:41,866 DEBUG org.apache.hadoop.hbase.master.RegionManager: 
> telling meta scanner to stop
> 2009-01-31 10:27:41,866 DEBUG org.apache.hadoop.hbase.master.RegionManager: 
> meta and root scanners notified
> 2009-01-31 10:27:42,557 INFO org.apache.hadoop.hbase.master.ServerManager: 
> 72.34.249.212:60020 lease expired
> 2009-01-31 10:27:42,581 INFO org.apache.hadoop.hbase.master.ServerManager: 
> All user tables quiesced. Proceeding with shutdown
> 2009-01-31 10:27:42,581 DEBUG org.apache.hadoop.hbase.master.RegionManager: 
> telling root scanner to stop
> 2009-01-31 10:27:42,581 DEBUG org.apache.hadoop.hbase.master.RegionManager: 
> telling meta scanner to stop
> 2009-01-31 10:27:42,581 DEBUG org.apache.hadoop.hbase.master.RegionManager: 
> meta and root scanners notified
> 2009-01-31 10:27:42,615 DEBUG org.apache.hadoop.hbase.regionserver.HLog: 
> Applied 100002 total edits from 
> hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233101015153
> 2009-01-31 10:27:42,618 DEBUG org.apache.hadoop.hbase.regionserver.HLog: 
> Splitting 4 of 44: 
> hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233111791302
> 2009-01-31 10:27:43,356 DEBUG org.apache.hadoop.hbase.regionserver.HLog: 
> Applied 100001 total edits from 
> hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233111791302
> 2009-01-31 10:27:43,359 DEBUG org.apache.hadoop.hbase.regionserver.HLog: 
> Splitting 5 of 44: 
> hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233122447841
> 2009-01-31 10:27:43,404 INFO org.apache.hadoop.hbase.master.ServerManager: 
> All user tables quiesced. Proceeding with shutdown
> 2009-01-31 10:27:43,404 DEBUG org.apache.hadoop.hbase.master.RegionManager: 
> telling root scanner to stop
> 2009-01-31 10:27:43,404 DEBUG org.apache.hadoop.hbase.master.RegionManager: 
> telling meta scanner to stop
> 2009-01-31 10:27:43,404 DEBUG org.apache.hadoop.hbase.master.RegionManager: 
> meta and root scanners notified
> 2009-01-31 10:27:43,991 DEBUG org.apache.hadoop.hbase.regionserver.HLog: 
> Applied 100001 total edits from 
> hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233122447841
> {code}
> During the log replay, a log file was missing from HDFS.  Not sure why, there 
> was a Datanode crash that could be related.  More importantly, once it trips 
> on the missing file it stops the replay (even though there's another 37 logs).
> {code}
> 2009-01-31 10:27:43,992 DEBUG org.apache.hadoop.hbase.regionserver.HLog: 
> Splitting 6 of 44: 
> hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233132556827
> 2009-01-31 10:27:44,022 WARN org.apache.hadoop.hbase.master.HMaster: 
> Processing pending operations: ProcessServerShutdown of 72.34.249.218:60020
> java.io.FileNotFoundException: File does not exist: 
> hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233132556827
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:394)
>         at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:679)
>         at 
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
>         at 
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
>         at org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:742)
>         at org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:705)
>         at 
> org.apache.hadoop.hbase.master.ProcessServerShutdown.process(ProcessServerShutdown.java:249)
>         at 
> org.apache.hadoop.hbase.master.HMaster.processToDoQueue(HMaster.java:427)
>         at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:360)
> 2009-01-31 10:27:44,022 DEBUG org.apache.hadoop.hbase.master.RegionManager: 
> telling root scanner to stop
> 2009-01-31 10:27:44,022 DEBUG org.apache.hadoop.hbase.master.RegionManager: 
> telling meta scanner to stop
> 2009-01-31 10:27:44,022 DEBUG org.apache.hadoop.hbase.master.RegionManager: 
> meta and root scanners notified
> 2009-01-31 10:27:44,023 DEBUG org.apache.hadoop.hbase.RegionHistorian: 
> Offlined
> 2009-01-31 10:27:44,023 INFO org.apache.hadoop.hbase.master.HMaster: Stopping 
> infoServer
> 2009-01-31 10:27:44,023 INFO org.mortbay.util.ThreadedServer: Stopping 
> Acceptor ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=60010]
> 2009-01-31 10:27:44,026 INFO org.mortbay.http.SocketListener: Stopped 
> SocketListener on 0.0.0.0:60010
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to