The below is hard to read but I think I follow you Gao. Makes sense. Do you have a patch? (How did you manufacture the below? Was it by accident? Its a good one).
St.Ack On Fri, May 27, 2011 at 7:04 PM, Gaojinchao <[email protected]> wrote: > issue: > NullPointerException while hmaster starting. > java.lang.NullPointerException > at java.util.TreeMap.getEntry(TreeMap.java:324) > at java.util.TreeMap.get(TreeMap.java:255) > at > org.apache.hadoop.hbase.master.AssignmentManager.addToServers(AssignmentManager.java:1512) > at > org.apache.hadoop.hbase.master.AssignmentManager.regionOnline(AssignmentManager.java:606) > at > org.apache.hadoop.hbase.master.AssignmentManager.processFailover(AssignmentManager.java:214) > at > org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:402) > at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:283) > > // below is my analysis > private void finishInitialization() > throws IOException, InterruptedException, KeeperException > { > > /* > In the function, waitForRegionServers return when some regionservers > checkin. > The regionservers include some regions, but not root-region and > meta-region. > (the Registered servers are 157-5-111-13, 157-5-111-14, 157-5-111-12) > */ > int regionCount = his.serverManager.waitForRegionServers(); > > /* > In the function, we can know > verifyRootRegionLocation/verifyMetaRegionLocation are succeed with the log of > assigned=0, > and the root-regionserver and meta-regionserver are running. > (The 157-5-111-11 server is running, but not Registered. and server > include root-region and meta-region.) > */ > assignRootAndMeta(); > > if (regionCount == 0) { > LOG.info("Master startup proceeding: cluster startup"); > this.assignmentManager.cleanoutUnassigned(); > this.assignmentManager.assignAllUserRegions(); > } else { > LOG.info("Master startup proceeding: master failover"); > > /* > HServerInfo hsi = > this.serverManager.getHServerInfo(this.catalogTracker.getMetaLocation()); > In processFailover function, call getHServerInfo return NULL pointer, > hsi is NULL pointer. > Because the regionserver with meta-region has not Registered. > */ > this.assignmentManager.processFailover(); > } > } > > the exception logs: > > 2011-05-21 14:44:47,973 INFO org.apache.hadoop.hbase.master.ServerManager: > Waiting on regionserver(s) to checkin > 2011-05-21 14:44:49,473 INFO org.apache.hadoop.hbase.master.ServerManager: > Waiting on regionserver(s) to checkin > 2011-05-21 14:44:50,974 INFO org.apache.hadoop.hbase.master.ServerManager: > Waiting on regionserver(s) to checkin > 2011-05-21 14:44:51,281 INFO org.apache.hadoop.hbase.master.ServerManager: > Registering server=157-5-111-13,20020,1305877624933, regionCount=1543, > userLoad=true > 2011-05-21 14:44:51,722 INFO org.apache.hadoop.hbase.master.ServerManager: > Registering server=157-5-111-14,20020,1305877627727, regionCount=1507, > userLoad=true > 2011-05-21 14:44:51,774 INFO org.apache.hadoop.hbase.master.ServerManager: > Registering server=157-5-111-12,20020,1305877626108, regionCount=1521, > userLoad=true > 2011-05-21 14:44:52,474 INFO org.apache.hadoop.hbase.master.ServerManager: > Waiting on regionserver(s) count to settle; currently=3 > 2011-05-21 14:44:53,974 INFO org.apache.hadoop.hbase.master.ServerManager: > Finished waiting for regionserver count to settle; count=3, sleptFor=18000 > 2011-05-21 14:44:53,974 INFO org.apache.hadoop.hbase.master.ServerManager: > Exiting wait on regionserver(s) to checkin; count=3, stopped=false, count of > regions out on cluster=4571 > 2011-05-21 14:44:53,978 INFO org.apache.hadoop.hbase.master.MasterFileSystem: > Log folder > hdfs://157.5.111.10:9000/hbase/.logs/157-5-111-11,20020,1305875930161 doesn't > belong to a known region server, splitting > 2011-05-21 14:44:53,991 INFO > org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting 2 hlog(s) in > hdfs://157.5.111.10:9000/hbase/.logs/157-5-111-11,20020,1305875930161 > 2011-05-21 14:44:53,992 DEBUG > org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Writer thread > Thread[WriterThread-0,5,main]: starting > 2011-05-21 14:44:53,993 DEBUG > org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Writer thread > Thread[WriterThread-2,5,main]: starting > 2011-05-21 14:44:53,993 DEBUG > org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Writer thread > Thread[WriterThread-3,5,main]: starting > 2011-05-21 14:44:53,993 DEBUG > org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Writer thread > Thread[WriterThread-1,5,main]: starting > 2011-05-21 14:44:53,993 DEBUG > org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Writer thread > Thread[WriterThread-6,5,main]: starting > 2011-05-21 14:44:53,993 DEBUG > org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Writer thread > Thread[WriterThread-5,5,main]: starting > 2011-05-21 14:44:53,993 DEBUG > org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Writer thread > Thread[WriterThread-4,5,main]: starting > 2011-05-21 14:44:53,993 DEBUG > org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Writer thread > Thread[WriterThread-8,5,main]: starting > 2011-05-21 14:44:53,993 DEBUG > org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Writer thread > Thread[WriterThread-7,5,main]: starting > 2011-05-21 14:44:53,994 DEBUG > org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Writer thread > Thread[WriterThread-9,5,main]: starting > 2011-05-21 14:44:53,994 DEBUG > org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Writer thread > Thread[WriterThread-11,5,main]: starting > 2011-05-21 14:44:53,994 DEBUG > org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Writer thread > Thread[WriterThread-10,5,main]: starting > 2011-05-21 14:44:53,994 DEBUG > org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Writer thread > Thread[WriterThread-12,5,main]: starting > 2011-05-21 14:44:53,994 DEBUG > org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Writer thread > Thread[WriterThread-13,5,main]: starting > 2011-05-21 14:44:53,994 DEBUG > org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Writer thread > Thread[WriterThread-14,5,main]: starting > 2011-05-21 14:44:53,994 DEBUG > org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Writer thread > Thread[WriterThread-15,5,main]: starting > 2011-05-21 14:44:53,994 DEBUG > org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog 1 of 2: > hdfs://157.5.111.10:9000/hbase/.logs/157-5-111-11,20020,1305875930161/157-5-111-11%3A20020.1305875949857, > length=1503902 > 2011-05-21 14:44:53,996 INFO org.apache.hadoop.hbase.util.FSUtils: Recovering > file > hdfs://157.5.111.10:9000/hbase/.logs/157-5-111-11,20020,1305875930161/157-5-111-11%3A20020.1305875949857 > 2011-05-21 14:44:54,998 INFO org.apache.hadoop.hbase.util.FSUtils: Finished > lease recover attempt for > hdfs://157.5.111.10:9000/hbase/.logs/157-5-111-11,20020,1305875930161/157-5-111-11%3A20020.1305875949857 > 2011-05-21 14:44:55,140 INFO > org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using syncFs > -- HDFS-200 > 2011-05-21 14:44:55,141 DEBUG > org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Creating writer > path=hdfs://157.5.111.10:9000/hbase/-ROOT-/70236052/recovered.edits/0000000000000021721 > region=70236052 > 2011-05-21 14:44:55,177 INFO > org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using syncFs > -- HDFS-200 > 2011-05-21 14:44:55,177 DEBUG > org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Creating writer > path=hdfs://157.5.111.10:9000/hbase/.META./1028785192/recovered.edits/0000000000000021722 > region=1028785192 > 2011-05-21 14:44:55,435 DEBUG > org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Pushed=5400 entries > from > hdfs://157.5.111.10:9000/hbase/.logs/157-5-111-11,20020,1305875930161/157-5-111-11%3A20020.1305875949857 > 2011-05-21 14:44:55,436 DEBUG > org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog 2 of 2: > hdfs://157.5.111.10:9000/hbase/.logs/157-5-111-11,20020,1305875930161/157-5-111-11%3A20020.1305879556259, > length=0 > 2011-05-21 14:44:55,436 INFO org.apache.hadoop.hbase.util.FSUtils: Recovering > file > hdfs://157.5.111.10:9000/hbase/.logs/157-5-111-11,20020,1305875930161/157-5-111-11%3A20020.1305879556259 > 2011-05-21 14:44:56,439 INFO org.apache.hadoop.hbase.util.FSUtils: Finished > lease recover attempt for > hdfs://157.5.111.10:9000/hbase/.logs/157-5-111-11,20020,1305875930161/157-5-111-11%3A20020.1305879556259 > 2011-05-21 14:44:56,439 WARN > org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: File > hdfs://157.5.111.10:9000/hbase/.logs/157-5-111-11,20020,1305875930161/157-5-111-11%3A20020.1305879556259 > might be still open, length is 0 > 2011-05-21 14:44:56,451 DEBUG > org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Pushed=0 entries from > hdfs://157.5.111.10:9000/hbase/.logs/157-5-111-11,20020,1305875930161/157-5-111-11%3A20020.1305879556259 > 2011-05-21 14:44:56,489 INFO > org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Archived processed log > hdfs://157.5.111.10:9000/hbase/.logs/157-5-111-11,20020,1305875930161/157-5-111-11%3A20020.1305875949857 > to hdfs://157.5.111.10:9000/hbase/.oldlogs/157-5-111-11%3A20020.1305875949857 > 2011-05-21 14:44:56,506 INFO > org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Archived processed log > hdfs://157.5.111.10:9000/hbase/.logs/157-5-111-11,20020,1305875930161/157-5-111-11%3A20020.1305879556259 > to hdfs://157.5.111.10:9000/hbase/.oldlogs/157-5-111-11%3A20020.1305879556259 > 2011-05-21 14:44:56,523 INFO > org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Waiting for split > writer threads to finish > 2011-05-21 14:44:57,438 INFO > org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Split writers finished > 2011-05-21 14:44:57,455 INFO > org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Closed path > hdfs://157.5.111.10:9000/hbase/.META./1028785192/recovered.edits/0000000000000021722 > (wrote 5399 edits in 322ms) > 2011-05-21 14:44:57,505 INFO > org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Closed path > hdfs://157.5.111.10:9000/hbase/-ROOT-/70236052/recovered.edits/0000000000000021721 > (wrote 1 edits in 106ms) > 2011-05-21 14:44:57,505 INFO > org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: hlog file splitting > completed in 3517 ms for > hdfs://157.5.111.10:9000/hbase/.logs/157-5-111-11,20020,1305875930161 > 2011-05-21 14:44:57,506 INFO org.apache.hadoop.hbase.master.MasterFileSystem: > Log folder > hdfs://157.5.111.10:9000/hbase/.logs/157-5-111-12,20020,1305877626108 belongs > to an existing region server > 2011-05-21 14:44:57,506 INFO org.apache.hadoop.hbase.master.MasterFileSystem: > Log folder > hdfs://157.5.111.10:9000/hbase/.logs/157-5-111-13,20020,1305877624933 belongs > to an existing region server > 2011-05-21 14:44:57,506 INFO org.apache.hadoop.hbase.master.MasterFileSystem: > Log folder > hdfs://157.5.111.10:9000/hbase/.logs/157-5-111-14,20020,1305877627727 belongs > to an existing region server > 2011-05-21 14:44:57,524 INFO org.apache.hadoop.hbase.master.HMaster: -ROOT- > assigned=0, rit=false, location=157-5-111-11:20020 > 2011-05-21 14:44:57,528 INFO org.apache.hadoop.hbase.master.HMaster: .META. > assigned=0, rit=false, location=157-5-111-11:20020 > 2011-05-21 14:44:57,528 INFO org.apache.hadoop.hbase.master.HMaster: Master > startup proceeding: master failover > 2011-05-21 14:44:57,530 FATAL org.apache.hadoop.hbase.master.HMaster: > Unhandled exception. Starting shutdown. > java.lang.NullPointerException > at java.util.TreeMap.getEntry(TreeMap.java:324) > at java.util.TreeMap.get(TreeMap.java:255) > at > org.apache.hadoop.hbase.master.AssignmentManager.addToServers(AssignmentManager.java:1512) > at > org.apache.hadoop.hbase.master.AssignmentManager.regionOnline(AssignmentManager.java:606) > at > org.apache.hadoop.hbase.master.AssignmentManager.processFailover(AssignmentManager.java:214) > at > org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:402) > at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:283) > 2011-05-21 14:44:57,531 INFO org.apache.hadoop.hbase.master.HMaster: Aborting > 2011-05-21 14:44:57,531 DEBUG org.apache.hadoop.hbase.master.HMaster: > Stopping service threads >
