Hi all, We found so many hlogs in our cluster, after some analysis, we also found one splitted region occurred in HLog.lastSeqWritten. For this region had been closed, it can't be flushed again. So blocking all the other logs removing to ".oldlogs" directory.
05:06:44,422 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Too many hlogs: logs=122, maxlogs=32; forcing flush of 1 regions(s): 2acaf8e3acfd2e8a5825a1f6f0aca4a8 05:06:44,422 WARN org.apache.hadoop.hbase.regionserver.LogRoller: Failed to schedule flush of 2acaf8e3acfd2e8a5825a1f6f0aca4a8r=null, requester=null 05:10:48,666 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Too many hlogs: logs=123, maxlogs=32; forcing flush of 1 regions(s): 2acaf8e3acfd2e8a5825a1f6f0aca4a8 05:10:48,666 WARN org.apache.hadoop.hbase.regionserver.LogRoller: Failed to schedule flush of 2acaf8e3acfd2e8a5825a1f6f0aca4a8r=null, requester=null 05:14:46,075 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Too many hlogs: logs=124, maxlogs=32; forcing flush of 1 regions(s): 2acaf8e3acfd2e8a5825a1f6f0aca4a8 05:14:46,075 WARN org.apache.hadoop.hbase.regionserver.LogRoller: Failed to schedule flush of 2acaf8e3acfd2e8a5825a1f6f0aca4a8r=null, requester=null 05:15:41,584 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Too many hlogs: logs=125, maxlogs=32; forcing flush of 1 regions(s): 2acaf8e3acfd2e8a5825a1f6f0aca4a8 05:15:41,584 WARN org.apache.hadoop.hbase.regionserver.LogRoller: Failed to schedule flush of 2acaf8e3acfd2e8a5825a1f6f0aca4a8r=null, requester=null Let's see what happened to the region of "2acaf8e3acfd2e8a5825a1f6f0aca4a8r": 00:30:49,242 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~129.5m for region Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8. in 13299ms, sequenceid=20311822, compaction requested=true 00:30:49,242 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested for Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8. because User-triggered split; priority=1, compaction queue size=5840 00:30:55,214 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming flushed file at hdfs://192.168.1.103:9000/hbase/Htable_UFDR_031/2acaf8e3acfd2e8a5825a1f6f0aca4a8/.tmp/1755862026714756815 to hdfs://192.168.1.103:9000/hbase/Htable_UFDR_031/2acaf8e3acfd2e8a5825a1f6f0aca4a8/value/973789709483406123 00:30:55,214 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming flushed file at hdfs://192.168.1.103:9000/hbase/Htable_UFDR_031/2acaf8e3acfd2e8a5825a1f6f0aca4a8/.tmp/1755862026714756815 to hdfs://192.168.1.103:9000/hbase/Htable_UFDR_031/2acaf8e3acfd2e8a5825a1f6f0aca4a8/value/973789709483406123 00:30:59,614 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://192.168.1.103:9000/hbase/Htable_UFDR_031/2acaf8e3acfd2e8a5825a1f6f0aca4a8/value/973789709483406123, entries=7537, sequenceid=20312223, memsize=4.2m, filesize=2.9m 00:30:59,787 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~133.5m for region Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8. in 21816ms, sequenceid=20312223, compaction requested=true 00:30:59,787 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested for Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8. because regionserver20020.cacheFlusher; priority=0, compaction queue size=5840 00:31:12,605 INFO org.apache.hadoop.hbase.regionserver.HRegion: Starting compaction on region Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8. 00:31:12,607 INFO org.apache.hadoop.hbase.regionserver.HRegion: completed compaction on region Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8. after 0sec 00:31:12,607 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of region Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8. 00:31:13,694 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Closing Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8.: disabling compactions & flushes 00:31:13,694 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Updates disabled for region Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8. 00:31:13,718 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8. 00:31:39,552 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Offlined parent region Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8. in META ------ 00:31:42,529 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://192.168.1.103:9000/hbase/Htable_UFDR_031/52ff3c7c6df6e0337876bbca29cee84a/value/973789709483406123.2acaf8e3acfd2e8a5825a1f6f0aca4a8, isReference=true, isBulkLoadResult=false, seqid=20312224, majorCompaction=false 00:31:42,532 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://192.168.1.103:9000/hbase/Htable_UFDR_031/259d0c620c9105928e52713f4a5a252e/value/973789709483406123.2acaf8e3acfd2e8a5825a1f6f0aca4a8, isReference=true, isBulkLoadResult=false, seqid=20312223, majorCompaction=false 00:31:42,575 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: Region split, META updated, and report to master. Parent=Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8., new regions: Htable_UFDR_031,00332,1325809872607.259d0c620c9105928e52713f4a5a252e., Htable_UFDR_031,003732800093168-03594291912,1325809872607.52ff3c7c6df6e0337876bbca29cee84a.. Split took 29sec // At this time, found this region again in HLog#lastSeqWritten. 00:34:48,061 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Found 1 hlogs to remove out of total 4; oldest outstanding sequenceid is 20312224 from region 2acaf8e3acfd2e8a5825a1f6f0aca4a8 Our HBase version is "0.90.5+" when we found this issue. So after splitting and closing, how this region come into HLog.lastSeqWritten? We analyze all the related code, it seems impossible. Because after a region be closed, it has no chance to add this region into HLog.lastSeqWritten again. So I want to get some insight here from someone who is intimately familiar with this or has encountered the similar problem. Thanks and Regards, Jieshan
