[jira] [Created] (HBASE-4095) Hlog may not be rolled in a long time if checkLowReplication's request of LogRoll blocked

Jieshan Bean (JIRA) Thu, 14 Jul 2011 04:56:33 -0700

Hlog may not be rolled in a long time if checkLowReplication's request of 
LogRoll blocked
-----------------------------------------------------------------------------------------


                 Key: HBASE-4095
                 URL: https://issues.apache.org/jira/browse/HBASE-4095
             Project: HBase
          Issue Type: Bug
          Components: regionserver
    Affects Versions: 0.90.3
            Reporter: Jieshan Bean


Some large Hlog files(Larger than 10G) appeared in our environment, and I got 
the reason why they got so huge:

1. The replicas is less than the expect number. So the method of 
checkLowReplication will be called each sync.

2. The method checkLowReplication request log-roll first, and set 
logRollRequested as true: 

{noformat}
private void checkLowReplication() {
// if the number of replicas in HDFS has fallen below the initial
// value, then roll logs.
try {
  int numCurrentReplicas = getLogReplication();
  if (numCurrentReplicas != 0 &&
          numCurrentReplicas < this.initialReplication) {
        LOG.warn("HDFS pipeline error detected. " +
                "Found " + numCurrentReplicas + " replicas but expecting " +
                this.initialReplication + " replicas. " +
                " Requesting close of hlog.");
        requestLogRoll();
        logRollRequested = true;
  }
} catch (Exception e) {
  LOG.warn("Unable to invoke DFSOutputStream.getNumCurrentReplicas" + e +
          " still proceeding ahead...");
}
}
{noformat}
3.requestLogRoll() just commit the roll request. It may not execute in time, 
for it must got the un-fair lock of cacheFlushLock.
But the lock may be carried by the cacheflush threads.

4.logRollRequested was true until the log-roll executed. So during the time, 
each request of log-roll in sync() was skipped.

Here's the logs while the problem happened(Please notice the file size of hlog 
"193-195-5-111%3A20020.1309937386639" in the last row):

2011-07-06 15:28:59,284 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: 
HDFS pipeline error detected. Found 2 replicas but expecting 3 replicas.  
Requesting close of hlog.
2011-07-06 15:29:46,714 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: 
Roll 
/hbase/.logs/193-195-5-111,20020,1309922880081/193-195-5-111%3A20020.1309937339119,
 entries=32434, filesize=239589754. New hlog 
/hbase/.logs/193-195-5-111,20020,1309922880081/193-195-5-111%3A20020.1309937386639
2011-07-06 15:29:56,929 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: 
HDFS pipeline error detected. Found 2 replicas but expecting 3 replicas.  
Requesting close of hlog.
2011-07-06 15:29:56,933 INFO org.apache.hadoop.hbase.regionserver.Store: 
Renaming flushed file at 
hdfs://193.195.5.112:9000/hbase/Htable_UFDR_034/a3780cf0c909d8cf8f8ed618b290cc95/.tmp/4656903854447026847
 to 
hdfs://193.195.5.112:9000/hbase/Htable_UFDR_034/a3780cf0c909d8cf8f8ed618b290cc95/value/8603005630220380983
2011-07-06 15:29:57,391 INFO org.apache.hadoop.hbase.regionserver.Store: Added 
hdfs://193.195.5.112:9000/hbase/Htable_UFDR_034/a3780cf0c909d8cf8f8ed618b290cc95/value/8603005630220380983,
 entries=445880, sequenceid=248900, memsize=207.5m, filesize=130.1m
2011-07-06 15:29:57,478 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
Finished memstore flush of ~207.5m for region 
Htable_UFDR_034,07664,1309936974158.a3780cf0c909d8cf8f8ed618b290cc95. in 
10839ms, sequenceid=248900, compaction requested=false
2011-07-06 15:28:59,236 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: 
Roll 
/hbase/.logs/193-195-5-111,20020,1309922880081/193-195-5-111%3A20020.1309926531955,
 entries=216459, filesize=2370387468. New hlog 
/hbase/.logs/193-195-5-111,20020,1309922880081/193-195-5-111%3A20020.1309937339119
2011-07-06 15:29:46,714 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: 
Roll 
/hbase/.logs/193-195-5-111,20020,1309922880081/193-195-5-111%3A20020.1309937339119,
 entries=32434, filesize=239589754. New hlog 
/hbase/.logs/193-195-5-111,20020,1309922880081/193-195-5-111%3A20020.1309937386639
2011-07-06 16:29:58,775 DEBUG org.apache.hadoop.hbase.regionserver.LogRoller: 
Hlog roll period 3600000ms elapsed
2011-07-06 16:29:58,775 DEBUG org.apache.hadoop.hbase.regionserver.LogRoller: 
Hlog roll period 3600000ms elapsed
2011-07-06 16:30:01,978 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: 
Roll 
/hbase/.logs/193-195-5-111,20020,1309922880081/193-195-5-111%3A20020.1309937386639,
 entries=1135576, filesize=19220372830. New hlog 
/hbase/.logs/193-195-5-111,20020,1309922880081/193-195-5-111%3A20020.1309940998890



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Created] (HBASE-4095) Hlog may not be rolled in a long time if checkLowReplication's request of LogRoll blocked

Reply via email to