Hlog may not be rolled in a long time if checkLowReplication's request of
LogRoll blocked
-----------------------------------------------------------------------------------------
Key: HBASE-4095
URL: https://issues.apache.org/jira/browse/HBASE-4095
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 0.90.3
Reporter: Jieshan Bean
Some large Hlog files(Larger than 10G) appeared in our environment, and I got
the reason why they got so huge:
1. The replicas is less than the expect number. So the method of
checkLowReplication will be called each sync.
2. The method checkLowReplication request log-roll first, and set
logRollRequested as true:
{noformat}
private void checkLowReplication() {
// if the number of replicas in HDFS has fallen below the initial
// value, then roll logs.
try {
int numCurrentReplicas = getLogReplication();
if (numCurrentReplicas != 0 &&
numCurrentReplicas < this.initialReplication) {
LOG.warn("HDFS pipeline error detected. " +
"Found " + numCurrentReplicas + " replicas but expecting " +
this.initialReplication + " replicas. " +
" Requesting close of hlog.");
requestLogRoll();
logRollRequested = true;
}
} catch (Exception e) {
LOG.warn("Unable to invoke DFSOutputStream.getNumCurrentReplicas" + e +
" still proceeding ahead...");
}
}
{noformat}
3.requestLogRoll() just commit the roll request. It may not execute in time,
for it must got the un-fair lock of cacheFlushLock.
But the lock may be carried by the cacheflush threads.
4.logRollRequested was true until the log-roll executed. So during the time,
each request of log-roll in sync() was skipped.
Here's the logs while the problem happened(Please notice the file size of hlog
"193-195-5-111%3A20020.1309937386639" in the last row):
2011-07-06 15:28:59,284 WARN org.apache.hadoop.hbase.regionserver.wal.HLog:
HDFS pipeline error detected. Found 2 replicas but expecting 3 replicas.
Requesting close of hlog.
2011-07-06 15:29:46,714 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
Roll
/hbase/.logs/193-195-5-111,20020,1309922880081/193-195-5-111%3A20020.1309937339119,
entries=32434, filesize=239589754. New hlog
/hbase/.logs/193-195-5-111,20020,1309922880081/193-195-5-111%3A20020.1309937386639
2011-07-06 15:29:56,929 WARN org.apache.hadoop.hbase.regionserver.wal.HLog:
HDFS pipeline error detected. Found 2 replicas but expecting 3 replicas.
Requesting close of hlog.
2011-07-06 15:29:56,933 INFO org.apache.hadoop.hbase.regionserver.Store:
Renaming flushed file at
hdfs://193.195.5.112:9000/hbase/Htable_UFDR_034/a3780cf0c909d8cf8f8ed618b290cc95/.tmp/4656903854447026847
to
hdfs://193.195.5.112:9000/hbase/Htable_UFDR_034/a3780cf0c909d8cf8f8ed618b290cc95/value/8603005630220380983
2011-07-06 15:29:57,391 INFO org.apache.hadoop.hbase.regionserver.Store: Added
hdfs://193.195.5.112:9000/hbase/Htable_UFDR_034/a3780cf0c909d8cf8f8ed618b290cc95/value/8603005630220380983,
entries=445880, sequenceid=248900, memsize=207.5m, filesize=130.1m
2011-07-06 15:29:57,478 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Finished memstore flush of ~207.5m for region
Htable_UFDR_034,07664,1309936974158.a3780cf0c909d8cf8f8ed618b290cc95. in
10839ms, sequenceid=248900, compaction requested=false
2011-07-06 15:28:59,236 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
Roll
/hbase/.logs/193-195-5-111,20020,1309922880081/193-195-5-111%3A20020.1309926531955,
entries=216459, filesize=2370387468. New hlog
/hbase/.logs/193-195-5-111,20020,1309922880081/193-195-5-111%3A20020.1309937339119
2011-07-06 15:29:46,714 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
Roll
/hbase/.logs/193-195-5-111,20020,1309922880081/193-195-5-111%3A20020.1309937339119,
entries=32434, filesize=239589754. New hlog
/hbase/.logs/193-195-5-111,20020,1309922880081/193-195-5-111%3A20020.1309937386639
2011-07-06 16:29:58,775 DEBUG org.apache.hadoop.hbase.regionserver.LogRoller:
Hlog roll period 3600000ms elapsed
2011-07-06 16:29:58,775 DEBUG org.apache.hadoop.hbase.regionserver.LogRoller:
Hlog roll period 3600000ms elapsed
2011-07-06 16:30:01,978 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
Roll
/hbase/.logs/193-195-5-111,20020,1309922880081/193-195-5-111%3A20020.1309937386639,
entries=1135576, filesize=19220372830. New hlog
/hbase/.logs/193-195-5-111,20020,1309922880081/193-195-5-111%3A20020.1309940998890
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira