[ https://issues.apache.org/jira/browse/HBASE-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jieshan Bean updated HBASE-4095:
--------------------------------

    Attachment: RelatedLogs2011-07-28.txt

After running more tests on this issue, I found that the cause of "RollWriter 
blocked for a long time" is just an illusion.
(To trace what happened, I added some new logs; please check the attached file 
"RelatedLogs2011-07-28.txt".)
Here's the analysis:
(Suppose there are only 3 nodes in the cluster.)

1. HLog-rolling was requested after one datanode was killed.
2. HLog#logRollRequested was set to true after the request.
3. HLog#rollWriter was executed some time later. 
4. A new rollWriter was requested shortly after the previous one, and there were no new 
entries in between.
   So HLog#rollWriter() returned right at the beginning, because numEntries.get() == 0:
{noformat}
  public byte [][] rollWriter() throws FailedLogCloseException, IOException {
    // Return if nothing to flush.
    long stamp = System.currentTimeMillis();
    if (this.writer != null && this.numEntries.get() <= 0) {
      return null;
    }
    byte [][] regionsToFlush = null;
{noformat}

This happens with high probability if one datanode goes down in such a 
mini-cluster (3 nodes):

1. HLog#initialReplication is reset in each HLog#rollWriter:
{noformat}
  int nextInitialReplication = fs.getFileStatus(newPath).getReplication();
  this.initialReplication = nextInitialReplication;
{noformat}
fs.getFileStatus(newPath).getReplication() can only return the default/expected 
replication factor (3), so HLog#initialReplication is always 3 (see the sketch below).

2. The HLog can only have 2 replicas, since there are only 2 live datanodes.
   So a log roll was requested in every HLog#sync().

That's why this happens so easily when one datanode is down.
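To make point 1 concrete, here is a minimal sketch (not part of any patch; the class and 
method names are made up for illustration). FileStatus#getReplication() returns the 
replication factor that was requested when the file was created, i.e. the namenode's 
target value, not the number of replicas that actually exist, which is why 
initialReplication stays at 3 even after a datanode dies:
{noformat}
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationMismatch {
  /**
   * The value rollWriter() currently feeds into HLog#initialReplication:
   * the replication factor *requested* for the HLog file. It does not
   * shrink when a datanode in the write pipeline dies.
   */
  public static short requestedReplication(FileSystem fs, Path hlogPath)
      throws IOException {
    return fs.getFileStatus(hlogPath).getReplication();
  }
}
{noformat}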

I'm thinking about two solutions to this issue:

[Solution A]

Take the actual number of live datanodes into account when resetting the value of 
HLog#initialReplication.
If the live datanode count is less than the default replication factor, use the actual 
datanode count; otherwise, use the default value.
But I have no idea how to get the live datanode count.
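One possible direction, purely as a sketch (untested, and the class/method below are 
hypothetical): if the FileSystem handed to HLog is a DistributedFileSystem, the namenode's 
datanode report could serve as an approximation of the cluster size. Note that 
getDataNodeStats() goes to the namenode and, depending on the Hadoop version, may include 
dead nodes, so whether this is acceptable on every log roll is an open question:
{noformat}
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

public class ReplicationEstimate {
  /**
   * Rough estimate of the replication we can realistically expect: the smaller
   * of the configured replication factor and the number of datanodes reported
   * by the namenode. Falls back to the configured value on non-HDFS filesystems.
   */
  public static int expectedReplication(FileSystem fs, int configuredReplication)
      throws IOException {
    if (fs instanceof DistributedFileSystem) {
      DatanodeInfo[] datanodes = ((DistributedFileSystem) fs).getDataNodeStats();
      return Math.min(configuredReplication, datanodes.length);
    }
    return configuredReplication;
  }
}
{noformat}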

[Solution B]

Add a new configurable parameter for the minimum tolerable number of replicas. Its 
default value is the default replication factor.
If a datanode carrying one replica of the HLog is killed, but the actual replica count 
is still greater than or equal to this parameter, we should not trigger a new RollWriter.
Something like:
{noformat}
   private OutputStream hdfs_out;     // OutputStream associated with the current SequenceFile.writer
-  private int initialReplication;    // initial replication factor of SequenceFile.writer
+  
+  // Minimum tolerable replicas: if the actual value is less than this, a rollWriter will be triggered
+  private int minTolerableReplication; 
+  
   private Method getNumCurrentReplicas; // refers to DFSOutputStream.getNumCurrentReplicas
   final static Object [] NO_ARGS = new Object []{};
 
@@ -358,6 +361,9 @@
       }
     }
     this.maxLogs = conf.getInt("hbase.regionserver.maxlogs", 32);
+    
+    this.minTolerableReplication = 
+      conf.getInt("hbase.regionserver.hlog.mintolerablereplicas", 3);
     this.enabled = conf.getBoolean("hbase.regionserver.hlog.enabled", true);
     LOG.info("HLog configuration: blocksize=" +
       StringUtils.byteDesc(this.blocksize) +
@@ -480,7 +486,6 @@
       this.filenum = System.currentTimeMillis();
       Path newPath = computeFilename();
       HLog.Writer nextWriter = this.createWriterInstance(fs, newPath, conf);
-      int nextInitialReplication = fs.getFileStatus(newPath).getReplication();
       // Can we get at the dfsclient outputstream?  If an instance of
       // SFLW, it'll have done the necessary reflection to get at the
       // protected field name.
@@ -500,7 +505,6 @@
         // Clean up current writer.
         Path oldFile = cleanupCurrentWriter(currentFilenum);
         this.writer = nextWriter;
-        this.initialReplication = nextInitialReplication;
         this.hdfs_out = nextHdfsOut;
 
         LOG.info((oldFile != null?
@@ -1003,10 +1007,10 @@
     try {
       int numCurrentReplicas = getLogReplication();
       if (numCurrentReplicas != 0 &&
-          numCurrentReplicas < this.initialReplication) {
+          numCurrentReplicas < this.minTolerableReplication) {
         LOG.warn("HDFS pipeline error detected. " +
-            "Found " + numCurrentReplicas + " replicas but expecting " +
-            this.initialReplication + " replicas. " +
+            "Found " + numCurrentReplicas + " replicas but expecting no less than " +
+            this.minTolerableReplication + " replicas. " +
             " Requesting close of hlog.");
         requestLogRoll();
         logRollRequested = true;
{noformat}


> Hlog may not be rolled in a long time if checkLowReplication's request of 
> LogRoll is blocked
> --------------------------------------------------------------------------------------------
>
>                 Key: HBASE-4095
>                 URL: https://issues.apache.org/jira/browse/HBASE-4095
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.3
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>         Attachments: HBASE-4095-90-v2.patch, HBASE-4095-90.patch, 
> HBASE-4095-trunk-v2.patch, HBASE-4095-trunk.patch, HlogFileIsVeryLarge.gif, 
> RelatedLogs2011-07-28.txt, hbase-root-regionserver-193-195-5-111.rar, 
> surefire-report-branch.html
>
>
> Some large HLog files (larger than 10 GB) appeared in our environment, and I found 
> the reason why they got so huge:
> 1. The replica count is less than the expected number, so checkLowReplication 
> is called on each sync.
> 2. checkLowReplication requests a log roll first and sets 
> logRollRequested to true: 
> {noformat}
> private void checkLowReplication() {
>   // if the number of replicas in HDFS has fallen below the initial
>   // value, then roll logs.
>   try {
>     int numCurrentReplicas = getLogReplication();
>     if (numCurrentReplicas != 0 &&
>         numCurrentReplicas < this.initialReplication) {
>       LOG.warn("HDFS pipeline error detected. " +
>           "Found " + numCurrentReplicas + " replicas but expecting " +
>           this.initialReplication + " replicas. " +
>           " Requesting close of hlog.");
>       requestLogRoll();
>       logRollRequested = true;
>     }
>   } catch (Exception e) {
>     LOG.warn("Unable to invoke DFSOutputStream.getNumCurrentReplicas" + e +
>         " still proceeding ahead...");
>   }
> }
> {noformat}
> 3. requestLogRoll() just submits the roll request. It may not execute in time, 
> because it must acquire the unfair lock cacheFlushLock, 
> but that lock may be held by the cache-flush threads.
> 4. logRollRequested stays true until the log roll actually executes, so during that 
> time every log-roll request from sync() is skipped.
> Here are the logs from when the problem happened (please note the file size of the 
> hlog "193-195-5-111%3A20020.1309937386639" in the last row):
> 2011-07-06 15:28:59,284 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: 
> HDFS pipeline error detected. Found 2 replicas but expecting 3 replicas.  
> Requesting close of hlog.
> 2011-07-06 15:29:46,714 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: 
> Roll 
> /hbase/.logs/193-195-5-111,20020,1309922880081/193-195-5-111%3A20020.1309937339119,
>  entries=32434, filesize=239589754. New hlog 
> /hbase/.logs/193-195-5-111,20020,1309922880081/193-195-5-111%3A20020.1309937386639
> 2011-07-06 15:29:56,929 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: 
> HDFS pipeline error detected. Found 2 replicas but expecting 3 replicas.  
> Requesting close of hlog.
> 2011-07-06 15:29:56,933 INFO org.apache.hadoop.hbase.regionserver.Store: 
> Renaming flushed file at 
> hdfs://193.195.5.112:9000/hbase/Htable_UFDR_034/a3780cf0c909d8cf8f8ed618b290cc95/.tmp/4656903854447026847
>  to 
> hdfs://193.195.5.112:9000/hbase/Htable_UFDR_034/a3780cf0c909d8cf8f8ed618b290cc95/value/8603005630220380983
> 2011-07-06 15:29:57,391 INFO org.apache.hadoop.hbase.regionserver.Store: 
> Added 
> hdfs://193.195.5.112:9000/hbase/Htable_UFDR_034/a3780cf0c909d8cf8f8ed618b290cc95/value/8603005630220380983,
>  entries=445880, sequenceid=248900, memsize=207.5m, filesize=130.1m
> 2011-07-06 15:29:57,478 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
> Finished memstore flush of ~207.5m for region 
> Htable_UFDR_034,07664,1309936974158.a3780cf0c909d8cf8f8ed618b290cc95. in 
> 10839ms, sequenceid=248900, compaction requested=false
> 2011-07-06 15:28:59,236 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: 
> Roll 
> /hbase/.logs/193-195-5-111,20020,1309922880081/193-195-5-111%3A20020.1309926531955,
>  entries=216459, filesize=2370387468. New hlog 
> /hbase/.logs/193-195-5-111,20020,1309922880081/193-195-5-111%3A20020.1309937339119
> 2011-07-06 15:29:46,714 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: 
> Roll 
> /hbase/.logs/193-195-5-111,20020,1309922880081/193-195-5-111%3A20020.1309937339119,
>  entries=32434, filesize=239589754. New hlog 
> /hbase/.logs/193-195-5-111,20020,1309922880081/193-195-5-111%3A20020.1309937386639
> 2011-07-06 16:29:58,775 DEBUG org.apache.hadoop.hbase.regionserver.LogRoller: 
> Hlog roll period 3600000ms elapsed
> 2011-07-06 16:29:58,775 DEBUG org.apache.hadoop.hbase.regionserver.LogRoller: 
> Hlog roll period 3600000ms elapsed
> 2011-07-06 16:30:01,978 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: 
> Roll 
> /hbase/.logs/193-195-5-111,20020,1309922880081/193-195-5-111%3A20020.1309937386639,
>  entries=1135576, filesize=19220372830. New hlog 
> /hbase/.logs/193-195-5-111,20020,1309922880081/193-195-5-111%3A20020.1309940998890

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
