[jira] [Commented] (HBASE-14383) Compaction improvements

Vladimir Rodionov (JIRA) Tue, 15 Sep 2015 17:23:58 -0700

    [ https://issues.apache.org/jira/browse/HBASE-14383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14746564#comment-14746564 ]


Vladimir Rodionov commented on HBASE-14383:
-------------------------------------------

Need feedback on the usefulness of the *hbase.regionserver.maxlogs* 
configuration setting.

LogRoller runs periodically (1h by default) and does two things:

# Archives old logs (WAL files whose WALEdits have all been flushed already)
# Then checks the number of active WAL files; if it exceeds 
hbase.regionserver.maxlogs, all regions that have edits in the oldest 
WAL file are flushed. 
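
The two steps above can be modeled with a minimal sketch. This is a simplified illustration, not HBase code: the class MaxLogsSketch, the WalFile type, and both method names are hypothetical, and archiving is modeled as simply dropping a file from the list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class MaxLogsSketch {
  /** Hypothetical stand-in for a rolled WAL file and the regions with unflushed edits in it. */
  static final class WalFile {
    final String name;
    final Set<String> regionsWithUnflushedEdits;

    WalFile(String name, String... regions) {
      this.name = name;
      this.regionsWithUnflushedEdits = new LinkedHashSet<>(Arrays.asList(regions));
    }
  }

  /** Step 1: a file whose edits are all flushed is archived (modeled here as removal). */
  static List<WalFile> archiveFullyFlushed(List<WalFile> rolledOldestFirst) {
    List<WalFile> active = new ArrayList<>();
    for (WalFile f : rolledOldestFirst) {
      if (!f.regionsWithUnflushedEdits.isEmpty()) {
        active.add(f);
      }
    }
    return active;
  }

  /** Step 2: if more than maxLogs files remain, flush the regions pinning the oldest one. */
  static Set<String> regionsToForceFlush(List<WalFile> activeOldestFirst, int maxLogs) {
    if (activeOldestFirst.size() <= maxLogs) {
      return Collections.emptySet();
    }
    return activeOldestFirst.get(0).regionsWithUnflushedEdits;
  }

  public static void main(String[] args) {
    List<WalFile> rolled = Arrays.asList(
        new WalFile("wal.1"),                       // fully flushed, gets archived
        new WalFile("wal.2", "regionA", "regionB"),
        new WalFile("wal.3", "regionB"),
        new WalFile("wal.4", "regionC"));
    List<WalFile> active = archiveFullyFlushed(rolled);
    // maxLogs = 2 is an arbitrary value chosen for the illustration.
    System.out.println(regionsToForceFlush(active, 2));
  }
}
```

In this model, three files stay active after archiving, which exceeds the limit of two, so the regions holding edits in wal.2 are the ones selected for flushing.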

rollWriter from FSHLog:
{code}
  @Override
  public byte [][] rollWriter(boolean force) throws FailedLogCloseException, IOException {
    rollWriterLock.lock();
    try {
      // Return if nothing to flush.
      if (!force && (this.writer != null && this.numEntries.get() <= 0)) return null;
      byte [][] regionsToFlush = null;
      if (this.closed) {
        LOG.debug("WAL closed. Skipping rolling of writer");
        return regionsToFlush;
      }
      if (!closeBarrier.beginOp()) {
        LOG.debug("WAL closing. Skipping rolling of writer");
        return regionsToFlush;
      }
      TraceScope scope = Trace.startSpan("FSHLog.rollWriter");
      try {
        Path oldPath = getOldPath();
        Path newPath = getNewPath();
        // Any exception from here on is catastrophic, non-recoverable so we currently abort.
        Writer nextWriter = this.createWriterInstance(newPath);
        FSDataOutputStream nextHdfsOut = null;
        if (nextWriter instanceof ProtobufLogWriter) {
          nextHdfsOut = ((ProtobufLogWriter)nextWriter).getStream();
          // If a ProtobufLogWriter, go ahead and try and sync to force setup of pipeline.
          // If this fails, we just keep going.... it is an optimization, not the end of the world.
          preemptiveSync((ProtobufLogWriter)nextWriter);
        }
        tellListenersAboutPreLogRoll(oldPath, newPath);
        // NewPath could be equal to oldPath if replaceWriter fails.
        newPath = replaceWriter(oldPath, newPath, nextWriter, nextHdfsOut);
        tellListenersAboutPostLogRoll(oldPath, newPath);
        // Can we delete any of the old log files?
        if (getNumRolledLogFiles() > 0) {
          cleanOldLogs();
          regionsToFlush = findRegionsToForceFlush();
        }
      } finally {
        closeBarrier.endOp();
        assert scope == NullScope.INSTANCE || !scope.isDetached();
        scope.close();
      }
      return regionsToFlush;
    } finally {
      rollWriterLock.unlock();
    }
  }
{code}

There is a clear duplication of functionality between LogRoller (LR) and 
PeriodicMemstoreFlusher (PMF). PMF already takes care of old memstores and 
flushes them, so there is no need to call 
regionsToFlush = findRegionsToForceFlush() in rollWriter, and hence no need for 
the *hbase.regionserver.maxlogs* config option. PMF periodically flushes the 
oldest memstores, and LogRoller periodically archives old WAL files. That is it.
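
To make the PMF side of the argument concrete, here is a minimal sketch of an age-based flush selection. It is a simplified model, not the HBase implementation: the class PeriodicFlushSketch, the method regionsDueForFlush, and the map of oldest unflushed edit timestamps are all hypothetical, and the one-hour interval is assumed to correspond to hbase.regionserver.optionalcacheflushinterval.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PeriodicFlushSketch {
  // Assumption: one hour in milliseconds, standing in for
  // hbase.regionserver.optionalcacheflushinterval.
  static final long FLUSH_INTERVAL_MS = 3_600_000L;

  /**
   * Returns the regions whose oldest unflushed edit is older than intervalMs.
   * Keys are region names, values are the timestamps (ms) of the oldest unflushed edit.
   */
  static List<String> regionsDueForFlush(Map<String, Long> oldestUnflushedEditTs,
                                         long nowMs, long intervalMs) {
    List<String> due = new ArrayList<>();
    for (Map.Entry<String, Long> e : oldestUnflushedEditTs.entrySet()) {
      if (nowMs - e.getValue() > intervalMs) {
        due.add(e.getKey());
      }
    }
    return due;
  }

  public static void main(String[] args) {
    Map<String, Long> oldestEdit = new LinkedHashMap<>();
    oldestEdit.put("regionA", 0L);          // two hours old at nowMs below
    oldestEdit.put("regionB", 5_400_000L);  // thirty minutes old at nowMs below
    long nowMs = 7_200_000L;
    System.out.println(regionsDueForFlush(oldestEdit, nowMs, FLUSH_INTERVAL_MS));
  }
}
```

Under this model, once the aged memstores are flushed, any WAL file holding only their edits has nothing unflushed left in it and can be archived on the next log roll, which is the overlap with the maxlogs check described above.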


> Compaction improvements
> -----------------------
>
>                 Key: HBASE-14383
>                 URL: https://issues.apache.org/jira/browse/HBASE-14383
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Vladimir Rodionov
>            Assignee: Vladimir Rodionov
>             Fix For: 2.0.0
>
>
> Still a major issue in many production environments. The general recommendation 
> is to disable region splitting and major compactions to reduce unpredictable 
> IO/CPU spikes, especially during peak times, and to run them manually during 
> off-peak times. That still does not resolve the issues completely.
> h3. Flush storms
> * WAL rolling events across a cluster can be highly correlated, and so are the 
> resulting memstore flushes, which trigger minor compactions that can be promoted 
> to major ones. These events are highly correlated in time if there is a balanced 
> write load on the regions in a table.
> * The same is true for memstore flushing due to periodic memstore flusher 
> operation. 
> Both of the above may produce *flush storms*, which are as bad as *compaction 
> storms*. 
> What can be done here? We can spread these events over time by randomizing 
> (with jitter) several config options:
> # hbase.regionserver.optionalcacheflushinterval
> # hbase.regionserver.flush.per.changes
> # hbase.regionserver.maxlogs   
> h3. ExploringCompactionPolicy max compaction size
> One more optimization can be added to ExploringCompactionPolicy. To limit 
> the size of a compaction, one can use the config parameter 
> hbase.hstore.compaction.max.size. It would be nice to have two separate 
> limits: one for peak and one for off-peak hours.
> h3. ExploringCompactionPolicy selection evaluation algorithm
> Too simple? A selection with more files always wins; a selection of smaller 
> size wins if the number of files is the same. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
