optimization for log splitting (theory/suggestion)
--------------------------------------------------

                 Key: HBASE-3099
                 URL: https://issues.apache.org/jira/browse/HBASE-3099
             Project: HBase
          Issue Type: Bug
            Reporter: ryan rawson


Right now log splitting is slower than we'd like.  The slow pace of log
splitting is one of the reasons why we have to keep a short, bounded limit on
the number of outstanding log files.  It would be nice to raise that limit, to
allow perhaps hundreds of logs.  That would increase efficiency because we
would not be force-flushing regions at non-ideal sizes.

But more logs mean more data to process.  However, not all of the log entries
for a regionserver are actually useful, because some regions were flushed
before the oldest log was trimmed.  So if, during log recovery, we read the
most recent flushed sequenceid for each region, then during log splitting (in
the master) we could skip the entries that a flush has already persisted and
avoid writing them to the per-region recovery logs.  That would cut some of
the IO, and if our serialization/deserialization code were clever we might be
able to avoid deserializing much of the skipped data.
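
To make the idea concrete, here is a rough sketch in Python.  It is not HBase
code: LogEntry, split_log and the last_flushed map are hypothetical names made
up for illustration, and it assumes that an edit is safe to skip when its
sequenceid is less than or equal to the last flushed sequenceid recorded for
its region.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class LogEntry(NamedTuple):
    """Hypothetical log record; only the fields the filter needs."""
    region: str       # region the edit belongs to
    sequence_id: int  # increasing id assigned to the edit
    payload: bytes    # serialized edit, never decoded by the splitter


def split_log(entries: Iterable[LogEntry],
              last_flushed: Dict[str, int]) -> Dict[str, List[LogEntry]]:
    """Group entries by region, dropping edits a flush already persisted."""
    per_region: Dict[str, List[LogEntry]] = defaultdict(list)
    for entry in entries:
        # Regions with no recorded flush default to -1, so every edit is kept.
        if entry.sequence_id <= last_flushed.get(entry.region, -1):
            continue  # assumed to be on disk already; nothing to replay
        per_region[entry.region].append(entry)
    return dict(per_region)


log = [
    LogEntry("r1", 1, b"a"),
    LogEntry("r2", 2, b"b"),
    LogEntry("r1", 3, b"c"),
    LogEntry("r2", 4, b"d"),
]
# Suppose r1 was flushed through sequenceid 3 and r2 was never flushed.
recovered = split_log(log, {"r1": 3})
```

Here both r1 entries are dropped and only the r2 entries would be written to
a per-region recovery log.  The filter looks only at the region and the
sequenceid, so the payload could stay as undecoded bytes, which is where the
possible deserialization savings would come from.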

It's not clear how effective or worthwhile this might be.

