Optimize log splitter to not output obsolete edits
--------------------------------------------------

                 Key: HBASE-3325
                 URL: https://issues.apache.org/jira/browse/HBASE-3325
             Project: HBase
          Issue Type: Improvement
          Components: master, regionserver
    Affects Versions: 0.92.0
            Reporter: Todd Lipcon


Currently, when the master splits logs, it outputs every edit it finds, even
those that have already been made obsolete by flushes. The RS then discards the
already-flushed edits at replay time.

We could make a fairly simple optimization here: each RS would replicate a map
"region id -> last flushed seq id" into ZooKeeper. This copy can lag behind by a
few seconds without causing problems. A stale value is never higher than the
true one, so the splitter would only keep a few extra edits that replay already
discards. Then, during log splitting, if this map is available, we can drop any
edits in the logs that were already flushed, and thus output much less data.
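A minimal sketch of the split-time filter follows. It is written in Java because
HBase is, but it uses no HBase classes. `Edit`, `dropFlushedEdits` and the plain
`Map` standing in for the ZooKeeper-replicated "region id -> last flushed seq
id" data are hypothetical. It assumes that an edit whose seq id is at or below
its region's last flushed seq id is already persisted. It also assumes that a
region with no entry in the map keeps all of its edits, as happens today.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SplitFilterSketch {

    // Hypothetical stand-in for a log entry: owning region and sequence id.
    record Edit(String region, long seqId, String payload) {}

    /**
     * Keeps only edits newer than the last flushed seq id recorded for their
     * region. Regions absent from the map (e.g. not yet reported to
     * ZooKeeper) keep all of their edits.
     */
    static List<Edit> dropFlushedEdits(List<Edit> edits,
                                       Map<String, Long> lastFlushedSeqId) {
        List<Edit> out = new ArrayList<>();
        for (Edit e : edits) {
            Long flushed = lastFlushedSeqId.get(e.region());
            // Assumption: seqId <= flushed means the edit is already in a store file.
            if (flushed == null || e.seqId() > flushed) {
                out.add(e);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Edit> log = List.of(
                new Edit("regionA", 10, "put r1"),
                new Edit("regionA", 25, "put r2"),
                new Edit("regionB", 7, "put r3"),
                new Edit("regionC", 3, "put r4"));
        // Possibly stale snapshot of the map; regionC has no entry yet.
        Map<String, Long> flushed = Map.of("regionA", 20L, "regionB", 7L);
        for (Edit e : dropFlushedEdits(log, flushed)) {
            System.out.println(e.region() + " " + e.seqId());
        }
    }
}
```

Running `main` prints `regionA 25` and `regionC 3`. If the snapshot is stale,
the recorded seq ids are only lower than the real ones. The filter then keeps
some already-flushed edits, and replay discards them as it does now, so
correctness does not depend on the map being current.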
