[jira] Commented: (HBASE-3325) Optimize log splitter to not output obsolete edits

Jonathan Gray (JIRA) Thu, 09 Dec 2010 11:50:29 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-3325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969901#action_12969901
 ]


Jonathan Gray commented on HBASE-3325:
--------------------------------------

Do you think we'd need to go through the trouble of storing this mapping in ZK?

At least initially, we could just open the latest file for each region and grab 
the seqid out of it.  An optimization later could be to keep this data on the 
side.

If we do put it in ZK, it shouldn't be an issue to do it synchronously: it only 
needs to be done after a flush, and the overhead would be insignificant compared 
to typical flush times.
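
Wherever the "region -> last flushed seqid" data ends up coming from (ZK, or 
the latest file of each region), the filtering step in the splitter could look 
roughly like the sketch below. This is only an illustration, not HBase code: 
the ObsoleteEditFilter class, the Edit type and the dropFlushed method are 
hypothetical names, and it assumes that an edit is obsolete when its seqid is 
less than or equal to the last flushed seqid of its region. Edits for regions 
missing from the map are kept, so stale or absent data can only cost us the 
optimization, never an edit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ObsoleteEditFilter {

    /** Hypothetical stand-in for a log entry: owning region plus sequence id. */
    public static final class Edit {
        public final String region;
        public final long seqId;

        public Edit(String region, long seqId) {
            this.region = region;
            this.seqId = seqId;
        }
    }

    /**
     * Returns the edits that still need to be written out by the splitter.
     * Assumption: seqId <= last flushed seqid means the edit is already
     * persisted in a flushed file and can be discarded.
     */
    public static List<Edit> dropFlushed(List<Edit> edits,
                                         Map<String, Long> lastFlushedSeqId) {
        List<Edit> kept = new ArrayList<>();
        for (Edit e : edits) {
            Long flushed = lastFlushedSeqId.get(e.region);
            // No entry for this region: keep the edit to stay on the safe side.
            if (flushed == null || e.seqId > flushed) {
                kept.add(e);
            }
        }
        return kept;
    }
}
```

Because a missing or out-of-date entry only makes the filter more 
conservative, the map can lag behind the actual flushes without affecting 
correctness.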

> Optimize log splitter to not output obsolete edits
> --------------------------------------------------
>
>                 Key: HBASE-3325
>                 URL: https://issues.apache.org/jira/browse/HBASE-3325
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Todd Lipcon
>
> Currently when the master splits logs, it outputs all edits it finds, even 
> those that have already been obsoleted by flushes. At replay time on the RS 
> we discard the edits that have already been flushed.
> We could do a pretty simple optimization here - basically the RS should 
> replicate a map "region id -> last flushed seq id" into ZooKeeper (this can 
> be asynchronous by some seconds without any problems). Then when doing log 
> splitting, if we have this map available, we can discard any edits found in 
> the logs that were already flushed, and thus output a much smaller amount of 
> data.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
