[ 
https://issues.apache.org/jira/browse/HBASE-3325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969937#action_12969937
 ] 

Jonathan Gray commented on HBASE-3325:
--------------------------------------

@Kannan, good point.  With that, and the fact that we only need to update ZK 
on a flush, it should be fine to keep this data in ZK. This would be the first 
time we retain per-region info in ZK, though, so we may need to start thinking 
about clusters with 1000s of regions.

> Optimize log splitter to not output obsolete edits
> --------------------------------------------------
>
>                 Key: HBASE-3325
>                 URL: https://issues.apache.org/jira/browse/HBASE-3325
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Todd Lipcon
>
> Currently when the master splits logs, it outputs all edits it finds, even 
> those that have already been obsoleted by flushes. At replay time on the RS 
> we discard the edits that have already been flushed.
> We could do a pretty simple optimization here - basically the RS should 
> replicate a map "region id -> last flushed seq id" into ZooKeeper (this can 
> be asynchronous by some seconds without any problems). Then when doing log 
> splitting, if we have this map available, we can discard any edits found in 
> the logs that were already flushed, and thus output a much smaller amount of 
> data.
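
A rough sketch of the filtering step described in the issue, for illustration only. It assumes an edit is obsolete when its sequence id is less than or equal to the last flushed sequence id recorded for its region. The names SplitFilterSketch, Edit and dropFlushed are hypothetical and are not HBase classes. An in-memory map stands in for the data that would be read from ZooKeeper.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SplitFilterSketch {

    /** Hypothetical stand-in for a log entry: only the region and sequence id. */
    static final class Edit {
        final String region;
        final long seqId;

        Edit(String region, long seqId) {
            this.region = region;
            this.seqId = seqId;
        }
    }

    /**
     * Returns the edits that still need to be written out by the splitter.
     * lastFlushed is the "region id -> last flushed seq id" map; here it is
     * a plain Map rather than anything read from ZooKeeper.
     */
    static List<Edit> dropFlushed(List<Edit> edits, Map<String, Long> lastFlushed) {
        List<Edit> kept = new ArrayList<>();
        for (Edit e : edits) {
            Long flushed = lastFlushed.get(e.region);
            // No entry for the region: keep the edit and let replay decide.
            // Assumption: seqId <= flushed means the edit is already persisted.
            if (flushed == null || e.seqId > flushed) {
                kept.add(e);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Long> lastFlushed = new HashMap<>();
        lastFlushed.put("region-a", 10L);

        List<Edit> edits = new ArrayList<>();
        edits.add(new Edit("region-a", 9L));   // already flushed
        edits.add(new Edit("region-a", 11L));  // newer than the last flush
        edits.add(new Edit("region-b", 3L));   // region missing from the map

        for (Edit e : dropFlushed(edits, lastFlushed)) {
            System.out.println(e.region + " " + e.seqId);
        }
    }
}
```

Under this assumption, a map that lags behind the real flush state only records a sequence id that is too low, so the splitter keeps some edits it could have dropped and the RS discards them at replay as it does today. This is consistent with the description's note that the map can be asynchronous by some seconds.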
