[
https://issues.apache.org/jira/browse/HBASE-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919933#action_12919933
]
Kannan Muthukkaruppan commented on HBASE-3099:
----------------------------------------------
+1
> optimization for log splitting (theory/suggestion)
> --------------------------------------------------
>
> Key: HBASE-3099
> URL: https://issues.apache.org/jira/browse/HBASE-3099
> Project: HBase
> Issue Type: Bug
> Reporter: ryan rawson
>
> Right now log splitting is slower than we'd like. The slow pace of log
> splitting is one of the reasons we have to keep a low, bounded limit on
> the number of outstanding log files. It would be nice to raise that limit
> to allow perhaps hundreds of logs. That would increase efficiency because
> we would not be force-flushing regions at non-ideal sizes.
> But more data means more to process. However, not all of the entries in a
> regionserver's logs are actually useful, because some regions were flushed
> before the oldest log was trimmed. So during log recovery, if we read the
> most recent flushed sequenceid for each region, we could skip the
> already-flushed entries during log splitting (in the master) and avoid
> writing them to the per-region recovery logs. That would eliminate part of
> the IO, and if our serialization/deserialization code were clever we might
> be able to avoid deserializing much of the skipped data.
> It's not clear how effective or worthwhile this might be.
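
A minimal sketch of the skip described above, assuming the splitter is handed a map from region name to the last sequenceid known to be flushed for that region. The names here (LogSplitSketch, Entry, split, lastFlushedSeqId) are hypothetical and are not HBase APIs; the "at or below the flushed sequenceid" comparison is also an assumption about how flushed edits would be identified.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names are HBase classes.
final class LogSplitSketch {

    /** Simplified log entry: the region it belongs to and its sequence id. */
    static final class Entry {
        final String region;
        final long sequenceId;

        Entry(String region, long sequenceId) {
            this.region = region;
            this.sequenceId = sequenceId;
        }
    }

    /**
     * Groups log entries by region, dropping entries that are assumed to be
     * already persisted (sequenceId <= last flushed sequenceid of the region).
     * Regions with no known flushed sequenceid keep all of their entries.
     */
    static Map<String, List<Entry>> split(List<Entry> log,
                                          Map<String, Long> lastFlushedSeqId) {
        Map<String, List<Entry>> perRegion = new HashMap<>();
        for (Entry e : log) {
            Long flushed = lastFlushedSeqId.get(e.region);
            if (flushed != null && e.sequenceId <= flushed) {
                continue; // already flushed: skip writing it to the recovery log
            }
            perRegion.computeIfAbsent(e.region, r -> new ArrayList<>()).add(e);
        }
        return perRegion;
    }
}
```

In this sketch only the region and sequenceid of each entry are inspected before deciding to skip it, which is the property that would let a cleverer deserializer avoid decoding the rest of a skipped entry.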