[jira] Commented: (HBASE-3099) optimization for log splitting (theory/suggestion)

Kannan Muthukkaruppan (JIRA) Mon, 11 Oct 2010 11:23:57 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919933#action_12919933
 ]


Kannan Muthukkaruppan commented on HBASE-3099:
----------------------------------------------

+1

> optimization for log splitting (theory/suggestion)
> --------------------------------------------------
>
>                 Key: HBASE-3099
>                 URL: https://issues.apache.org/jira/browse/HBASE-3099
>             Project: HBase
>          Issue Type: Bug
>            Reporter: ryan rawson
>
> Right now log splitting is slower than we'd like.  That slow pace is one 
> of the reasons we have to keep a short, bounded limit on the number of 
> outstanding log files.  It would be nice to raise that limit, perhaps to 
> hundreds of logs.  That would improve efficiency because we would not be 
> force-flushing regions at non-ideal sizes.
> But more logs mean more data to process.  Not all of the log entries for a 
> regionserver are actually useful, though, because some regions were flushed 
> before the oldest log was trimmed, so their older edits are already 
> persisted.  So during log recovery, if we read each region's most recent 
> flushed sequenceid, we could skip those entries during log splitting (in 
> the master) and avoid writing them to the per-region recovery logs.  That 
> would cut part of the IO, and if our serialization/deserialization code 
> were clever we might be able to avoid much of the deserialization as well.  
> It's not clear how effective or worthwhile this might be.
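
The skip described in the quoted suggestion could be sketched roughly as follows. This is only an illustration under stated assumptions, not HBase's actual splitting code: the class name, the Entry type, and the lastFlushedSeqId map are all hypothetical, and it assumes the splitter can look up, per region, the highest sequence id that has already been flushed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from HBase itself.
final class LogSplitSkipSketch {

    /** Stand-in for one log entry: the region it belongs to and its sequence id. */
    static final class Entry {
        final String region;
        final long sequenceId;

        Entry(String region, long sequenceId) {
            this.region = region;
            this.sequenceId = sequenceId;
        }
    }

    /**
     * Groups log entries by region, dropping entries whose sequence id is at or
     * below that region's last flushed sequence id (assumed to be known).
     * Regions with no recorded flush keep all of their entries.
     */
    static Map<String, List<Entry>> split(List<Entry> log,
                                          Map<String, Long> lastFlushedSeqId) {
        Map<String, List<Entry>> perRegion = new HashMap<>();
        for (Entry e : log) {
            Long flushed = lastFlushedSeqId.get(e.region);
            if (flushed != null && e.sequenceId <= flushed) {
                continue; // already persisted by a flush, so skip the write
            }
            perRegion.computeIfAbsent(e.region, k -> new ArrayList<>()).add(e);
        }
        return perRegion;
    }
}
```

In this sketch the saving is in the writes to the per-region output. Avoiding deserialization as well would additionally require reading the region and sequence id without decoding the rest of each entry, which is not shown here.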

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
