[jira] [Commented] (HBASE-6980) Parallel Flushing Of Memstores

Kannan Muthukkaruppan (JIRA) Tue, 16 Oct 2012 20:36:10 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477575#comment-13477575
 ]


Kannan Muthukkaruppan commented on HBASE-6980:
----------------------------------------------

Ramakrishna,

Thanks for your email.

#1. It is not clear why we even write a META entry for flushes...

{code}
private WALEdit completeCacheFlushLogEdit() {
  KeyValue kv = new KeyValue(METAROW, METAFAMILY, null,
    System.currentTimeMillis(), COMPLETE_CACHE_FLUSH);
  WALEdit e = new WALEdit();
  e.add(kv);
  return e;
}
{code}

The replayRecoveredEdits() logic skips over these entries anyway. And the only 
reference I see for this special entry in HLog is in unit tests.
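
To make the point about replay concrete, here is a minimal, self-contained sketch of the filtering being described. It is not the actual replayRecoveredEdits() code: the Edit class, the editsToReplay() helper, and the byte value used for METAFAMILY are all hypothetical stand-ins chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ReplaySkipSketch {
    // Stand-in for the METAFAMILY constant; the byte value here is an assumption.
    static final byte[] METAFAMILY = "METAFAMILY".getBytes(StandardCharsets.UTF_8);

    // Hypothetical, simplified edit: just a column family and an opaque payload.
    static final class Edit {
        final byte[] family;
        final String payload;

        Edit(byte[] family, String payload) {
            this.family = family;
            this.payload = payload;
        }
    }

    // Returns only the edits that replay would apply, dropping the
    // flush-marker entries written under the meta family.
    static List<Edit> editsToReplay(List<Edit> recovered) {
        List<Edit> kept = new ArrayList<>();
        for (Edit e : recovered) {
            if (Arrays.equals(e.family, METAFAMILY)) {
                continue; // marker entry: nothing to apply to the region
            }
            kept.add(e);
        }
        return kept;
    }
}
```

If replay drops every such entry, as in this sketch, the marker carries no information that recovery depends on, which is the reason for questioning why it is written at all.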

#2. Yes, currently there are a lot of comments (related to lastSeqWritten) 
before the function startCacheFlush() in HLog.java, but the logic is not very 
clear to me. The changes were committed as part of HBASE-3845. I think we 
should be able to simplify that logic. I think I see some potential bugs there 
even as it stands now; I will need to spend some more time looking at this, and 
will write down an update here.

But bottom line, I still don't see any good fundamental reason we need to hold 
this lock for the duration of the entire flush (even given the lastSeqWritten 
map logic).
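
As a rough illustration of what holding the lock only briefly could look like, here is a toy model in plain Java. It is a sketch under stated assumptions, not the HLog or HRegion code: the class, the flushLock field, and writeToDisk() are hypothetical, and the lastSeqWritten bookkeeping is deliberately left out, since that is exactly the part that still needs a closer look.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

final class ShortLockFlushSketch {
    // Hypothetical stand-in for the lock under discussion.
    private final ReentrantLock flushLock = new ReentrantLock();
    private List<String> active = new ArrayList<>();
    private long sequenceId = 0;
    private final List<List<String>> flushedFiles = new ArrayList<>();
    private long lastFlushedSequenceId = -1;

    void put(String edit) {
        flushLock.lock();
        try {
            active.add(edit);
            sequenceId++;
        } finally {
            flushLock.unlock();
        }
    }

    void flush() {
        List<String> snapshot;
        long snapshotSeqId;

        // Step 1 (short, under lock): swap out the active buffer and
        // remember the sequence id that the snapshot covers.
        flushLock.lock();
        try {
            snapshot = active;
            active = new ArrayList<>();
            snapshotSeqId = sequenceId;
        } finally {
            flushLock.unlock();
        }

        // Step 2 (slow, no lock held): persist the snapshot.
        List<String> file = writeToDisk(snapshot);

        // Step 3 (short, under lock): publish the result.
        flushLock.lock();
        try {
            flushedFiles.add(file);
            lastFlushedSequenceId = Math.max(lastFlushedSequenceId, snapshotSeqId);
        } finally {
            flushLock.unlock();
        }
    }

    // Simulated disk write; a real flush would write a store file here.
    private static List<String> writeToDisk(List<String> snapshot) {
        return new ArrayList<>(snapshot);
    }

    int flushedFileCount() {
        flushLock.lock();
        try { return flushedFiles.size(); } finally { flushLock.unlock(); }
    }

    long lastFlushedSequenceId() {
        flushLock.lock();
        try { return lastFlushedSequenceId; } finally { flushLock.unlock(); }
    }

    int activeSize() {
        flushLock.lock();
        try { return active.size(); } finally { flushLock.unlock(); }
    }
}
```

In this model, writers are blocked only for the snapshot and commit steps, not for the disk write. Whether the real code can be restructured this way depends on how the lastSeqWritten map has to be updated around the flush.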

                
> Parallel Flushing Of Memstores
> ------------------------------
>
>                 Key: HBASE-6980
>                 URL: https://issues.apache.org/jira/browse/HBASE-6980
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Kannan Muthukkaruppan
>
> For write-dominated workloads, single-threaded memstore flushing is an 
> unnecessary bottleneck. With a single flusher thread, we are basically not 
> set up to take advantage of the aggregate throughput that multi-disk nodes 
> provide.
> * For puts with WAL enabled, the bottleneck is more likely the "single" WAL 
> per region server. So this particular fix may not buy as much unless we 
> unlock that bottleneck with multiple commit logs per region server. (Topic 
> for a separate JIRA-- HBASE-6981).
> * But for puts with WAL disabled (e.g., when using HBASE-5783 style fast bulk 
> imports), we should be able to support much better ingest rates with parallel 
> flushing of memstores.
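
The idea in the description above can be sketched with a fixed pool of flusher threads in place of a single one. This is only an illustration under assumptions, not the proposed patch: the class, flushAll(), flushOne(), and the use of string lists as memstores are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelFlushSketch {
    // Flushes every memstore using up to flusherThreads concurrent workers
    // and returns the total number of entries flushed.
    static int flushAll(List<List<String>> memstores, int flusherThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(flusherThreads);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (List<String> memstore : memstores) {
                Callable<Integer> task = () -> flushOne(memstore);
                pending.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> f : pending) {
                total += f.get(); // wait for each flush to finish
            }
            return total;
        } catch (InterruptedException | ExecutionException ex) {
            throw new IllegalStateException("flush failed", ex);
        } finally {
            pool.shutdown();
        }
    }

    // Simulated flush of one memstore; a real flush would write a store file,
    // ideally landing on a different disk than its neighbours.
    static int flushOne(List<String> memstore) {
        return memstore.size();
    }
}
```

With flusherThreads set to 1 this degenerates to the current single-flusher behavior; larger values let independent flushes overlap, which only pays off when the WAL is not the limiting factor.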

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
