[ 
https://issues.apache.org/jira/browse/HBASE-11401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049269#comment-14049269
 ] 

stack commented on HBASE-11401:
-------------------------------

[~jeffreyz] Seems like it's the wait on seqid, and less disruptor batching (the flame 
graphs raise a couple of new questions, though...).

I was thinking, like you, about whether a pluggable memstore could work around Cells 
that have not yet gotten a sequenceid. (For a compressing memstore, or one that encodes 
into a cellblock, I'd suppose you'd compress a snapshot; if a Cell does not yet have a 
sequenceid, wait on the Cell's sequenceid?)

Otherwise, we'll be back here to do some gymnastics (double ringbuffer, etc.).
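A minimal sketch of the "wait on the Cell sequenceid" idea, assuming a hypothetical `PendingCell` that carries a `CompletableFuture<Long>` completed by the WAL path, and a hypothetical `SnapshotEncoder` that blocks on that future before copying the cell into an immutable, encoded form. None of these names are HBase APIs; they only illustrate the shape of the workaround.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical cell whose sequence id is bound late, after the WAL append.
final class PendingCell {
    final String row;
    final String value;
    // Completed by the (hypothetical) WAL path once the seqid is known.
    final CompletableFuture<Long> seqId = new CompletableFuture<>();

    PendingCell(String row, String value) {
        this.row = row;
        this.value = value;
    }
}

// Hypothetical immutable "encoded" cell: the seqid can no longer be patched.
final class EncodedCell {
    final String row;
    final String value;
    final long seqId;

    EncodedCell(String row, String value, long seqId) {
        this.row = row;
        this.value = value;
        this.seqId = seqId;
    }
}

// Hypothetical snapshot encoder for a compressing memstore.
final class SnapshotEncoder {
    // Waits for each cell's seqid before copying it, so the encoded copy
    // never captures a temporary placeholder mvcc value.
    static List<EncodedCell> encode(List<PendingCell> snapshot) {
        List<EncodedCell> out = new ArrayList<>();
        for (PendingCell c : snapshot) {
            long id = c.seqId.join(); // blocks until the WAL path completes it
            out.add(new EncodedCell(c.row, c.value, id));
        }
        return out;
    }
}
```

The trade-off in this sketch is that encoding a snapshot can stall until the WAL path has assigned ids to every cell in it.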

> Late-binding sequenceid presumes a particular KeyValue mvcc format hampering 
> experiment
> ---------------------------------------------------------------------------------------
>
>                 Key: HBASE-11401
>                 URL: https://issues.apache.org/jira/browse/HBASE-11401
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.99.0
>            Reporter: Anoop Sam John
>            Priority: Critical
>             Fix For: 0.99.0
>
>         Attachments: 11401.changing.order.txt, memstore.txt, 
> nopatch.traces.svg, wpatch.traces.svg
>
>
> After HBASE-8763, we have combined KV mvcc and HLog seqNo. This is 
> implemented in a tricky way now.
> In HRegion, on the write path, we first write to the memstore, then write to the 
> HLog, and finally sync the log. So at the time of the memstore write we don't know 
> the WAL seqNo. To overcome this, we hold refs to the KV objects just added to the 
> memstore and pass those to the write-to-WAL call as well. Once the seqNo is 
> obtained, we reset the mvcc in those KVs to this seqNo. (While writing to the 
> memstore, we write the KVs with a very high temporary mvcc value so that 
> concurrent readers won't see them.)
> This model works well with the DefaultMemstore. During the write there won't 
> be any concurrent call to snapshot().
> But now the memstore is a pluggable interface. The above late-binding model 
> assumes that the memstore's internal data structure continues to refer to the 
> same Java objects. This might not always be true. For example, in HBASE-10713 
> the KVs can be converted into a CellBlock in between. If the memstore stops 
> referring to the same KV Java objects, we will fail to get the seqNo assigned as 
> the KV mvcc.
> If we did the write and sync to the WAL first and then wrote to the memstore, 
> this would be solved. But we changed that model (in 0.94, I believe) for better 
> perf. Under the HRegion-level lock, we write to the memstore and then to the WAL. 
> Finally, outside the lock, we do the log sync. So we cannot change it now.
> I tried changing the order of ops within the lock (i.e. write to the log and then 
> to the memstore) so that we can get the seqNo when writing to the memstore. But 
> because of the new HLog write model, we are not guaranteed to get the write done 
> immediately.
> One possible way could be to add a new API at the Log level to get just the next 
> seqNo. Call this first, then use it to write to the memstore and then to the WAL 
> (with this same seqNo). Just a random thought. Not tried.
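A minimal sketch contrasting the two write orderings from the description, assuming a hypothetical `CopyingMemstore` that copies each KV on insert (standing in for a CellBlock-encoding memstore) and an `AtomicLong` standing in for the WAL's seqNo source. `MutableKv`, `CopyingMemstore`, and `WritePaths` are illustrative names, not HBase classes, and the "next seqNo" call is the hypothetical Log-level API the description proposes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical mutable KeyValue: mvcc can be rewritten after insertion.
final class MutableKv {
    final String row;
    long mvcc;

    MutableKv(String row, long mvcc) {
        this.row = row;
        this.mvcc = mvcc;
    }
}

// Hypothetical memstore that copies cells on insert, so it no longer
// shares Java objects with the caller.
final class CopyingMemstore {
    final List<MutableKv> cells = new ArrayList<>();

    void add(MutableKv kv) {
        cells.add(new MutableKv(kv.row, kv.mvcc)); // stores a copy, not the caller's ref
    }
}

final class WritePaths {
    // Placeholder mvcc that keeps not-yet-bound cells invisible to readers.
    static final long TEMP_MVCC = Long.MAX_VALUE;
    // Stands in for the WAL's sequence number source.
    static final AtomicLong walSeq = new AtomicLong();

    // Late binding: insert with a temp mvcc, then patch the caller's object
    // once the WAL hands back a seqNo. The memstore's copy is never patched.
    static MutableKv lateBinding(CopyingMemstore ms, String row) {
        MutableKv kv = new MutableKv(row, TEMP_MVCC);
        ms.add(kv);
        long seqNo = walSeq.incrementAndGet(); // stands in for the WAL append
        kv.mvcc = seqNo;                       // updates only the caller's object
        return kv;
    }

    // Reserve first: get the next seqNo up front (hypothetical Log-level API),
    // then write the memstore and the WAL with that same seqNo.
    static MutableKv reserveFirst(CopyingMemstore ms, String row) {
        long seqNo = walSeq.incrementAndGet();
        MutableKv kv = new MutableKv(row, seqNo);
        ms.add(kv);
        // ...append to the WAL with the same seqNo, then sync outside the region lock
        return kv;
    }
}
```

With the copying memstore, the late-binding path leaves the stored copy at `TEMP_MVCC`, so readers would never see it, which is the failure described above. The reserve-first path writes the final id into the copy from the start; the sketch does not model how WAL entries stay ordered relative to reserved seqNos, which the untried proposal would still need to work out.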



--
This message was sent by Atlassian JIRA
(v6.2#6252)
