[ https://issues.apache.org/jira/browse/HBASE-8701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13678681#comment-13678681 ]

stack commented on HBASE-8701:
------------------------------

On the 'other approach', I like the incremental aspect -- being able to remove 
HLogs as we go.

A MultiStoreFile would have edits from one WAL file for all regions in the WAL?
Which region would it live in, and how would it get cleaned up? (When all
references had been dropped?) We'd have to write 'reference' files into each
region that pointed back to a range in this WAL? Wouldn't we be making nearly as
many NN operations as in the case where we wrote out an hfile per region?

I think this MultiStoreFile notion is too complex.

We could keep hfiles per region in memory and not write them until we had to,
but then we lose the incremental benefit and we start to arrive at the
Enis/Elliott scheme?

On another note, I was thinking we could enable distributed replay as the
default now if we turned off bringing the region online for writes, but I
realize now that we cannot enable distributed replay until we fix this problem,
so ignore my request in the parent issue asking that we turn it on.

On allowing KVs with the same coordinates to sometimes return in insert order
and, after distributed replay, possibly in a different order (if we give up
respecting seqid on WAL split): I am not sure we should. It would mean we could
not definitively overwrite an existing KV to remove it from the db. Do we think
this is a big deal (rare, yes, but perhaps a facility we need to retain)?
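
To make the ordering concern concrete, here is a toy model in Python, not HBase
code (`visible_value` and `replay_with_new_seqids` are hypothetical helpers). It
assumes that among cells with the same (row, ts) coordinates, the one with the
highest sequence id is visible. If replay respects the original seqids, the
later put wins whatever the replay order; if the receiving RS assigns fresh
seqids in arrival order, the winner depends on that order, so an overwrite is
no longer definitive.

```python
from itertools import permutations

# Toy model (assumption): among cells with the same (row, ts) coordinates,
# the one with the highest sequence id is the visible one.
def visible_value(edits):
    """edits: iterable of (row, ts, seqid, value); returns {(row, ts): value}."""
    winner = {}
    for row, ts, seqid, value in edits:
        key = (row, ts)
        if key not in winner or seqid > winner[key][0]:
            winner[key] = (seqid, value)
    return {key: value for key, (_, value) in winner.items()}


def replay_with_new_seqids(edits, start_seqid):
    """Hypothetical replay that drops the original seqids and assigns fresh,
    increasing ones in the order the edits arrive at the receiving RS."""
    return [(row, ts, start_seqid + i, value)
            for i, (row, ts, _, value) in enumerate(edits)]


# Two puts to the same cell; B was written after A (higher original seqid).
wal_edits = [("r1", 0, 10, "A"), ("r1", 0, 11, "B")]

# Keep original seqids: B wins under every replay order.
with_orig = {visible_value(order)[("r1", 0)]
             for order in permutations(wal_edits)}
# Reassign seqids on replay: the winner depends on arrival order.
with_new = {visible_value(replay_with_new_seqids(order, 100))[("r1", 0)]
            for order in permutations(wal_edits)}
print(sorted(with_orig), sorted(with_new))  # -> ['B'] ['A', 'B']
```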

I commented over in HBASE-8709. Chatting w/ LarsH this evening, he raised
having the seqid in the KV, as is suggested there. This has come up in a few
contexts. If it were the seqid rather than the mvcc, it would be of use when
replaying (though this seqid is already out in the WAL's WALEdit file, so having
it in the KV would be redundant). When replaying, though, the memstore would
have to consider the seqid when sorting. We would not collapse versions of the
same coordinates in the memstore during replay (as we currently do in the
memstore); we could only do the collapsing after all WALs had been replayed and
we are flushing. (This latter fact would mean the replay memstore would be
different from our current memstore.)
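
A minimal sketch of such a replay-only memstore, assuming cells are plain
(row, ts, seqid, value) tuples and ignoring families/qualifiers (the class name
`ReplayMemstore` is hypothetical): it keeps every replayed version, sorted so
that within a row newer timestamps and then higher seqids come first, and
collapses same-coordinate versions only at flush time.

```python
import bisect


class ReplayMemstore:
    """Hypothetical replay-only memstore. Keeps every replayed version and
    collapses same-coordinate versions only when flushing."""

    def __init__(self):
        # Sorted list of (row, -ts, -seqid, value): ascending order puts
        # newer timestamps, then higher seqids, first within a row.
        self._cells = []

    def add(self, row, ts, seqid, value):
        # No collapsing here: an older edit arriving late is simply kept
        # alongside the newer one instead of replacing it.
        bisect.insort(self._cells, (row, -ts, -seqid, value))

    def flush(self):
        """Return one cell per (row, ts), the one with the highest seqid,
        and empty the memstore."""
        cells, self._cells = self._cells, []
        out, last = [], None
        for row, neg_ts, neg_seqid, value in cells:
            if (row, neg_ts) != last:  # first cell of a coordinate wins
                out.append((row, -neg_ts, -neg_seqid, value))
                last = (row, neg_ts)
        return out


m = ReplayMemstore()
m.add("r1", 0, 11, "B")  # from WAL2, arrives first
m.add("r1", 0, 10, "A")  # from WAL1, arrives later but is older
m.add("r1", 5, 12, "D")
flushed = m.flush()
print(flushed)  # -> [('r1', 5, 12, 'D'), ('r1', 0, 11, 'B')]
```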

> distributedLogReplay need to apply wal edits in the receiving order of those edits
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-8701
>                 URL: https://issues.apache.org/jira/browse/HBASE-8701
>             Project: HBase
>          Issue Type: Bug
>          Components: MTTR
>            Reporter: Jeffrey Zhong
>            Assignee: Jeffrey Zhong
>             Fix For: 0.98.0, 0.95.2
>
>
> This issue happens in distributedLogReplay mode when recovering multiple puts
> of the same key + version (timestamp). After replay, the value of the key is
> nondeterministic.
> h5. The original scenario of concern, raised by [~eclark]:
> For all edits, the rowkey is the same.
> There's a log with: [ A (ts = 0), B (ts = 0) ].
> Replay the first half of the log.
> A user puts C (ts = 0).
> The memstore has to flush.
> A new HFile will be created with [ C, A ] and MaxSequenceId = C's seqid.
> Replay the rest of the log.
> Flush.
> The issue also happens in similar situations, such as Put(key, t=T) in WAL1
> and Put(key, t=T) in WAL2.
> h5. Below is the option I'd like to use:
> a) During replay, we pass the WAL file name hash in each replay batch, plus the
> original WAL sequence id of each edit, to the receiving RS.
> b) Once a WAL is recovered, the replaying RS sends a signal to the receiving RS
> so that the receiving RS can flush.
> c) In the receiving RS, edits from different WAL files of a region go to
> different memstores. (At a high level, we can visualize this as sending changes
> to a new region object named origin region name + WAL name hash, which uses the
> original sequence ids.)
> d) Writes from normal traffic (writes are allowed during recovery) go into the
> normal memstores as today and flush normally with new sequence ids.
> h5. The other alternative options are listed below for reference:
> Option one
> a) Disallow writes during recovery.
> b) During replay, we pass the original WAL sequence ids.
> c) Hold the flush until all WALs of a recovering region are replayed. The
> memstore should be able to hold these edits because we only recover unflushed
> WAL edits. For edits with the same key + version, whichever has the larger
> sequence id wins.
> Option two
> a) During replay, we pass the original WAL sequence ids.
> b) For each WAL edit, we store the edit's original sequence id along with its
> key.
> c) During scanning, we use the original sequence id if it's present, otherwise
> the store file's sequence id.
> d) Compaction can then keep only the put with the max sequence id.
> Please let me know if you have better ideas.
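
The proposed option (a separate memstore per source WAL, original seqids kept,
flush on a "WAL done" signal, live writes in the normal memstore) can be sketched
as a toy Python model. Everything here is hypothetical: `RecoveringRegion` and
its methods are illustrative names, and the model assumes the new seqids handed
to live writes on the receiving RS are larger than any original seqid being
replayed. Store files are lists of cells, and each cell carries its own seqid
(a simplification of ordering by HFile MaxSequenceId); reads pick the highest
seqid per (row, ts).

```python
from collections import defaultdict


class RecoveringRegion:
    """Toy model of a recovering region under the proposed option."""

    def __init__(self, next_seqid):
        # Assumption: next_seqid exceeds every original seqid in the WALs.
        self.next_seqid = next_seqid
        self.replay_memstores = defaultdict(list)  # wal_hash -> cells
        self.normal_memstore = []                  # live writes
        self.store_files = []                      # flushed cell lists

    def replay(self, wal_hash, row, ts, orig_seqid, value):
        # a) + c): edits carry their WAL hash and original seqid and go to
        # that WAL's own memstore.
        self.replay_memstores[wal_hash].append((row, ts, orig_seqid, value))

    def wal_done(self, wal_hash):
        # b): the replaying RS signals that this WAL is fully replayed, so
        # its memstore can be flushed, keeping the original seqids.
        self.store_files.append(self.replay_memstores.pop(wal_hash, []))

    def put(self, row, ts, value):
        # d): live writes get new seqids in the normal memstore.
        self.normal_memstore.append((row, ts, self.next_seqid, value))
        self.next_seqid += 1

    def flush_normal(self):
        self.store_files.append(self.normal_memstore)
        self.normal_memstore = []

    def get(self, row, ts):
        # Among all cells with these coordinates, the highest seqid wins.
        cells = [c for f in self.store_files for c in f] + self.normal_memstore
        cells += [c for m in self.replay_memstores.values() for c in m]
        matches = [c for c in cells if c[0] == row and c[1] == ts]
        return max(matches, key=lambda c: c[2])[3] if matches else None


# Eric's scenario: WAL "w1" holds A (seqid 10) then B (seqid 11), all ts=0.
region = RecoveringRegion(next_seqid=100)
region.replay("w1", "row", 0, 10, "A")
region.put("row", 0, "C")        # user write during recovery
region.flush_normal()            # memstore has to flush
region.replay("w1", "row", 0, 11, "B")
region.wal_done("w1")
print(region.get("row", 0))      # -> C

# Put(key, t=T) in WAL1 and WAL2: the result no longer depends on order.
edits = {"w1": ("k", 7, 5, "old"), "w2": ("k", 7, 20, "new")}
winners = set()
for order in (["w1", "w2"], ["w2", "w1"]):
    r = RecoveringRegion(next_seqid=100)
    for wal in order:
        r.replay(wal, *edits[wal])
        r.wal_done(wal)
    winners.add(r.get("k", 7))
print(winners)                   # -> {'new'}
```

In this model the user's put C stays visible after B is replayed, because B
keeps its original (smaller) seqid instead of receiving a fresh one.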

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira