[ https://issues.apache.org/jira/browse/HBASE-9465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783447#comment-15783447 ]

stack commented on HBASE-9465:
------------------------------

[~yangzhe1991] Did you see the above by [~vincentpoon]?

The design doc is great. I had trouble with this section: "rep_barrier:{seqid}: 
each time a RS opens a region, it saves the max sequence id in this region. 
info:seqnumDuringOpen also saves a seqid but only saves the latest one, and we 
need all of them." Did we already have an info:seqnumDuringOpen? How does it 
relate to rep_barrier:{seqid}? The former keeps only the latest opening? Could 
it not have been amended to keep all?
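
To check my reading of the above, here is a minimal sketch of the meta writes 
as I understand them. The family/qualifier names are the ones from the doc; 
everything else (class name, row key, values) is made up by me, not from the 
patch:

{code:java}
// Sketch only: my reading of the proposed hbase:meta layout, not the patch.
// info:seqnumDuringOpen is a single cell overwritten on each open, while
// rep_barrier uses the open seqid as the qualifier, so every open leaves its
// own cell behind and all barriers are retained.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class BarrierSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table meta = conn.getTable(TableName.valueOf("hbase:meta"))) {
      // Made-up region row key, just for illustration.
      byte[] regionRow =
          Bytes.toBytes("t1,,1480000000000.d41d8cd98f00b204e9800998ecf8427e.");
      long openSeqId = 1042L; // max seqid seen when the RS opened the region
      Put p = new Put(regionRow);
      // Overwritten on every open: only the latest survives.
      p.addColumn(Bytes.toBytes("info"), Bytes.toBytes("seqnumDuringOpen"),
          Bytes.toBytes(openSeqId));
      // One cell per open: the qualifier is the seqid itself, so nothing is
      // lost across re-opens.
      p.addColumn(Bytes.toBytes("rep_barrier"), Bytes.toBytes(openSeqId),
          Bytes.toBytes(openSeqId));
      meta.put(p);
    }
  }
}
{code}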

bq.  This record is saved after we ship the logs to peer and before we update 
the log position in ZK. 

Do we have the position in two places now? Do we need to update zk if we've 
updated meta?
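
For my own understanding, the ordering I take from the doc is ship, then meta, 
then ZK; something like the below, where every method name is hypothetical:

{code:java}
// Sketch of the update ordering as I read the doc; all names are mine.
import java.util.List;

abstract class ShipOrderSketch<ENTRY> {
  void shipBatch(List<ENTRY> batch, String peerId, long lastSeqId,
      String wal, long walOffset) {
    replicateToPeer(peerId, batch);        // 1. ship the entries to the peer
    putMetaRepPosition(peerId, lastSeqId); // 2. rep_position:{peerid} in meta
    setZkLogPosition(wal, walOffset);      // 3. only then advance the ZK offset
    // If we crash between 2 and 3, ZK lags meta; presumably the re-read
    // entries can be skipped or deduped via the meta position, so holding
    // the position in two places is about recovery, not double-tracking.
  }
  abstract void replicateToPeer(String peerId, List<ENTRY> batch);
  abstract void putMetaRepPosition(String peerId, long seqId);
  abstract void setZkLogPosition(String wal, long offset);
}
{code}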

bq. However, sequence id is not continuous...

Because?  It is not continuous against a peer? Seqid is 'continuous' within a 
region?

bq. ...should not be pushed to this peer until rep_position:{peerid} >= 
barrier_b - 1.

Why the -1 in the above? Because we add 1 when we open a region?
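
Trying to answer myself with made-up numbers: say the region is re-opened at 
seqid 100, so barrier_b = 100 and the old range is everything up to 99. Because 
seqids are not continuous, the last entry actually shipped from the old range 
might be only 97 (flush/compaction marker edits and non-replicated cfs consume 
seqids without being shipped). My reading is the finishing source then records 
barrier_b - 1 = 99, which is why the check is >= barrier_b - 1 rather than 
equality against the last shipped seqid:

{code:java}
// Made-up numbers to check my own reading of the -1; not from the patch.
public class BarrierCheckSketch {
  public static void main(String[] args) {
    long barrierB = 100;        // seqid recorded when the new RS opened the region
    // barrierB itself is the first seqid of the *new* range, so the old range
    // is [.., barrierB - 1] and "fully shipped" means position >= 99.
    long lastShippedSeqId = 97; // 98, 99 were e.g. flush markers, never shipped
    // If the finished source recorded only 97, the check below would never
    // pass; so (as I read it) it bumps the position to barrierB - 1 once the
    // old range is drained, precisely because seqids are not continuous.
    long repPosition = barrierB - 1;
    boolean canPushNewRange = repPosition >= barrierB - 1;
    System.out.println("last shipped=" + lastShippedSeqId
        + ", recorded position=" + repPosition
        + ", can push new range=" + canPushNewRange);
  }
}
{code}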

bq. There are special cases: region splitting and merging. However, three 
related regions must be in the same region server, so the order of pushing logs 
from parent to daughter can be guaranteed.

We need to write this assumption into the code around where splits bring up 
daughters on the same RS as the parent. This policy could change (Y! have put 
up a patch to make it so splits do not bring up daughters on the same server 
as the parent region).

bq. ...Because we have only one thread for each peer in a RS

This is another assumption of the design that needs to be marked in the code 
so that when it changes, we'll know to accommodate the fix here.
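
To be concrete about what I mean by marking them, something like the below, 
which is purely hypothetical (names and call sites are mine):

{code:java}
// Hypothetical markers for the two assumptions; not from the patch.
import com.google.common.base.Preconditions;

class SerialReplicationAssumptions {
  /** Would be called where the split transaction opens the daughters. */
  static void assertDaughtersColocated(String parentHost,
      String daughterAHost, String daughterBHost) {
    // Serial replication (HBASE-9465) relies on parent and daughters being
    // served by the same RS so one source thread preserves parent->daughter
    // order. If split placement ever changes, replication must change too.
    Preconditions.checkState(parentHost.equals(daughterAHost)
            && parentHost.equals(daughterBHost),
        "HBASE-9465 assumption broken: daughters not on parent's RS");
  }

  /** Would be called where replication sources are spun up. */
  static void assertSingleThreadPerPeer(int threadsForPeer) {
    // Serial replication also assumes exactly one shipping thread per peer
    // per RS; more than one reintroduces the reordering this feature fixes.
    Preconditions.checkState(threadsForPeer == 1,
        "HBASE-9465 assumption broken: multiple threads for one peer");
  }
}
{code}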

bq. Set REPLICATION_SCOPE=2....

Not your fault, but I was trying to find doc on what REPLICATION_SCOPE means 
and there seems to be little; there is some in HConstants. REPLICATION_SCOPE 
seems to be 0 for no replication, 1 for replicating to all peers, and 2 for 
this feature: all peers, but with an attempt at guaranteeing sequential play.

We do not write WAL state to hbase:meta unless REPLICATION_SCOPE is set to 2?
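
For whoever writes the doc, I believe the per-cf setting looks like this in 
the 1.x Java API; the value 2 is what this patch proposes (0 and 1 are already 
REPLICATION_SCOPE_LOCAL and REPLICATION_SCOPE_GLOBAL in HConstants):

{code:java}
// Sketch: setting the proposed serial scope on a column family via the
// 1.x Java API. Table/cf names are made up.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class ScopeSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      HColumnDescriptor cf = new HColumnDescriptor("cf1");
      // 0 = not replicated, 1 = replicated, 2 = replicated serially (proposed)
      cf.setScope(2);
      admin.modifyColumn(TableName.valueOf("t1"), cf);
    }
  }
}
{code}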

bq. Because we have only one thread for each peer in a RS, as long as a cf's 
REPLICATION_SCOPE is 2, all regions' logs may be delayed but the order is not 
guaranteed.

Can you say more on this?

bq. ....so if an Entry is not ready to push, all logs after it will be blocked.

When would an Entry not be ready to push?
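
On "not ready to push": as I read it, an Entry for a region that changed hands 
is not ready while the previous holder's range is still draining, and the 
single per-peer thread means everything queued behind it waits too; a sketch 
with hypothetical names:

{code:java}
// Sketch of the head-of-line blocking as I read it; names are mine.
abstract class BlockingSketch {
  void waitUntilReady(String peerId, byte[] encodedRegion, long entrySeqId)
      throws InterruptedException {
    // Largest barrier at or below this entry's seqid, from rep_barrier cells.
    long barrier = barrierBelow(encodedRegion, entrySeqId);
    // "Not ready" while the previous range is still being shipped by whoever
    // owned the region before us. One thread per peer means every entry
    // behind this one is blocked as well.
    while (repPosition(peerId, encodedRegion) < barrier - 1) {
      Thread.sleep(100); // poll hbase:meta; interval is made up
    }
  }
  abstract long barrierBelow(byte[] encodedRegion, long seqId);
  abstract long repPosition(String peerId, byte[] encodedRegion);
}
{code}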

Do we have an idea of how much extra load this fix puts on hbase:meta?

How do we get insight on, say, a delay that is happening because another RS's 
thread is (we think) replaying a WAL?

Thanks for this fixup, [~yangzhe1991]. It's great.


> Push entries to peer clusters serially
> --------------------------------------
>
>                 Key: HBASE-9465
>                 URL: https://issues.apache.org/jira/browse/HBASE-9465
>             Project: HBase
>          Issue Type: New Feature
>          Components: regionserver, Replication
>    Affects Versions: 2.0.0, 1.4.0
>            Reporter: Honghua Feng
>            Assignee: Phil Yang
>             Fix For: 2.0.0, 1.4.0
>
>         Attachments: HBASE-9465-branch-1-v1.patch, 
> HBASE-9465-branch-1-v1.patch, HBASE-9465-branch-1-v2.patch, 
> HBASE-9465-branch-1-v3.patch, HBASE-9465-branch-1-v4.patch, 
> HBASE-9465-branch-1-v4.patch, HBASE-9465-v1.patch, HBASE-9465-v2.patch, 
> HBASE-9465-v2.patch, HBASE-9465-v3.patch, HBASE-9465-v4.patch, 
> HBASE-9465-v5.patch, HBASE-9465-v6.patch, HBASE-9465-v6.patch, 
> HBASE-9465-v7.patch, HBASE-9465-v7.patch, HBASE-9465.pdf
>
>
> When region-move or RS-failure occurs in the master cluster, the hlog 
> entries that were not pushed before the region-move or RS-failure will be 
> pushed by the original RS (for region-move) or by another RS which takes 
> over the remaining hlog of the dead RS (for RS-failure), while the new 
> entries for the same region(s) will be pushed by the RS which now serves 
> the region(s); these RSs push the hlog entries of the same region 
> concurrently, without coordination.
> This treatment can possibly lead to data inconsistency between master and 
> peer clusters:
> 1. a put and then a delete are written to the master cluster
> 2. due to region-move / RS-failure, they are pushed by different 
> replication-source threads to the peer cluster
> 3. if the delete is pushed to the peer cluster before the put, and a flush 
> and major-compact occur in the peer cluster before the put is pushed, the 
> delete is collected and the put remains in the peer cluster
> In this scenario the put remains in the peer cluster, but in the master 
> cluster the put is masked by the delete: hence data inconsistency between 
> master and peer clusters.
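
To make the scenario above concrete, a minimal client timeline (illustrative; 
table, cf, and row names are made up, and the reordering happens inside 
replication, not in this client code):

{code:java}
// Illustrative timeline of the race from the description above. The client
// writes put-then-delete to the master cluster; replication may replay them
// to the peer in the opposite order after a region move / RS failure.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class RaceSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table t = conn.getTable(TableName.valueOf("t1"))) {
      byte[] row = Bytes.toBytes("r1");
      t.put(new Put(row).addColumn(Bytes.toBytes("cf1"),
          Bytes.toBytes("q"), Bytes.toBytes("v"))); // 1. put on master
      t.delete(new Delete(row));                    // 2. delete on master
      // If the delete is shipped by one source thread and the put by another,
      // the peer can see delete-then-put; a flush + major compaction between
      // the two collects the delete marker, so the put survives on the peer
      // while being masked on the master: inconsistency.
    }
  }
}
{code}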


