[
https://issues.apache.org/jira/browse/HBASE-9465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784611#comment-15784611
]
Phil Yang commented on HBASE-9465:
----------------------------------
Thanks [~stack] for your comment.
bq. The latter keeps only the latest opening? Could it not have been amended to
keep all?
info:seqnumDuringOpen stores the sequence id in its value, so it can only hold one
value. But for replication we need the sequence id for every time a region is
opened. For compatibility I didn't want to change this column, so I created a new
one. And an independent family (rep_barrier) prevents reading too much data when
we only want to read the info family.
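For example, a minimal sketch (assuming the standard 1.x client API; the meta row
key handling is simplified and the variable names are made up) of a reader that
only needs info and therefore never touches the rep_barrier cells:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class ReadInfoOnly {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    byte[] regionRowKey = Bytes.toBytes(args[0]); // hypothetical: the region's row in hbase:meta
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table meta = conn.getTable(TableName.META_TABLE_NAME)) {
      Get get = new Get(regionRowKey);
      get.addFamily(HConstants.CATALOG_FAMILY);   // read "info" only; rep_barrier is skipped
      Result r = meta.get(get);
      byte[] latestOpenSeq = r.getValue(HConstants.CATALOG_FAMILY,
          Bytes.toBytes("seqnumDuringOpen"));     // only the latest opening's seq id lives here
      System.out.println(latestOpenSeq == null ? "n/a" : Bytes.toLong(latestOpenSeq));
    }
  }
}
{code}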
bq. Do we have the position in two places now? Do we need to update zk if we've
updated meta?
We save the position for each WAL file in ZK (of course we have HBASE-15867 now).
For serial replication we save the position for each region in meta. They are two
different positions.
bq. Because? It is not continuous against a peer? Seqid is 'continuous' within
a region?
If I am not wrong, openSeq is the max sequence + 1, and the first log's sequence
after opening is openSeq + 1, so in fact we will never have a log in the WAL whose
seq is openSeq.
bq. Why the -1 in the above? Because we add 1 when we open a region?
Yes
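A tiny worked example of that arithmetic (the numbers are made up):
{code:java}
// Illustration only, with made-up numbers.
long maxSeqBeforeOpen = 100;            // highest sequence id before the region reopens
long openSeq = maxSeqBeforeOpen + 1;    // 101, assigned when the region opens
long firstEditAfterOpen = openSeq + 1;  // 102, the first WAL entry written after opening
// No WAL entry ever carries seq == openSeq (101), so openSeq - 1 (100) is the last
// sequence id of the previous opening: the -1 compensates for the +1 added at open.
{code}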
bq. We need to write this assumption into the code around where splits bring up
daughters on same RS as parent. This policy could change (Y! have put up a
patch to make it so splits do not bring up daughters on same servers as parent
region).
Yes, the doc is outdated now. We have special logic for split/merge in the master
branch now.
bq. This is another assumption of the design that needs to be marked in code so
when this changes, we'll accommodate the fix here.
OK. And in fact I have a plan to improve this. We can use only one thread to read
the WAL for non-recovery sources to reduce I/O pressure, and we should have some
logic for when one of the peers is blocked. I will file an issue when I get to
this.
bq. We do not write to the hbase:meta state of WALs, unless REPLICATION_SCOPE
is set to 2?
Yes
bq. Can you say more on this?
The WAL is server-level and a replication source is peer-level. So if, within a
peer, a region's log can not be pushed because of serial replication, all logs for
this peer after that log are also blocked. To prevent this we have to split these
tables/cfs into different peers.
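For example, a hedged sketch (assuming the branch-1 ReplicationAdmin API; the peer
ids, table names and cluster key are made up) of putting two tables on two separate
peers so a blocked region of one table cannot stall the other:
{code:java}
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.replication.ReplicationAdmin;
import org.apache.hadoop.hbase.replication.ReplicationPeerConfig;

public class SplitPeers {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (ReplicationAdmin admin = new ReplicationAdmin(conf)) {
      ReplicationPeerConfig peer = new ReplicationPeerConfig();
      peer.setClusterKey("peer-zk:2181:/hbase");           // made-up cluster key

      // Peer "1" carries only table_a, peer "2" only table_b, so a region of
      // table_a that is blocked by serial replication cannot stall table_b.
      Map<TableName, List<String>> cfsA = new HashMap<>();
      cfsA.put(TableName.valueOf("table_a"), null);        // null = all column families
      admin.addPeer("1", peer, cfsA);

      Map<TableName, List<String>> cfsB = new HashMap<>();
      cfsB.put(TableName.valueOf("table_b"), null);
      admin.addPeer("2", peer, cfsB);
    }
  }
}
{code}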
bq. When would an Entry not be ready to push?
When the region this entry belongs to has some logs whose seq is smaller than this
entry's and they have not been pushed to the peer cluster, this entry can not be
pushed.
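Conceptually it is a check like the following sketch (simplified, not the actual
patch code; lastPushedSeq and barriers are hypothetical names for the rep_position
value and the recorded open sequence ids):
{code:java}
import java.util.List;

public class SerialCheck {
  /**
   * Simplified sketch: an entry may only ship once everything written for the
   * same region under earlier openings has been acked by the peer.
   *
   * @param entrySeq      sequence id of the WAL entry we want to push
   * @param lastPushedSeq highest seq of this region already pushed (rep_position)
   * @param barriers      openSeq recorded for each opening, sorted ascending
   */
  static boolean canPush(long entrySeq, long lastPushedSeq, List<Long> barriers) {
    long rangeStart = 0;
    for (long barrier : barriers) {
      if (barrier <= entrySeq) {
        rangeStart = barrier;   // the opening this entry was written under
      } else {
        break;
      }
    }
    // Everything before this opening must already be on the peer cluster.
    return lastPushedSeq >= rangeStart - 1;
  }
}
{code}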
bq. Do we have an idea of how much extra load this fix puts on hbase:meta?
Puts to rep_barrier and rep_meta are part of the batch mutation when a region
opens, so I think it is not a big extra load. rep_position is updated frequently;
its QPS is the same as that of the position logging in ZK. These writes only
happen when some families enable this feature.
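A hedged sketch of the idea (the rep_barrier qualifier layout here is illustrative,
not the exact patch code): the bookkeeping rides along with the same hbase:meta
update that records the open, so it adds no extra round trips:
{code:java}
import java.io.IOException;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class RecordOpen {
  static void recordOpen(Connection conn, byte[] regionRowKey, long openSeq)
      throws IOException {
    Put put = new Put(regionRowKey);
    put.addColumn(HConstants.CATALOG_FAMILY,                        // existing "info" column,
        Bytes.toBytes("seqnumDuringOpen"), Bytes.toBytes(openSeq)); // latest value only
    put.addColumn(Bytes.toBytes("rep_barrier"),                     // separate family keeps one
        Bytes.toBytes(openSeq), HConstants.EMPTY_BYTE_ARRAY);       // cell per opening
    try (Table meta = conn.getTable(TableName.META_TABLE_NAME)) {
      meta.put(put);                                                // one mutation, no extra trip
    }
  }
}
{code}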
bq. How do we get insight on say delay that is happening because another RS's
thread is (we think) replaying a WAL?
When we see that a log can not be pushed, normally something is delayed, maybe a
failover or a region move. But we can not know the exact reason; we can only wait
for that work to be done. Unless there is a bug, the entries will eventually be
pushed.
These days I am working on enabling this feature in our production cluster;
maybe there will be something that needs to be improved. I hope I can say this
feature is stable when 1.4 is released :)
> Push entries to peer clusters serially
> --------------------------------------
>
> Key: HBASE-9465
> URL: https://issues.apache.org/jira/browse/HBASE-9465
> Project: HBase
> Issue Type: New Feature
> Components: regionserver, Replication
> Affects Versions: 2.0.0, 1.4.0
> Reporter: Honghua Feng
> Assignee: Phil Yang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-9465-branch-1-v1.patch,
> HBASE-9465-branch-1-v1.patch, HBASE-9465-branch-1-v2.patch,
> HBASE-9465-branch-1-v3.patch, HBASE-9465-branch-1-v4.patch,
> HBASE-9465-branch-1-v4.patch, HBASE-9465-v1.patch, HBASE-9465-v2.patch,
> HBASE-9465-v2.patch, HBASE-9465-v3.patch, HBASE-9465-v4.patch,
> HBASE-9465-v5.patch, HBASE-9465-v6.patch, HBASE-9465-v6.patch,
> HBASE-9465-v7.patch, HBASE-9465-v7.patch, HBASE-9465.pdf
>
>
> When a region move or RS failure occurs in the master cluster, the hlog entries
> that were not pushed before the region move or RS failure will be pushed by the
> original RS (for a region move) or by another RS which takes over the remaining
> hlog of the dead RS (for an RS failure), while the new entries for the same
> region(s) will be pushed by the RS which now serves the region(s); these sources
> push the hlog entries of the same region concurrently, without coordination.
> This treatment can possibly lead to data inconsistency between master and
> peer clusters:
> 1. a put and then a delete are written to the master cluster
> 2. due to a region move / RS failure, they are pushed to the peer cluster by
> different replication-source threads
> 3. if the delete is pushed to the peer cluster before the put, and a flush and
> major compaction occur in the peer cluster before the put arrives, the delete
> is collected while the put remains in the peer cluster
> In this scenario, the put remains in the peer cluster, but in the master cluster
> the put is masked by the delete, hence the data inconsistency between the master
> and peer clusters
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)