[
https://issues.apache.org/jira/browse/HBASE-2808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12887938#action_12887938
]
stack commented on HBASE-2808:
------------------------------
Great doc. Here are some comments:
{code}
Awkward phrasing -> ".. and can contribute to enable high availability"
Not really sure what you are trying to say so no suggested alternative
Is this master cluster or hbase master -> "replication is master-push;..."
I don't get this bit ->
"it is much easier to keep track of what's currently being replicated since
+ each region server has its own write-ahead-log (aka WAL or HLog), compared
+ to other well known solutions like MySQL master/slave replication where
+ there's only one bin log to keep track of.
"
What's easier? I'd think MySQL is easier?
Leave out the 'that' in the following: "and that rows inserted"
Can you cite a link for this -> "MySQL's statement-based replication" that
explains MySQL statement-based replication?
Something missing here.... "Put and Delete) are replicated in order maintain
atomicity."
Say who does the transform... "The key values are transformed into a WALEdit
which is.."
Say more what this means -> " (that is, that are part
+ of a family scoped GLOBAL and non-catalog).
"
Reassure reader that this is being done for them by the server, they have to do
nothing but config.
Synchronously, the region server that receives the edits reads them
+ sequentially and applies them on its local cluster using a pool of
+ HTables. If consecutive rows belong to the same table, they are
+ inserted together in order to leverage parallel insertions.
Explain that RS is running a client and may be inserting across its cluster
when you say this -> "
Logs that are archived will update their paths in the
+ in-memory queue of the replicating thread.
"I think you need to explain archiving.... else confusion -> "
What is a "cluster key"?
What is "available sinks"?
For example, if a slave
+ cluster has 150 machines, 15 will be chosen as potential sinks for this
+ master RS.
You mean 'hosts for clients running in the slave cluster' when you say
following? "
In above I think you have to say master cluster RS rather than just 'master RS'
because master usually refers to something else in our parlance.
Since this is done by all master RSs, the probability that
+ all slave RSs are used is very high, and this method works for clusters
+ of any size.
I don't follow .... is this for case where many master clusters replicating
into a single slave? "
Should be 'these' instead of 'those' in following "and each of those contain "
You could expand... saying that if multiple slave clusters, then we'll have a
znode per slave cluster when you say "znode per peer cluster"
Each of those queues will track the HLogs created
+ by that RS, but they can differ in size. For example, if one slave
+ cluster becomes unavailable for some time then the HLogs cannot be,
+ thus they need to stay in the queue (while the others are processed).
"This needs fixup... hard to figure what is being said: "
What's this -> "slave cluster's znode just before it's made available."
Does this mean we lose edits? "
The queue items are deleted when the replication thread cannot read
+ more entries from a file and that there are other files in the queue.
"
What's this mean? "
or because there's
+ too many of them)
When would we archive because too many?
"it will notify the source threads that the path
+ for that log changed.
Why does it have to notify that log dir has changed? If not in one location,
can't we check archive area w/o requiring notification?
"
GLOBAL means replicate? Any provision to replicate only to cluster X and not
to cluster Y? or is that for later?
Explain catalog table
You need a bulk edit shipper? Something that allows you to transfer 64MB of edits
in one go?
Is it a mistake that WALEdit doesn't carry Put and Delete objects, that we have
to reinstantiate not only replicating but when replaying edits? Should we make
an issue to fix?
Say what these could be?
Note that if the master and slave cluster don't have the same
+ time, time-related issues may occur.
"
Why? ain't the ts in the edit?
{code}
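To make the "available sinks" question concrete, here is a minimal sketch of how I read the quoted 150-machines-to-15-sinks example: each master cluster RS picks a random subset of the slave cluster's region servers to ship edits to. The 10% figure is only inferred from that example, and every name below (`SinkSelectionSketch`, `sinkCount`, `chooseSinks`) is hypothetical. None of it is the actual HBase replication code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; these are not HBase classes or APIs.
class SinkSelectionSketch {

    // How many slave region servers to use as sinks, rounding up so that a
    // non-empty slave cluster always yields at least one sink.
    // sinkPercent = 10 is an assumption inferred from "150 machines, 15 sinks".
    static int sinkCount(int slaveClusterSize, int sinkPercent) {
        if (slaveClusterSize <= 0) {
            return 0;
        }
        return Math.max(1, (slaveClusterSize * sinkPercent + 99) / 100);
    }

    // Pick a random subset of the slave cluster's region servers. Every master
    // cluster RS would do this independently, which is why the slave RSs end
    // up being used fairly evenly overall.
    static List<String> chooseSinks(List<String> slaveServers, int sinkPercent, Random rng) {
        List<String> shuffled = new ArrayList<>(slaveServers);
        Collections.shuffle(shuffled, rng);
        int count = sinkCount(shuffled.size(), sinkPercent);
        return new ArrayList<>(shuffled.subList(0, count));
    }

    public static void main(String[] args) {
        List<String> slaves = new ArrayList<>();
        for (int i = 0; i < 150; i++) {
            slaves.add("slave-rs-" + i);
        }
        List<String> sinks = chooseSinks(slaves, 10, new Random());
        System.out.println(sinks.size() + " potential sinks chosen for this master cluster RS");
    }
}
```

If this matches the intent, a sentence in the doc saying that each master cluster RS selects its own subset independently would answer the "I don't follow" comment above.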
> Document the implementation of replication
> ------------------------------------------
>
> Key: HBASE-2808
> URL: https://issues.apache.org/jira/browse/HBASE-2808
> Project: HBase
> Issue Type: Task
> Components: documentation
> Reporter: Jean-Daniel Cryans
> Assignee: Jean-Daniel Cryans
> Fix For: 0.90.0
>
> Attachments: HBASE-2808-v2.patch, HBASE-2808.patch,
> replication_overview.png
>
>
> From HBASE-2223, we need to provide an overview of how replication was
> implemented. For example:
> - How ZK is used
> - What are the general flows
> - How failover works