[
https://issues.apache.org/jira/browse/HBASE-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13875452#comment-13875452
]
Devaraj Das commented on HBASE-10070:
-------------------------------------
bq. Ok on the timing. You know how I feel about 1.0 – sooner rather than later
– but hopefully this feature gets done in time.
Yeah, a couple of us are on it.
bq. After thinking more on this, I 'get' why you have the replicas listed
inside in the row rather than as rows themselves [in hbase:meta]. The row in
hbase:meta becomes a proxy or facade for the little cluster of regions one of
which is the primary with the others read replicas.
That's great. Here is a copy-paste of what I said in the RB on HBASE-10347, for
others' reference.
"Enis and I had debated this as well. The consensus between us was that we
don't need to add new META rows for the replicas. After all, the HRI
information is exactly the same for all the replicas except for the replicaID.
In the current meta, we already have a column for the location of a region. It
seemed logical to just extend that model - add newer columns for the replica
locations (and similarly for the other columns like seqnum). That way
everything for a particular user-visible region stays in one row (and makes it
easier for readers to know about all replica locations from that one row).
Regarding special casing, yes there is some special casing in the way the
regions are added to the meta - create table will create all regions (if the
table was created with replica > 1), but only the primary regions will be added
to the meta. When the regionserver updates the meta with the location after
opening a region, it invokes the API with the replicaID as an argument; the
column names differ based on whether the replicaID is the primary's or not.
These are pretty much the special cases for the meta updates."
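To make the column-per-replica layout described above concrete, here is a minimal Python sketch. It is illustrative only: a plain dict stands in for one hbase:meta row, and the qualifier names (`server`, `seqnumDuringOpen`) and the `_0001` suffix format are assumptions for this sketch, not taken from the patch on HBASE-10347.

```python
# Illustrative only: a plain dict stands in for one hbase:meta row.
# The qualifier names and the suffix format are assumptions for this sketch.

PRIMARY_REPLICA_ID = 0


def replica_column(base: str, replica_id: int) -> str:
    """Return the column qualifier for a replica; the primary keeps the bare name."""
    if replica_id < 0:
        raise ValueError("replica_id must be non-negative")
    if replica_id == PRIMARY_REPLICA_ID:
        return base
    return f"{base}_{replica_id:04X}"


def update_location(row: dict, replica_id: int, server: str, seqnum: int) -> None:
    """Record where a replica was opened, in the same row as the primary."""
    row[replica_column("server", replica_id)] = server
    row[replica_column("seqnumDuringOpen", replica_id)] = seqnum


# One user-visible region, two replicas, one row.
meta_row = {}
update_location(meta_row, 0, "rs1.example.com:60020", 10)
update_location(meta_row, 1, "rs2.example.com:60020", 10)
```

The only special case is in `replica_column`: the primary writes to the existing column names, so a row for a table without replicas looks the same as it does today.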
bq. HRegionInfo now is overloaded. Before it was the info on a specific region.
Now it is trying to serve two purposes; its original intent and now too as a
descriptor on the region-serving 'cluster' made of a primary and replicas. Let's
avoid overloading what up to this has had a clear role in the hbase model.
By doing it the way we have in the patch on HBASE-10347, it seems to reflect
what's going on - "HRI is a logical descriptor and a facade for a bunch of
primary & replicas". That's how we store things in the meta and how we
reconstruct HRIs from the meta when needed.
There are possibly other approaches to doing this, e.g., extend HRegionInfo as,
say, HRegionInfoReplica and maintain the information about replicaID there,
and/or change all the relevant methods to accept HRegionInfoReplica and
potentially return this as well in relevant situations. The issue there is
those approaches would be very intrusive and we would still need special cases
for replicaID == 0 or not. I'm not confident how much we would gain there. Is it
too much to ask to change the view of what an HRI means (to what you say above)?
Anyway, let me ponder a bit on this...
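As a rough illustration of "reconstructing HRIs from the meta" under the one-row view, the Python sketch below expands a single row into one region descriptor per replica. `RegionInfo` is a hypothetical stand-in for HRegionInfo, and the `server` / `server_XXXX` qualifier names are assumptions made for this sketch, not the patch's actual column names.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class RegionInfo:
    """Hypothetical stand-in for HRegionInfo: identical across replicas except replica_id."""
    table: str
    start_key: str
    region_id: int
    replica_id: int = 0


def locations_from_row(primary: RegionInfo, row: Dict[str, str]) -> List[Tuple[RegionInfo, str]]:
    """Expand one meta row into (region info, server) pairs, one per replica."""
    result = []
    for qualifier, server in row.items():
        if qualifier == "server":
            replica_id = 0
        elif qualifier.startswith("server_"):
            # Assumed suffix format: replica ID in hexadecimal.
            replica_id = int(qualifier[len("server_"):], 16)
        else:
            continue  # not a location column
        result.append((replace(primary, replica_id=replica_id), server))
    return sorted(result, key=lambda pair: pair[0].replica_id)
```

The descriptor stored in the row acts as the facade; each replica's descriptor is derived from it by changing only the replica ID.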
bq. The primary holds the 'pole position' being the name of the region in meta.
The read replicas are differently named with the 00001 and 00002, etc.,
interpolated into the middle of the region name. I suppose doing it this way
'minimizes' the disturbance in the code base but I'm worried this naming
exception will only confuse though it minimizes change. Why would the primary
not be named like the replica regions?
I don't mind naming the primary regions similarly to the replicas. This might
mean tools that currently depend on the name format would break even if the
cluster is not deploying tables with replicas (you guessed that response :-)).
But yeah, if you go the full Paxos route, the 'primary' could be anyone in the
replica-set, and there it makes sense for all members in the set to have an
index.
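The naming trade-off discussed above can be sketched in a few lines of Python. The name layout here (`table,startkey,regionid` with an optional `_NNNNN` replica suffix) is an assumption for illustration only; the `index_primary` flag is a hypothetical switch showing the "name the primary like the replicas" alternative.

```python
def region_name(table: str, start_key: str, region_id: int,
                replica_id: int = 0, index_primary: bool = False) -> str:
    """Build a region name.

    By default only non-primary replicas carry a replica index, so primary
    names keep their existing format. With index_primary=True every member
    of the replica set, including replica 0, carries an index.
    """
    name = f"{table},{start_key},{region_id}"
    if replica_id != 0 or index_primary:
        name += f"_{replica_id:05d}"
    return name
```

With the default, a tool that parses today's names keeps working on clusters that have no replicated tables; with `index_primary=True` the names are uniform but every existing name changes.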
> HBase read high-availability using eventually consistent region replicas
> ------------------------------------------------------------------------
>
> Key: HBASE-10070
> URL: https://issues.apache.org/jira/browse/HBASE-10070
> Project: HBase
> Issue Type: New Feature
> Reporter: Enis Soztutar
> Assignee: Enis Soztutar
> Attachments: HighAvailabilityDesignforreadsApachedoc.pdf
>
>
> In the present HBase architecture, it is hard, probably impossible, to
> satisfy constraints like 99th percentile of the reads will be served under 10
> ms. One of the major factors that affects this is the MTTR for regions. There
> are three phases in the MTTR process - detection, assignment, and recovery.
> Of these, the detection is usually the longest and is presently in the order
> of 20-30 seconds. During this time, the clients would not be able to read the
> region data.
> However, some clients will be better served if regions will be available for
> reads during recovery for doing eventually consistent reads. This will help
> with satisfying low latency guarantees for some class of applications which
> can work with stale reads.
> For improving read availability, we propose a replicated read-only region
> serving design, also referred to as secondary regions, or region shadows.
> Extending the current model of a region being opened for reads and writes in a
> single region server, the region will also be opened for reading in other
> region servers. The region server which hosts the region for reads and writes
> (as in the current case) will be declared as PRIMARY, while 0 or more region
> servers might be hosting the region as SECONDARY. There may be more than one
> secondary (replica count > 2).
> Will attach a design doc shortly which contains most of the details and some
> thoughts about development approaches. Reviews are more than welcome.
> We also have a proof of concept patch, which includes the master and region
> server side changes. Client side changes will be coming soon as well.