[ 
https://issues.apache.org/jira/browse/HBASE-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872683#comment-13872683
 ] 

Enis Soztutar commented on HBASE-10070:
---------------------------------------

bq. Should this be an architectural objective for HBase? Just asking. Our 
inspiration addressed the 99th percentile in a layer above.
I think we should still focus on individual read latencies and try to minimize 
the jitter. Obviously, things like HDFS quorum reads, etc. are helpful in this 
respect, and we also plan to incorporate that kind of work together with this. 
bq. We should work on this for sure. Native zk client immune to JVM pause has 
come up in the past. Would help all around (as per the Vladimir comment above)
Agreed. But MTTR is orthogonal, I think. In a world where a region is 
single-homed, there is no way to get away without some timeout. We can try to 
reduce it in some cases, but a network partition can always happen. 

bq. Radical! Our DNA up to this has been all about giving the application a 
consistent view.
Yep, we are not proposing to change the default semantics, just giving users 
the flexibility if the tradeoffs are justifiable on their side. 

bq. Could this be built as a layer on top of HBase rather than alter HBase core 
with shims on clients and CPs?
I think the cleanest way is to bake this into HBase proper. These are some of 
the reasons we went with this instead of proposing a layer above: 
 - Regardless of eventual consistency for writes, replicated read-only tables 
and bulk-load-only tables are among the major design goals for this work as 
well. This can and should be addressed natively by HBase, I would argue. The 
eventual consistency work just extends this further on a use-case basis. 
 - RPC failover + RPC cancellation is not possible (or at least not easy) to 
do from the outside. 
 - A higher-level API cannot easily tap into the load balancer to ensure that 
region replicas are not co-hosted. 

bq. Do you envision this feature being always on? Or can it be disabled? If the 
former (or latter actually), what implications for current read/write paths do 
you see?
The branch adds REGION_REPLICATION, which is a per-table conf, and a 
get/scan.setConsistency() API, which is per-request. The write path is not 
affected at all. On the read path, we do a failover (backup) RPC similar to 
http://static.googleusercontent.com/media/research.google.com/en/us/people/jeff/Berkeley-Latency-Mar2012.pdf.
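
To make the failover (backup) RPC idea concrete, here is a minimal,
self-contained Java sketch of the pattern (hypothetical names; this is not
the actual HBase client code): send the read to the primary replica first,
and only if it has not answered within a short delay, fan out a backup read
to a secondary; the first reply wins and the loser is cancelled.

```java
import java.util.concurrent.*;
import java.util.function.Supplier;

public class BackupRpcSketch {
    private static final ExecutorService POOL = Executors.newCachedThreadPool();

    /**
     * Issue the read to the primary; if it does not respond within
     * backupDelayMillis, race it against a backup read to a secondary.
     * Whichever replica answers first wins.
     */
    public static String readWithBackup(Supplier<String> primary,
                                        Supplier<String> secondary,
                                        long backupDelayMillis)
            throws ExecutionException, InterruptedException {
        CompletableFuture<String> p = CompletableFuture.supplyAsync(primary, POOL);
        try {
            // Fast path: primary answers within the backup delay.
            return p.get(backupDelayMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException slowPrimary) {
            // Primary is slow (GC pause, failing node): send the backup RPC.
            CompletableFuture<String> s = CompletableFuture.supplyAsync(secondary, POOL);
            String winner = (String) CompletableFuture.anyOf(p, s).get();
            // Cancel the loser (stands in for RPC cancellation in the design).
            p.cancel(true);
            s.cancel(true);
            return winner;
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulate a primary stuck in a long pause and a fast secondary:
        // the secondary's (possibly stale) answer comes back first.
        String v = readWithBackup(
            () -> { sleep(500); return "fresh-from-primary"; },
            () -> { sleep(10);  return "stale-from-secondary"; },
            50);
        System.out.println(v);
        POOL.shutdownNow();
    }

    private static void sleep(long ms) {
        try { Thread.sleep(ms); }
        catch (InterruptedException e) { throw new RuntimeException(e); }
    }
}
```

In the real design the two suppliers would be RPCs to the primary and
secondary region replicas, and a read served by a secondary would be marked
stale to the caller.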
 

> HBase read high-availability using eventually consistent region replicas
> ------------------------------------------------------------------------
>
>                 Key: HBASE-10070
>                 URL: https://issues.apache.org/jira/browse/HBASE-10070
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>         Attachments: HighAvailabilityDesignforreadsApachedoc.pdf
>
>
> In the present HBase architecture, it is hard, probably impossible, to 
> satisfy constraints like serving the 99th percentile of reads under 10 ms. 
> One of the major factors affecting this is the MTTR for regions. There are 
> three phases in the MTTR process - detection, assignment, and recovery. Of 
> these, detection is usually the longest and is presently on the order of 
> 20-30 seconds. During this time, clients are not able to read the region 
> data.
> However, some clients would be better served if regions were available for 
> eventually consistent reads during recovery. This would help satisfy low 
> latency guarantees for the class of applications that can work with stale 
> reads.
> To improve read availability, we propose a replicated read-only region 
> serving design, also referred to as secondary regions, or region shadows. 
> Extending the current model of a region being opened for reads and writes 
> in a single region server, the region will also be opened for reading in 
> other region servers. The region server which hosts the region for reads 
> and writes (as in the current case) will be declared PRIMARY, while 0 or 
> more region servers might be hosting the region as SECONDARY. There may be 
> more than one secondary (replica count > 2).
> Will attach a design doc shortly which contains most of the details and some 
> thoughts about development approaches. Reviews are more than welcome. 
> We also have a proof of concept patch, which includes the master and regions 
> server side of changes. Client side changes will be coming soon as well. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
