[ 
https://issues.apache.org/jira/browse/HBASE-20003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16368381#comment-16368381
 ] 

Andrew Purtell edited comment on HBASE-20003 at 2/17/18 10:09 PM:
------------------------------------------------------------------

{quote}The existing region replica feature could be used here to ensure that 
the data written to memstores is synchronously replicated to the replicas, 
ensuring strong consistency of the data. (pipeline model)
{quote}
Let me be more precise about what I meant by "pmem doesn't obviate the need for 
a WAL unless it is replicated itself among multiple servers." I mean that the 
availability of the data in pmem needs to match today's data availability with 
the WAL, or there is a net loss of availability.

Synchronous replication of edits from one region replica to another is a WAL by 
another name, but instead of the edit stream being available to the entire 
cluster in a replayable form it is limited to the three servers participating 
in the region replication. When all replicas go down at once, we lose the 
ability to resume service for the affected region(s) on the other available 
hosts, because nobody beyond those replicas has any of the data. On a 1000+ 
node cluster, if you happen to lose 3 of those servers at once (which is more 
likely than you'd like; that is the reality of operations at scale), there is 
a good chance some regions become completely unavailable until one of those 
servers can be brought back online. That is different from today, where every 
server in the cluster has access to region data and WAL data in HDFS and can 
host the affected region(s).
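
To make that concrete, here is a back-of-the-envelope sketch of the exposure, 
assuming each region's 3 replicas are placed uniformly at random on distinct 
servers. The cluster size, region count, and failure sizes are illustrative 
assumptions, not measurements:

{code:java}
// Rough estimate of full-region loss when k of n servers fail at once,
// assuming each region's 3 replicas sit on 3 servers chosen uniformly
// at random: P(region fully lost) = C(k,3) / C(n,3).
public class ReplicaLossSketch {
    static double choose3(long n) {
        return n * (n - 1) * (n - 2) / 6.0;
    }

    public static void main(String[] args) {
        long servers = 1000;    // cluster size from the example above
        long regions = 100000;  // hypothetical region count
        // Compare 3 independent host losses vs. a correlated 40-host
        // (e.g. whole-rack) failure.
        for (long failed : new long[] {3, 40}) {
            double p = choose3(failed) / choose3(servers);
            System.out.printf("k=%d: P(region lost)=%.2e, expected regions lost=%.3f%n",
                failed, p, regions * p);
        }
        // Correlated failures (a rack, a power domain) raise the
        // exposure sharply relative to independent host losses.
    }
}
{code}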

Perhaps the PoC doc quantifies the availability loss? I'd be interested in 
taking a look. I suppose a case could be made that in some ways this matches 
the availability model of HDFS's default block placement policy, although HDFS 
actively mitigates replica loss via re-replication and blocks are more 
dispersed than region replicas, so an analysis is nontrivial.
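
For a flavor of why that comparison is nontrivial, here is a companion sketch 
under the same (unrealistic) uniform-placement assumption, additionally 
ignoring rack awareness and re-replication; the block count per region is a 
hypothetical figure:

{code:java}
// A region persisted as B independent blocks is unreadable if ANY block
// loses all 3 replicas, so greater dispersal raises the per-event hit
// probability; what offsets this in practice is that re-replication
// keeps the vulnerable window short.
public class HdfsDispersalSketch {
    static double choose3(long n) {
        return n * (n - 1) * (n - 2) / 6.0;
    }

    public static void main(String[] args) {
        long servers = 1000, failed = 3;
        int blocksPerRegion = 80;  // hypothetical: ~10 GB region / 128 MB blocks
        double pBlock = choose3(failed) / choose3(servers);
        double pRegion = 1 - Math.pow(1 - pBlock, blocksPerRegion);
        System.out.printf("P(one block fully lost)   = %.2e%n", pBlock);
        System.out.printf("P(region data unreadable) = %.2e%n", pRegion);
    }
}
{code}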



> WALLess HBase on Persistent Memory
> ----------------------------------
>
>                 Key: HBASE-20003
>                 URL: https://issues.apache.org/jira/browse/HBASE-20003
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>            Priority: Major
>
> This JIRA aims to make use of persistent memory (pmem) technologies in HBase. 
> One such usage is to make the memstore reside on pmem. A persistent memstore 
> would remove the need for a WAL and pave the way for a WALLess HBase. 
> The existing region replica feature could be used here to ensure that the 
> data written to memstores is synchronously replicated to the replicas, 
> ensuring strong consistency of the data. (pipeline model)
> Advantages:
> - Data availability: since the data across replicas is consistent 
> (synchronously written), our data is always 100% available.
> - Lower MTTR: it becomes easier/faster to switch over to a replica on a 
> primary region failure as there is no WAL replay involved. Building the 
> memstore map data is also much faster than reading and replaying the WAL.
> - Possibility of bigger memstores: pmem devices are designed to offer larger 
> capacity than DRAM, so they would also enable bigger memstores, which leads 
> to fewer flushes and less compaction IO. 
> - Removes the dependency on HDFS in the write path.
> An initial PoC has been designed and developed. Testing is underway and we 
> will publish the PoC results along with the design doc soon. The PoC doc will 
> cover the design decisions, the libraries considered for working with these 
> pmem devices, the pros and cons of those libraries, and the performance 
> results.
> Note: next-gen memory technologies using 3D XPoint provide persistent 
> memory. Such memory DIMMs are soon to appear in the market. The PoC is done 
> around Intel's ApachePass (AEP).
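
As an editorial illustration of the pipeline model the description refers to, 
here is a minimal sketch in which each replica persists an edit to its 
pmem-backed memstore and forwards it down the chain before acking; all class 
and method names are hypothetical, not HBase APIs:

{code:java}
// Hypothetical sketch of a WAL-less, pipelined write path: each node
// applies the edit to its pmem-backed memstore, then synchronously
// forwards it to the next replica (the same shape as an HDFS write
// pipeline), so the head's return doubles as the client ack.
class PmemMemstoreNode {
    private final PmemMemstoreNode next;  // null at the tail of the pipeline

    PmemMemstoreNode(PmemMemstoreNode next) {
        this.next = next;
    }

    // Returns only when this node and every node after it have made the
    // edit durable. If any hop fails, the write must fail (or the dead
    // replica be evicted) to preserve the consistency guarantee.
    void write(byte[] edit) {
        applyDurably(edit);    // persist locally; no WAL append
        if (next != null) {
            next.write(edit);  // synchronous hop to the next replica
        }
    }

    private void applyDurably(byte[] edit) {
        // Placeholder for the pmem write + flush; the actual mechanism
        // (library choice, flush instructions) is what the PoC doc
        // is expected to evaluate.
    }
}

class PipelineExample {
    public static void main(String[] args) {
        // Wire primary -> replica 1 -> replica 2, write through the head.
        PmemMemstoreNode tail = new PmemMemstoreNode(null);
        PmemMemstoreNode mid  = new PmemMemstoreNode(tail);
        PmemMemstoreNode head = new PmemMemstoreNode(mid);
        head.write("row1/cf:q=value".getBytes());
    }
}
{code}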



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
