Anoop Sam John commented on HBASE-20003:

What does choosing a specific set of 3 nodes for _all_ region data (we might 
have 20GB or 100GB regions by configuration) mean for the "data gravity" of 
those regions in the event we have lost or have to decommission some or all of 
those nodes? Presumably with default random block placement, and blocks 
significantly smaller than the total region size, the region data is more 
dispersed on the cluster otherwise. 

One thing to note is that the entire region data does not really reside in AEP. 
On flush, the data goes to HDFS with the normal block placement policy, etc. 
Only the live data in the memstores is in AEP, so the data size there is smaller.

What happens when a replica is permanently lost? When the pmem fails. When the 
server catches fire, or at least its power supply.

What happens when two are lost? Can't assume the primary is the one to survive.


When any replica goes down, it will be opened on a new RS and will be brought up 
to date with a flush from the primary (whichever replica is the primary at that 
time).
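
To make the recovery step above concrete, here is a minimal toy sketch in 
Python. It is not HBase code. The names Replica, Region and 
replace_lost_replica are hypothetical, and the sketch assumes that a flush 
empties the memstore on every replica, so a freshly opened replica only needs 
the flushed files. It also assumes the first surviving replica acts as the 
primary.

```python
class Replica:
    """Toy stand-in for one region replica with a pmem-backed memstore."""

    def __init__(self, name):
        self.name = name
        self.memstore = {}  # live, not yet flushed cells


class Region:
    """Toy model; replicas[0] is treated as the primary (an assumption)."""

    def __init__(self, replica_names):
        self.replicas = [Replica(n) for n in replica_names]
        self.flushed = {}  # stand-in for flushed files on HDFS

    def put(self, key, value):
        # Pipeline model: the write reaches every replica before it returns.
        for replica in self.replicas:
            replica.memstore[key] = value

    def flush(self):
        # The primary persists its memstore; all replicas then drop theirs.
        self.flushed.update(self.replicas[0].memstore)
        for replica in self.replicas:
            replica.memstore.clear()

    def replace_lost_replica(self, lost_name, new_name):
        survivors = [r for r in self.replicas if r.name != lost_name]
        if not survivors:
            raise RuntimeError("no surviving replica to flush from")
        self.replicas = survivors
        # The then-current primary flushes, so the new replica can start
        # with an empty memstore and still see all the data.
        self.flush()
        self.replicas.append(Replica(new_name))

    def get(self, key, replica_index=0):
        replica = self.replicas[replica_index]
        return replica.memstore.get(key, self.flushed.get(key))


region = Region(["rs1", "rs2", "rs3"])
region.put("row1", "a")
region.replace_lost_replica("rs2", "rs4")
print(region.get("row1", replica_index=2))
```

In this model, losing two replicas at once is still recoverable as long as one 
replica survives to do the flush. Losing all three before a flush is not.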

> WALLess HBase on Persistent Memory
> ----------------------------------
>                 Key: HBASE-20003
>                 URL: https://issues.apache.org/jira/browse/HBASE-20003
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>            Priority: Major
> This JIRA aims to make use of persistent memory (pmem) technologies in HBase. 
> One such usage is to make the Memstore reside on pmem. A persistent 
> memstore would remove the need for a WAL and pave the way for a WALLess HBase. 
> The existing region replica feature could be used here to ensure that data 
> written to the memstores is synchronously replicated to the replicas, which 
> gives strong consistency of the data. (pipeline model)
> Advantages:
> - Data Availability : Since the data across replicas is consistent 
> (synchronously written), our data is always 100% available.
> - Lower MTTR : It becomes easier/faster to switch over to the replicas on a 
> primary region failure as there is no WAL replay involved. Building the 
> memstore map data also is much faster than reading the WAL and replaying the 
> WAL.
> - Possibility of bigger memstores : These pmems are designed to have more 
> capacity than DRAM, so they would also let us have bigger memstores, 
> which leads to fewer flushes and less compaction IO. 
> - Removes the dependency of HDFS on the write path
> An initial PoC has been designed and developed. Testing is underway and we 
> will publish the PoC results along with the design doc soon. The PoC doc will 
> talk about the design decisions, the libraries considered to work with these 
> pmem devices, pros and cons of those libraries and the performance results.
> Note : Next gen memory technologies using 3DXPoint provide a persistent memory 
> feature. Such memory DIMMs are soon to appear in the market. The PoC is done 
> around Intel's ApachePass (AEP).
