ramkrishna.s.vasudevan commented on HBASE-20003:
Thanks for the discussion and points [~apurtell].
Other than the important point Anoop mentioned above, about how WAL replay could bring back data that was never actually acknowledged to the client because the RS went down, another important aspect of this replica pipeline is that the pipeline is fixed: primary -> secondary -> tertiary (as the read replicas are laid out today). Every mutation goes through this pipeline; the pipeline is not randomly generated.
For the case of 3 replicas where only 1 or 2 are brought back, I think it falls under the following categories:
-> The mutation was not applied to the 2 replicas (and only the primary is alive). The mutation itself would be a failure, because we apply in the reverse order: if a replica write fails, we do not write to the primary either.
-> The mutation got applied to the 2 replicas but the primary went down. This falls under the current-day WAL scenario that Anoop mentioned.
-> The mutation got applied to the 3rd replica but the 2nd replica went down. The mutation is still a failure, so it is retried. If the primary goes down before the retry, we still promote the 3rd replica (which is alive at that point) as the new primary (switch over), so this again falls back to the current-day WAL case: we are not sure whether the mutation was successful, and the client will get back the latest data.
But yes, we need to have some remote pmem access also if the servers are
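To make the reverse-order idea concrete, here is a minimal, illustrative Java sketch (not HBase or PoC code; the class, method names, and in-memory "memstore" lists are invented for the example). It shows why a failed replica write means the primary never applies the mutation: the primary is written last, so an aborted pipeline leaves the primary untouched and the client simply sees a failure it can retry.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the fixed pipeline: a mutation is applied in
// reverse order -- tertiary, then secondary, then primary -- so any
// replica failure aborts the write before the primary applies it.
public class ReversePipelineSketch {
    // A replica region that may be down; structure is illustrative only.
    static class Replica {
        final String name;
        boolean alive = true;
        final List<String> memstore = new ArrayList<>();
        Replica(String name) { this.name = name; }
        boolean apply(String mutation) {
            if (!alive) return false;   // simulated RS crash
            memstore.add(mutation);
            return true;
        }
    }

    // Apply the mutation tertiary -> secondary -> primary. On the first
    // failure we stop: since the primary is written last, a failed
    // mutation is never visible on the primary.
    static boolean write(String mutation,
                         Replica primary, Replica secondary, Replica tertiary) {
        for (Replica r : new Replica[] { tertiary, secondary, primary }) {
            if (!r.apply(mutation)) {
                return false;           // caller retries; primary untouched
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Replica primary = new Replica("primary");
        Replica secondary = new Replica("secondary");
        Replica tertiary = new Replica("tertiary");

        // Healthy pipeline: the mutation lands on all three replicas.
        assert write("put-1", primary, secondary, tertiary);

        // Secondary down: the tertiary may have applied the mutation, but
        // the write fails before reaching the primary, so the client is
        // told the mutation failed and can retry.
        secondary.alive = false;
        assert !write("put-2", primary, secondary, tertiary);
        assert !primary.memstore.contains("put-2");
    }
}
```

This is the property the second and third bullets above rely on: any mutation present only on trailing replicas was never acknowledged, so promoting a trailing replica on failover cannot expose data the client was told failed, beyond what the current-day WAL case already allows.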
> WALLess HBase on Persistent Memory
> Key: HBASE-20003
> URL: https://issues.apache.org/jira/browse/HBASE-20003
> Project: HBase
> Issue Type: New Feature
> Reporter: Anoop Sam John
> Assignee: Anoop Sam John
> Priority: Major
> This JIRA aims to make use of persistent memory (pmem) technologies in HBase.
> One such usage is to make the Memstore reside on pmem. A persistent
> memstore would remove the need for a WAL and pave the way for a WALless HBase.
> The existing region replica feature could be used here and ensure the data
> written to memstores are synchronously replicated to the replicas and ensure
> strong consistency of the data. (pipeline model)
> Advantages :
> -Data Availability : Since the data across replicas is consistent
> (synchronously written), the data is always 100% available.
> -Lower MTTR : It becomes easier/faster to switch over to the replicas on a
> primary region failure as there is no WAL replay involved. Building the
> memstore map data is also much faster than reading and replaying the WAL.
> -Possibility of bigger memstores : These pmem devices are designed with
> larger capacities than DRAM, so they would also enable bigger memstores,
> which leads to less flush/compaction IO.
> -Removes the dependency on HDFS in the write path
> An initial PoC has been designed and developed. Testing is underway and we
> will publish the PoC results along with the design doc soon. The PoC doc
> will cover the design decisions, the libraries considered for working with
> these pmem devices, the pros and cons of those libraries, and the
> performance results.
> Note : Next-gen memory technologies using 3D XPoint provide persistent
> memory. Such memory DIMMs are soon to appear in the market. The PoC is done
> around Intel's ApachePass (AEP).
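The recovery claim in the description (rebuilding the memstore without WAL replay) can be illustrated with a small Java sketch. This is not the PoC code: the class and its on-disk layout are invented, and a memory-mapped file stands in for an actual pmem device and pmem library (e.g. PMDK). The point is that mutations are persisted directly in the memstore's backing region, so after a crash the memstore is rebuilt by scanning that region rather than replaying a WAL.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical persistent-memstore sketch: a memory-mapped file plays the
// role of a pmem region. Each cell is stored as [int length][utf-8 bytes].
public class PmemMemstoreSketch {
    private final MappedByteBuffer buf;

    PmemMemstoreSketch(Path file, int size) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // The mapping stays valid after the channel is closed.
            buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
        }
    }

    // Append a cell; force() flushes the mapped region, standing in for a
    // pmem persistence barrier. Note there is no separate WAL write.
    void put(String cell) {
        byte[] b = cell.getBytes(StandardCharsets.UTF_8);
        buf.putInt(b.length);
        buf.put(b);
        buf.force();
    }

    // Crash recovery: rebuild the memstore contents by scanning the
    // persistent region directly -- no WAL replay step.
    static List<String> recover(Path file, int size) throws IOException {
        List<String> cells = new ArrayList<>();
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer b = ch.map(FileChannel.MapMode.READ_ONLY, 0, size);
            while (b.remaining() >= 4) {
                int len = b.getInt();
                if (len <= 0 || len > b.remaining()) break; // hit zero padding
                byte[] data = new byte[len];
                b.get(data);
                cells.add(new String(data, StandardCharsets.UTF_8));
            }
        }
        return cells;
    }

    public static void main(String[] args) throws IOException {
        Path f = java.nio.file.Files.createTempFile("memstore", ".pmem");
        PmemMemstoreSketch ms = new PmemMemstoreSketch(f, 4096);
        ms.put("row1/cf:q1=v1");
        ms.put("row2/cf:q1=v2");
        // Simulated restart after a crash: scan the region to rebuild state.
        List<String> recovered = recover(f, 4096);
        assert recovered.size() == 2;
        assert recovered.get(0).equals("row1/cf:q1=v1");
    }
}
```

A real pmem-backed memstore would of course keep an indexed, sorted structure rather than a flat log, but the recovery path is the same in spirit: the persisted bytes are the memstore, so failover only needs to remap and scan them.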