[ 
https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286861#comment-13286861
 ] 

Jesse Yates commented on HBASE-6055:
------------------------------------

I've recently had an existential crisis, of sorts, over snapshots. Triggered by 
both Jon's questions and some from Ian Varley, I've started to rethink the goal 
of snapshots. Initially, it was to take a globally consistent view of a single 
table. The question that Ian raised is, "Why are we enforcing stricter 
guarantees for a snapshot than for a scan?" In fact, a globally consistent view 
is something HBase explicitly doesn't support (if you do a put to two different 
tables, you have no real, system-level guarantees of consistency). 

So does it really matter if we have an actual point in time? Everything in 
HBase is timestamped, which is considered the source of truth for a given 
Mutation. If we are doing a scan for the state of the table as of 12:15:05, we 
don't know if RS1 is 2 seconds before RS2 - as far as we care, it's just the 
state at 12:15:05. 
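
To make the scan case concrete, here is a minimal Python sketch. It is not 
HBase code: `Cell` and `view_as_of` are made-up names, and the timestamps are 
arbitrary millisecond values. It only illustrates that a view "as of" a 
timestamp is decided by the timestamps on the cells, not by which region 
server wrote them or what its clock said.

```python
from collections import namedtuple

# Hypothetical stand-in for an HBase cell: row key, timestamp (ms), value.
Cell = namedtuple("Cell", ["row", "ts", "value"])


def view_as_of(cells, as_of_ts):
    """Return {row: value} using the newest cell per row with ts <= as_of_ts."""
    latest = {}
    for cell in cells:
        if cell.ts > as_of_ts:
            continue  # stamped after the requested time, so not visible
        current = latest.get(cell.row)
        if current is None or cell.ts > current.ts:
            latest[cell.row] = cell
    return {row: cell.value for row, cell in latest.items()}


# Cells as they might be stamped by two region servers. Which server wrote
# which cell, and how far apart their clocks were, plays no part below.
cells = [
    Cell("row-a", 1000, "a1"),
    Cell("row-a", 3000, "a2"),
    Cell("row-b", 2000, "b1"),
    Cell("row-b", 6000, "b2"),
]
snapshot_view = view_as_of(cells, 5000)
```

The cell stamped 6000 is left out of the view no matter when, in wall-clock 
terms, it actually arrived.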
 
This starts to break down a little bit when doing a Get for the latest version 
on a table. If RS1 is two seconds behind RS2 and we snapshot at 12:15:05, then 
we actually might not see all the changes on RS1 in the snapshot. However, this 
doesn't really matter because you still wouldn't see that edit when looking at 
that "time". Things are happening so fast in HBase that the best we really need 
is just a "fuzzy" view of the state of the table.

The upside to this is we can do the snapshot _without taking any downtime_ on 
the table being snapshotted. I already discussed how to do this generally in 
the document, but it will have to be rewritten from the perspective of 
timestamp-based snapshots (I'll move it to a Google doc until we get a more 
finalized version).

The only problem that has jumped out in multiple discussions of the 
timestamp-based approach is that if you are using the timestamp for something 
other than the time (a la Facebook Messages) you might not be able to make use of 
snapshots. At Salesforce, I was planning on abusing timestamps as well, so that 
consideration will be made in the implementation (I'll go over how in another 
post).

TL;DR: global consistency doesn't matter for HBase since the timestamp is the 
source of truth - the only question is whether you believe the timestamp or 
not. I would posit that based on the design of HBase it has to be considered a 
source of truth.

I'll respond in a bit with a more detailed design of how timestamp-based 
snapshots differ from the point-in-time design, but in everything except how to 
deal with the memstore and WAL, it is _exactly the same_. The way to handle the 
memstore was suggested by Ian Varley in that we basically use the memstore 
snapshot stuff with some rejiggering to wait a certain amount of time; for the 
WAL we can just use the meta edits that Jon recommends and that I've at least 
talked about IRL (if not in text).
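
One possible reading of the memstore idea, as a rough Python sketch. This is 
an assumption, not the design from the document and not HBase code: 
`ready_to_cut`, `split_memstore`, and the `slack_ms` parameter are all 
hypothetical. The region server waits until its local clock is safely past 
the snapshot timestamp, then partitions the in-memory edits by their 
timestamps.

```python
def ready_to_cut(local_clock_ms, snapshot_ts, slack_ms):
    """True once the local clock is past snapshot_ts by at least slack_ms."""
    return local_clock_ms >= snapshot_ts + slack_ms


def split_memstore(entries, snapshot_ts):
    """Partition (ts, row, value) entries around snapshot_ts.

    Entries stamped at or before snapshot_ts go to the snapshot; later ones
    stay behind in the live memstore.
    """
    in_snapshot = [e for e in entries if e[0] <= snapshot_ts]
    remaining = [e for e in entries if e[0] > snapshot_ts]
    return in_snapshot, remaining


# Hypothetical in-memory edits: (timestamp in ms, row key, value).
memstore = [(4000, "row-a", "a1"), (5000, "row-b", "b1"), (5500, "row-a", "a2")]
in_snapshot, remaining = split_memstore(memstore, 5000)
```

In this sketch writes are never blocked: edits stamped after the snapshot 
timestamp simply stay in the live memstore.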
                
> Snapshots in HBase 0.96
> -----------------------
>
>                 Key: HBASE-6055
>                 URL: https://issues.apache.org/jira/browse/HBASE-6055
>             Project: HBase
>          Issue Type: New Feature
>          Components: client, master, regionserver, zookeeper
>            Reporter: Jesse Yates
>            Assignee: Jesse Yates
>             Fix For: 0.96.0
>
>         Attachments: Snapshots in HBase.docx
>
>
> Continuation of HBASE-50 for the current trunk. Since the implementation has 
> drastically changed, opening as a new ticket.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
