[ https://issues.apache.org/jira/browse/HBASE-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704213#action_12704213 ]

Jonathan Gray commented on HBASE-1295:
--------------------------------------

This looks great, Andrew!  Some comments...

- What do we do when there are two identical keys in a KeyValue (row, family, 
column, timestamp) but different values?  That's actually going to be possible 
in 0.20, since you can manually set the stamp, and it will certainly be 
possible with multi-master replication.  I'm not sure how it's handled now 
(one possible convergence rule is sketched further down).  It would depend on 
logic in both memcache insertion and, more importantly, compaction, and then 
on how it's handled when reading.
- Everything is now just a KeyValue, so that would be what we send to replicas.
- Thoughts on network partitioning?  I'm assuming you're referring to 
partitioning of replica clusters from one another, not within a cluster, 
right?  If so, I guess you'd hang on to WALs as long as you could; eventually 
a replicated cluster would go into some secondary mode of needing a full sync 
(when the other cluster(s) could no longer hold all WALs; or should we assume 
HDFS will not fill and just flush, so we can always resync with WALs?).  
(Note: handling of intra-cluster partitions is virtually impossible because 
of our strong consistency.)
- Regarding SCOPE and setting things as local or replicated: what do you 
suspect the precision/resolution of this would be?  Could I have some tables 
being replicated to some clusters, other tables to others, and some to both?
- Would replicas of tables _always_ require identical family settings?  For 
example, say I have a cluster of 5 nodes with lots of memory, and I want to 
replicate just a single high-volume, high-read table from my primary large 
cluster.  But in the small cluster I want to set a TTL of 1 day and also set 
the family as in-memory.  This is kind of advanced and special, but the 
ability to do things like that would be very cool; I could definitely see us 
doing something like it were it possible (a rough sketch of what such 
configuration could look like follows this list).

I've got a good bit of experience with database replication; I did some work 
in the Postgres world on WAL shipping.  Let me know how I can help your effort.

I agree with your assessment regarding consistency, etc.  It is clear we 
should be using an eventual-consistency model for replication.  This is one 
of my favorite topics!

One thing that's a bit special is that this would make an HBase cluster of 
clusters a "read-your-writes"-style eventual-consistency distribution model: 
each individual cluster keeps our strong consistency and partitioned 
distribution internally, so a client writing and reading through the same 
cluster always sees its own writes, while the clusters converge with one 
another eventually.  That makes a huge difference for us, internally, on many 
of our data systems.  This may be obvious since we're just talking about 
replication here, but it's something to keep in mind.
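
Coming back to the identical-key question from the first bullet: for 
convergence, every cluster has to resolve such a conflict the same way 
without coordinating.  Here is a minimal sketch of one deterministic rule, a 
lexicographic value tiebreak; the class and the rule are invented for 
illustration and are not anything HBase does today.

    public class KeyValueTiebreak {
        // Returns the winning value for two cells whose keys
        // (row, family, qualifier, timestamp) are identical.
        static byte[] resolve(byte[] a, byte[] b) {
            int min = Math.min(a.length, b.length);
            for (int i = 0; i < min; i++) {
                // Compare bytes as unsigned; the larger value wins.
                int cmp = (a[i] & 0xff) - (b[i] & 0xff);
                if (cmp != 0) return cmp > 0 ? a : b;
            }
            return a.length >= b.length ? a : b;
        }

        public static void main(String[] args) {
            byte[] v1 = "value-from-cluster-A".getBytes();
            byte[] v2 = "value-from-cluster-B".getBytes();
            // Both argument orders pick the same winner, so replicas
            // converge regardless of the order edits arrive in.
            System.out.println(new String(resolve(v1, v2)));
            System.out.println(new String(resolve(v2, v1)));
        }
    }

Whether the winner should instead be picked by an origin-cluster id or some 
other total order is exactly the kind of decision that would have to live in 
both the memcache/compaction path and the read path.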

> Federated HBase
> ---------------
>
>                 Key: HBASE-1295
>                 URL: https://issues.apache.org/jira/browse/HBASE-1295
>             Project: Hadoop HBase
>          Issue Type: New Feature
>            Reporter: Andrew Purtell
>         Attachments: hbase_repl.2.odp, hbase_repl.2.pdf
>
>
> HBase should consider supporting a federated deployment where someone might 
> have terascale (or beyond) clusters in more than one geography and would want 
> the system to handle replication between the clusters/regions. It would be 
> sweet if HBase had something on the roadmap to sync between replicas out of 
> the box. 
> Consider if rows, columns, or even cells could be scoped: local or global.
> Then, consider a background task on each cluster that replicates new globally 
> scoped edits to peer clusters. The HBase/Bigtable data model has convenient 
> features (timestamps, multiversioning) such that simple exchange of globally 
> scoped cells would be conflict free and would "just work". Implementation 
> effort here would be in producing an efficient mechanism for collecting up 
> edits from all the HRS and transmitting the edits over the network to peers 
> where they would then be split out to the HRS there. Holding on to the edit 
> trace and tracking it until the remote commits succeed would also be 
> necessary. So, HLog is probably the right place to set up the tee. This would 
> be filtered log shipping, basically.  
> This proposal does not consider transactional tables. For transactional 
> tables, enforcement of global mutation commit ordering would come into the 
> picture if the user wants the transaction to span the federation. This 
> should be an optional feature even with transactional tables themselves 
> being optional, because of how slow it would be.
