[
https://issues.apache.org/jira/browse/CASSANDRA-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984783#comment-14984783
]
Jeff Jirsa edited comment on CASSANDRA-7306 at 12/11/15 7:49 AM:
-----------------------------------------------------------------
I've implemented some of this, primarily for my own education (learning some of
the internals of gossip better). I've approached this by creating a pluggable
IDatacenterTopologyProvider, and implemented a full mesh, file-based whitelist,
and file-based blacklist provider. I then extended Gossiper to filter its list
of live endpoints by calling the IDatacenterTopologyProvider instance to filter
non-gossipable endpoints, which seems to fit the goal of this ticket. This
enables not only hub/spoke, but arbitrary graphs of database connectivity.
However, the ticket is pretty poorly defined in terms of behaviors.
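To make the shape of this concrete, here is a rough sketch of the idea, not the actual patch. The method name canGossip, the map-based constructors, and the filterLiveEndpoints helper are all hypothetical names made up for illustration; the file parsing is left out and the whitelist/blacklist are passed in as plain maps.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TopologySketch {
    /** Hypothetical provider shape: may localDc gossip with remoteDc? */
    interface IDatacenterTopologyProvider {
        boolean canGossip(String localDc, String remoteDc);
    }

    /** Full mesh: every DC may gossip with every other DC. */
    static final class FullMeshProvider implements IDatacenterTopologyProvider {
        public boolean canGossip(String localDc, String remoteDc) {
            return true;
        }
    }

    /** Whitelist: a DC gossips with itself and with explicitly listed DCs only. */
    static final class WhitelistProvider implements IDatacenterTopologyProvider {
        private final Map<String, Set<String>> allowed;

        WhitelistProvider(Map<String, Set<String>> allowed) {
            this.allowed = allowed;
        }

        public boolean canGossip(String localDc, String remoteDc) {
            if (localDc.equals(remoteDc)) {
                return true;
            }
            Set<String> peers = allowed.get(localDc);
            return peers != null && peers.contains(remoteDc);
        }
    }

    /** Blacklist: a DC gossips with everything except explicitly listed DCs. */
    static final class BlacklistProvider implements IDatacenterTopologyProvider {
        private final Map<String, Set<String>> denied;

        BlacklistProvider(Map<String, Set<String>> denied) {
            this.denied = denied;
        }

        public boolean canGossip(String localDc, String remoteDc) {
            Set<String> peers = denied.get(localDc);
            if (peers == null) {
                peers = Collections.emptySet();
            }
            return !peers.contains(remoteDc);
        }
    }

    /**
     * Stand-in for the Gossiper change: keep only the live endpoints whose DC
     * the provider allows the local DC to gossip with. Iteration order of the
     * input map is preserved.
     */
    static List<String> filterLiveEndpoints(Map<String, String> endpointToDc,
                                            String localDc,
                                            IDatacenterTopologyProvider provider) {
        List<String> gossipable = new ArrayList<>();
        for (Map.Entry<String, String> e : endpointToDc.entrySet()) {
            if (provider.canGossip(localDc, e.getValue())) {
                gossipable.add(e.getKey());
            }
        }
        return gossipable;
    }
}
```

Because the provider is asked about a (local DC, remote DC) pair, the allowed edges do not have to be symmetric or hub-shaped, which is where the arbitrary graphs come from.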
[~tupshin], this ticket title mentions "more flexible gossip" - does this
carry into requests/CL as well? What's the desired/expected behavior if a KS
uses NTS to have rf=3 in dcs a,b, and c, but hosts in dc=b are set not to
gossip with hosts in dc=c, and vice versa? CL=ALL fails, CL=QUORUM fills with
a+b, and writes just assume all nodes in c are down? Or should it be smart
enough to know that c is disconnected, and not count hosts in c towards
quorum/ALL?
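To put numbers on the two options, here is a small sketch of the arithmetic only. It assumes the usual quorum rule of (sum of replication factors / 2) + 1; the class and method names are hypothetical and nothing here reflects how the replication strategy is actually wired up.

```java
import java.util.Map;
import java.util.Set;

final class QuorumSketch {
    /** Sum the per-DC replication factors, skipping any excluded DCs. */
    static int totalReplicas(Map<String, Integer> rfByDc, Set<String> excludedDcs) {
        int total = 0;
        for (Map.Entry<String, Integer> e : rfByDc.entrySet()) {
            if (!excludedDcs.contains(e.getKey())) {
                total += e.getValue();
            }
        }
        return total;
    }

    /** Usual quorum rule: a strict majority of the replicas counted. */
    static int quorum(int replicas) {
        return replicas / 2 + 1;
    }

    /** True if the reachable replicas can satisfy the required count. */
    static boolean satisfiable(int required, int reachable) {
        return reachable >= required;
    }
}
```

With rf=3 in a, b, and c and a coordinator in b that cannot see c, 6 of the 9 replicas are reachable. If c is simply treated as down, ALL needs 9 and fails, while QUORUM needs quorum(9) = 5 and is met by a+b. If c is instead excluded from the count, ALL needs 6 and QUORUM needs quorum(6) = 4, so both succeed.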
-My primary hangup is finding the right way to notify the KS replication
strategy to reload if the list of whitelisted/blacklisted DCs changes. I
know it's a solvable problem, but if it's out of scope, I won't waste time with
it. I realize that this is a {{ponies}} ticket, and there's a ton of
bike-shed/ponies opportunity here, but if we can get some consensus on
definition, I can try to get this to a point where it can potentially be ready
for real review-
was (Author: jjirsa):
I've implemented some of this, primarily for my own education (learning some of
the internals of gossip better). I've approached this by creating a pluggable
IDatacenterTopologyProvider, and implemented a full mesh, file-based whitelist,
and file-based blacklist provider. I then extended Gossiper to filter its list
of live endpoints by calling the IDatacenterTopologyProvider instance to filter
non-gossipable endpoints, which seems to fit the goal of this ticket. This
enables not only hub/spoke, but arbitrary graphs of database connectivity.
However, the ticket is pretty poorly defined in terms of behaviors.
[~tupshin], this ticket title mentions "more flexible gossip" - does this
carry into requests/CL as well? What's the desired/expected behavior if a KS
uses NTS to have rf=3 in dcs a,b, and c, but hosts in dc=b are set not to
gossip with hosts in dc=c, and vice versa? CL=ALL fails, CL=QUORUM fills with
a+b, and writes just assume all nodes in c are down? Or should it be smart
enough to know that c is disconnected, and not count hosts in c towards
quorum/ALL?
My primary hangup is finding the right way to notify the KS replication
strategy to reload if the list of whitelisted/blacklisted DCs changes. I
know it's a solvable problem, but if it's out of scope, I won't waste time with
it. I realize that this is a {{ponies}} ticket, and there's a ton of
bike-shed/ponies opportunity here, but if we can get some consensus on
definition, I can try to get this to a point where it can potentially be ready
for real review.
> Support "edge dcs" with more flexible gossip
> --------------------------------------------
>
> Key: CASSANDRA-7306
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7306
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Tupshin Harper
> Labels: ponies
>
> As Cassandra clusters get bigger and bigger, and their topology becomes more
> complex, there is more and more need for a notion of "hub" and "spoke"
> datacenters.
> One of the big obstacles to supporting hundreds (or thousands) of remote dcs,
> is the assumption that all dcs need to talk to each other (and be connected
> all the time).
> This ticket is a vague placeholder with the goals of achieving:
> 1) better behavioral support for occasionally disconnected datacenters
> 2) explicit support for custom dc to dc routing. A simple approach would be
> an optional per-dc annotation of which other DCs that DC could gossip with.