On Thu, Sep 10, 2015 at 10:05 AM, Atin Mukherjee <amukh...@redhat.com> wrote:
>
>
> On 09/10/2015 01:42 AM, Jeff Darcy wrote:
>> Better get comfortable, everyone, because I might ramble on for a bit.
>>
>> Over the last few days, I've been looking into the issue of how to manage 
>> our own instances of etcd (or something similar) as part of our 4.0 
>> configuration store.  This is highly relevant for GlusterD 2.0, which would 
>> be both a consumer of the service and (possibly) a manager for the daemons 
>> that provide it.  It's also relevant for NSR, which needs a similar kind of 
>> highly-available, highly-consistent store for information about terms.  Just 
>> about any other component might be able to take good advantage of such a 
>> facility if it were available, such as DHT 2.0 using it for layout 
>> information, and I encourage anyone working on 4.0 to think about how it can 
>> make other components simpler.  (BTW, Shyam, that's just a hypothetical 
>> example.  Don't take it any more seriously than you want to.)
>>
>> This is not the first time I've looked into this.  During the previous round 
>> of NSR development, I implemented some code to manage etcd daemons from 
>> within GlusterD:
>>
>>     http://review.gluster.org/#/c/8887/
>>
>> That code's junk.  We shouldn't use anything more than small pieces of it.  
>> Among other problems, it nukes the etcd information when a new node joins.  
>> That was fine for what we were doing with NSR at the time, but clearly can't 
>> work in real life.  I've also been looking at the new-ish etcd interfaces 
>> for cluster management:
>>
>>     https://github.com/coreos/etcd/blob/master/Documentation/other_apis.md
>>
>> I'm pretty sure these didn't exist when I was last looking at this stuff, 
>> but I could be wrong.  In any case, they look pretty nice.  Much like our 
>> own "probe" mechanism, it looks like we can start a single-node cluster and 
>> then add others into that cluster by talking to one of the current members.  
>> In fact, that similarity suggests how we might manage our instances of etcd.
>>
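
For context, the member-management calls in that doc are plain HTTP against
any running member. A minimal sketch of the "add" step in Go (the host names
are made up; 2379 is the client port, 2380 the peer port):

    package main

    import (
        "fmt"
        "net/http"
        "strings"
    )

    func main() {
        // Ask an existing member (node X) to add node Y to the cluster.
        // Y's etcd must then be (re)started with the returned cluster
        // configuration before it actually participates.
        body := strings.NewReader(`{"peerURLs": ["http://node-y:2380"]}`)
        resp, err := http.Post("http://node-x:2379/v2/members",
            "application/json", body)
        if err != nil {
            fmt.Println("member add failed:", err)
            return
        }
        defer resp.Body.Close()
        fmt.Println("status:", resp.Status)
    }
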
>> (1) Each GlusterD *initially* starts its own private instance of etcd.
>>
>> (2) When we probe from a node X to a node Y, the probe message includes 
>> information about X's etcd server(s).
>>
>> (3) Upon receipt of a probe, Y can (depending on a flag) either *use* X's 
>> etcd cluster or *join* it.  Either way, it has to shut down its own one-node 
>> cluster.  In the JOIN case, this implies that X will send the appropriate 
>> etcd command to its local instance (from whence it will be propagated to the 
>> others).
> I've a follow-up question here. Could you elaborate on the difference
> between *use* and *join*? As you pointed out, either way Y's
> configuration shouldn't be taken into consideration, so I believe as
> part of peer probing we should clean up Y's configuration (bringing
> down its one-node cluster) and then just join the existing etcd
> cluster. That's the only workflow I could think of, and *use* would do
> the same thing IMO.

Here's what I think:

With join, the node becomes a "part" of the etcd cluster, participating
in leader election, replicating logs, and so on.

With use, the node could just use the etcd service without becoming a
part of the cluster (just as NSR would *use* etcd to store term
information).
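
In GlusterD terms, the probe handling might look roughly like the fragment
below. This is only a sketch; the helpers are placeholders for whatever GD2
ends up doing:

    // Y's side of a probe from X. Either way, Y gives up its own
    // one-node etcd cluster.
    func handleProbe(xEndpoint string, join bool) {
        stopLocalEtcd()
        if join {
            // JOIN: Y is added via the members API and restarts etcd
            // as a full member (leader election, log replication).
            joinCluster(xEndpoint)
        } else {
            // USE: Y runs no etcd server at all; it just records X's
            // endpoints and talks to the cluster as a client.
            useCluster(xEndpoint)
        }
    }

    // Placeholder stubs so the sketch compiles; real implementations
    // would stop the daemon, call /v2/members, and so on.
    func stopLocalEtcd()              {}
    func joinCluster(endpoint string) {}
    func useCluster(endpoint string)  {}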

>>
>> (4) Therefore, the CLI/REST interfaces to initiate a probe need an option to 
>> control this join/use flag.  Default should be JOIN for small clusters, 
>> where it's not a problem for all nodes to be etcd servers as well.
> The consul/etcd documentation says that the ideal configuration is to
> have 3-5 servers form the cluster. The way I was thinking about it is
> that during peer probe we would check whether the cluster already has
> enough servers; if not, have the other end join as an etcd server,
> otherwise act as a client. Thoughts?
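
If we go that way, the check itself is tiny. Something like this, where the
cap of 5 and the mode names are made up for illustration:

    // Decide, at probe time, whether the new peer should be a full
    // etcd server or only a client of the existing cluster.
    const maxEtcdServers = 5

    func probeMode(currentServers int) string {
        if currentServers < maxEtcdServers {
            return "join" // still below the 3-5 server sweet spot
        }
        return "use" // enough servers; the new peer stays a client
    }
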
>>
>> (5) For larger clusters, the administrator might start to specify USE 
>> instead of JOIN after a while.  There might also need to be separate 
>> CLI/REST interfaces to toggle this state without any probe involved.
>>
>> (6) For detach/deprobe, we simply undo the things we did in (3).
>>
>> With all of this in place, probes would become one-time exchanges.  There's 
>> no need for GlusterD daemons to keep probing each other when they can just 
>> "check in" with etcd (which is doing something very similar internally).  
>> Instead of constantly sending its own probe/heartbeat messages and keeping 
>> track of which other nodes' messages have been missed, each GlusterD would 
>> simply use its node UUID to create a time-limited key in etcd, and issue 
>> watches on other nodes' keys.  This is not quite as convenient as 
>> ZooKeeper's ephemerals, but it's still a lot better than what we're doing 
>> now.
>>
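
To make the liveness scheme concrete, here is a rough sketch against the
etcd v2 HTTP API (the key layout, TTL, and intervals are invented for
illustration):

    package main

    import (
        "net/http"
        "net/url"
        "strings"
        "time"
    )

    // Each glusterd keeps re-setting a key named after its UUID with a
    // TTL. If the daemon dies, the key simply expires on its own.
    func heartbeat(myUUID string) {
        for {
            form := url.Values{"value": {"alive"}, "ttl": {"30"}}
            req, _ := http.NewRequest("PUT",
                "http://localhost:2379/v2/keys/nodes/"+myUUID,
                strings.NewReader(form.Encode()))
            req.Header.Set("Content-Type",
                "application/x-www-form-urlencoded")
            http.DefaultClient.Do(req)
            time.Sleep(10 * time.Second) // refresh well before expiry
        }
    }

    // Long-poll for any change (including expiration) on a peer's key.
    func watchPeer(peerUUID string) {
        http.Get("http://localhost:2379/v2/keys/nodes/" + peerUUID +
            "?wait=true")
    }

    func main() {
        go watchPeer("peer-uuid")
        heartbeat("my-uuid")
    }
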
>> I'd be tempted to implement this myself, but for now it's probably more 
>> important to work on NSR itself and for that I can just use an external etcd 
>> cluster instead.  Maybe later in the 4.0 integration phase, if nobody else 
>> has beaten me to it, I'll take a swing at it.  Until then, does anyone else 
>> have any thoughts on the proposal?
_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
