Thanks, Alex.

On Thu, May 19, 2016 at 3:44 PM Oleksandr Petrov <oleksandr.pet...@gmail.com> wrote:
> I think that this article [1] covers most of the concepts (see "key
> concepts") quite well.
> I am not aware of any article that explains the whole process, though.
>
> Briefly, there are several processes/concepts that are somewhat related to
> that subject: token ownership, replica, coordinator and gossip.
> Ensuring consistency in a small cluster (number of replicas <= number of
> nodes) is more or less straightforward. In this case, when a node
> bootstraps, it notifies all the replicas, and information about that node
> gets added to `pending nodes`. All nodes know about the bootstrapping node,
> as otherwise streaming would not even start.
> Having a coordinator outside of the replica set for the partition/token
> you're querying is a bit more complex, as it relies on knowledge about the
> joined node that is distributed over gossip.
>
> There are two properties that can improve the situation with range
> movements: cassandra.consistent.rangemovement
> and cassandra.consistent.simultaneousmoves.allow. The first one disallows
> ring changes if any node in the replica set is offline. In addition to
> that, it makes sure there are no other moves within the ring. In that case,
> if you're connected to a coordinator that's part of the replica set, data
> has to be placed correctly. The data will be moved, and any inconsistencies
> will eventually be fixed with a repair (answering your question, no data
> will be lost during this process).
>
> (I tried to provide information to the best of my knowledge; if anyone
> sees something wrong, please point it out.)
>
> [1] https://dzone.com/articles/introduction-apache-cassandra
>
> On Thu, May 19, 2016 at 5:58 AM Renjie Liu <liurenjie2...@gmail.com>
> wrote:
>
> > BTW, is there any article explaining the process? I think this will help
> > us understand it better.
> >
> > On Thu, May 19, 2016 at 11:28 AM Renjie Liu <liurenjie2...@gmail.com>
> > wrote:
> >
> > > Thanks, I'll read the code.
> > >
> > > On Thu, May 19, 2016 at 11:02 AM Jeff Jirsa <jeff.ji...@crowdstrike.com>
> > > wrote:
> > >
> > >> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/locator/TokenMetadata.java#L731-L754
> > >>
> > >> And
> > >>
> > >> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/locator/TokenMetadata.java#L60-L88
> > >>
> > >> Cassandra keeps a map of joining and leaving nodes, and does extra
> > >> writes to the appropriate nodes for mutations created after the
> > >> streaming is calculated.
> > >>
> > >> On 5/18/16, 7:33 PM, "Renjie Liu" <liurenjie2...@gmail.com> wrote:
> > >>
> > >> >Hi, cassandra devs:
> > >> >I'm learning cassandra and I can understand most of the techniques
> > >> >used. But I can't understand how cassandra ensures consistency when
> > >> >adding/removing a node. It seems that when a node joins the DHT ring,
> > >> >some nodes need to transfer data to the new node using streaming. But
> > >> >the data may still get updated while transferring, so the new node can
> > >> >never catch up with it. How does cassandra handle this? Will cassandra
> > >> >lose data during this process?
> > >> >--
> > >> >Liu, Renjie
> > >> >Software Engineer, MVAD
> > >
> > > --
> > > Liu, Renjie
> > > Software Engineer, MVAD
> >
> > --
> > Liu, Renjie
> > Software Engineer, MVAD
>
> --
> Alex Petrov

--
Liu, Renjie
Software Engineer, MVAD
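
For readers of this thread, the idea Jeff describes (a map of joining nodes plus extra writes to them) can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not Cassandra's actual code: the `Ring` class, `replicas`, and `write_targets` are hypothetical names, each node owns a single token, and replicas are simply the next RF nodes clockwise. It shows why a write arriving during streaming still reaches the joining node.

```python
from bisect import bisect_left


class Ring:
    """Toy token ring (hypothetical, not Cassandra's TokenMetadata)."""

    def __init__(self, tokens, rf):
        self.tokens = dict(tokens)  # node name -> token of a settled node
        self.rf = rf                # replication factor

    def replicas(self, key_token, tokens=None):
        """Return the RF nodes found clockwise from key_token."""
        tokens = self.tokens if tokens is None else tokens
        ordered = sorted(tokens.items(), key=lambda kv: kv[1])
        ring_tokens = [t for _, t in ordered]
        # First node whose token is >= key_token, wrapping around the ring.
        start = bisect_left(ring_tokens, key_token) % len(ordered)
        count = min(self.rf, len(ordered))
        return [ordered[(start + i) % len(ordered)][0] for i in range(count)]

    def write_targets(self, key_token, joining):
        """Split write targets into natural replicas and pending (joining) nodes.

        `joining` maps node name -> token for nodes that are still streaming.
        """
        natural = self.replicas(key_token)
        # Replicas as they will be once the joining nodes are part of the ring.
        future = self.replicas(key_token, {**self.tokens, **joining})
        pending = [node for node in future if node in joining]
        return natural, pending


ring = Ring({"A": 0, "B": 100, "C": 200}, rf=2)
natural, pending = ring.write_targets(120, joining={"D": 150})
print(natural, pending)  # ['C', 'A'] ['D']
```

In this model the write for token 120 goes to the current replicas C and A and is also sent to the joining node D, so D does not depend on streaming alone to catch up with updates made while it bootstraps.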