Re: Re: Re: tolerate how many nodes down in the cluster
Hi Peng,

Racks can be logical (as defined with the RAC attribute in the Cassandra configuration files) or physical (racks in server rooms). In my view, to leverage racks in your case, it's important to understand the implications of the following decisions:

1. Number of distinct logical RACs defined in Cassandra: If you want to leverage RACs optimally for operational efficiency (as Brooke explained), make sure the number of logical RACs is ALWAYS equal to RF, regardless of whether the number of physical racks is equal to or greater than RF. Keeping logical RACs = RF ensures that the nodes allocated to a logical rack hold exactly 1 replica of the entire data set. So, if you have RF=3 and use QUORUM for reads and writes, you can bring down ALL nodes allocated to one logical rack for maintenance and still achieve 100% availability. This makes operations faster and cuts down the risk involved.

For example, imagine restarting Cassandra across an entire cluster. If one node takes 3 minutes, a rolling restart of 30 nodes would take 90 minutes. But if you use 3 logical RACs with RF=3 and assign 10 nodes to each logical RAC, you can restart the 10 nodes within a RAC simultaneously (in off-peak hours, of course, so that the remaining 20 nodes can take the load). Starting Cassandra on all RACs one by one will take just 9 minutes rather than 90 minutes. If there are any issues during a restart or maintenance, you can take all the nodes in a logical RAC down, fix them, and bring them back without affecting availability.

2. Number of physical racks: Historical data shows that there are instances where more than one node in a physical rack fails together. When you are using VMs, there are three levels instead of two, because VMs on a single physical machine are also likely to fail together due to hardware failure:

Physical racks > Physical machines > VMs

Ensure that all VMs on a physical machine map to a single logical RAC.
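As a rough illustration of the restart arithmetic in point 1 above, here is a minimal Python sketch. It is not part of the original message; the function names are made up for illustration, and it assumes that nodes restarted in the same batch take no longer than a single node.

```python
def quorum(rf: int) -> int:
    """Replicas that must respond at QUORUM: floor(rf / 2) + 1."""
    return rf // 2 + 1


def restart_minutes(nodes: int, minutes_per_node: int, batch_size: int) -> int:
    """Wall-clock time when `batch_size` nodes are restarted together."""
    batches = -(-nodes // batch_size)  # ceiling division
    return batches * minutes_per_node


rf = 3
nodes = 30
logical_racks = 3
minutes_per_node = 3

# Classic rolling restart: one node at a time.
one_by_one = restart_minutes(nodes, minutes_per_node, batch_size=1)

# Rack-by-rack restart: all 10 nodes of a logical rack at once.
rack_by_rack = restart_minutes(
    nodes, minutes_per_node, batch_size=nodes // logical_racks
)

# With logical racks == RF, each rack holds exactly one replica of every
# row, so a whole rack being down still leaves rf - 1 replicas.
replicas_left = rf - 1
quorum_ok = replicas_left >= quorum(rf)

print(one_by_one, rack_by_rack, quorum_ok)
```

With the numbers from the example (30 nodes, 3 minutes each, 3 racks), this gives 90 minutes versus 9 minutes, and QUORUM is still reachable with one rack down.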
If you want to tolerate the failure of physical racks in the server room, you also need to ensure that all physical servers in a physical rack map to just one logical RAC. This way, you can tolerate the failure of ALL VMs on ALL physical machines mapped to a single logical RAC and still be 100% available.

For example: RF=3, 6 physical racks, 2 physical servers per physical rack, and 3 VMs per physical server. The setup would be:

Physical Rack1 = [Physical1 (3 VM) + Physical2 (3 VM)] = LogicalRAC1
Physical Rack2 = [Physical3 (3 VM) + Physical4 (3 VM)] = LogicalRAC1
Physical Rack3 = [Physical5 (3 VM) + Physical6 (3 VM)] = LogicalRAC2
Physical Rack4 = [Physical7 (3 VM) + Physical8 (3 VM)] = LogicalRAC2
Physical Rack5 = [Physical9 (3 VM) + Physical10 (3 VM)] = LogicalRAC3
Physical Rack6 = [Physical11 (3 VM) + Physical12 (3 VM)] = LogicalRAC3

The problem with this approach is scaling. What if you want to add a single physical server? If you do that and allocate it to one existing logical RAC, your cluster won't be balanced properly, because the logical RAC to which the server is added will have more capacity for the same data than the other two logical RACs. To keep your cluster balanced, you need to add at least 3 physical servers in 3 different physical racks and assign each physical server to a different logical RAC. This wastes resources and is hard to digest.

If you have physical machines < logical RACs, every physical machine may hold more than 1 replica. If an entire physical machine fails, you will NOT have 100% availability, as more than 1 replica may be unavailable. Similarly, if you have physical racks < logical RACs, every physical rack may hold more than 1 replica. If an entire physical rack fails, you will NOT have 100% availability, as more than 1 replica may be unavailable.

Coming back to your example: RF=3 per DC (total RF=6), CL=QUORUM, 2 DCs, 6 physical machines, 8 VMs per physical machine.

My recommendation:

1. In each DC, assign the 3 physical machines to 3 logical RACs in the Cassandra configuration (i.e. 1 physical machine per logical RAC). The 2 DCs can use the same RAC names, because RACs are uniquely identified together with their DC names. So these are 6 different logical RACs (a multiple of RF).
2. To scale the cluster, add 6 physical machines (3 physical machines per DC) and assign each machine to a different logical RAC within its DC.

This way, even if you have an Active-Passive DC setup, you can tolerate the failure of any physical machine or physical rack in the Active DC and still ensure 100% availability. You would also achieve the operational benefits explained above.

In a multi-DC setup, you can also choose to do away with RACs and achieve the operational benefits by doing maintenance on one entire DC at a time and leveraging the other DC to handle client requests during that time. That will make your life simpler.

Thanks
Anuj
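The six-rack example layout in this message can be checked mechanically. The following Python sketch is only an illustration: the record fields and the `build_nodes` / `check_layout` helpers are hypothetical names, not anything from Cassandra. It verifies that no physical server or physical rack spans two logical RACs and that the logical RACs are the same size, then shows how adding a single server to one logical RAC breaks the balance.

```python
from collections import Counter, defaultdict

RF = 3
SERVERS_PER_RACK = 2
VMS_PER_SERVER = 3

# Physical rack -> logical RAC, copied from the example layout.
RACK_TO_LOGICAL = {
    "Physical Rack1": "LogicalRAC1",
    "Physical Rack2": "LogicalRAC1",
    "Physical Rack3": "LogicalRAC2",
    "Physical Rack4": "LogicalRAC2",
    "Physical Rack5": "LogicalRAC3",
    "Physical Rack6": "LogicalRAC3",
}


def build_nodes(rack_to_logical):
    """Expand the rack mapping into one record per VM (Cassandra node)."""
    nodes, server_id = [], 0
    for physical_rack, logical_rac in rack_to_logical.items():
        for _ in range(SERVERS_PER_RACK):
            server_id += 1
            for _ in range(VMS_PER_SERVER):
                nodes.append({
                    "server": f"Physical{server_id}",
                    "physical_rack": physical_rack,
                    "logical_rac": logical_rac,
                })
    return nodes


def check_layout(nodes, rf):
    """Return a list of human-readable problems; empty means the layout is OK."""
    problems = []
    sizes = Counter(n["logical_rac"] for n in nodes)
    if len(sizes) != rf:
        problems.append(f"{len(sizes)} logical RACs, expected {rf}")
    if len(set(sizes.values())) > 1:
        problems.append(f"unbalanced logical RACs: {dict(sizes)}")
    # Every failure domain (server, physical rack) must map to one logical RAC.
    for level in ("server", "physical_rack"):
        racs_by_unit = defaultdict(set)
        for n in nodes:
            racs_by_unit[n[level]].add(n["logical_rac"])
        for unit, racs in racs_by_unit.items():
            if len(racs) > 1:
                problems.append(f"{unit} spans {sorted(racs)}")
    return problems


nodes = build_nodes(RACK_TO_LOGICAL)
print(check_layout(nodes, RF))

# Adding one extra server (3 VMs) to a single logical RAC unbalances the cluster.
extra = [
    {"server": "Physical13", "physical_rack": "Physical Rack1",
     "logical_rac": "LogicalRAC1"}
    for _ in range(VMS_PER_SERVER)
]
print(check_layout(nodes + extra, RF))
```

The first check reports no problems for the layout as described; the second reports the imbalance that motivates adding servers in groups of three, one per logical RAC.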
Re: Re: Re: tolerate how many nodes down in the cluster
Note that if you use more racks than RF you lose some of the operational benefit, e.g. you'll still only be able to take out one rack at a time (especially if using vnodes), despite having more racks than RF. As Jeff said, this may be desirable, but really it comes down to what your physical failure domains are and how (or if) you plan to scale. As long as you don't start with # racks < RF you should be fine.
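The "one rack at a time" point can be illustrated with a small worst-case model in Python. This is a sketch under a stated assumption, not Cassandra's placement logic: it assumes that each range's replicas sit on RF distinct racks and that, with vnodes, every combination of RF racks ends up holding some range. The function names are hypothetical.

```python
from itertools import combinations


def quorum(rf: int) -> int:
    """Replicas that must respond at QUORUM: floor(rf / 2) + 1."""
    return rf // 2 + 1


def quorum_survives(racks: int, rf: int, racks_down: int) -> bool:
    """True if every range keeps a quorum when any `racks_down` racks are offline.

    Assumption: every rf-sized subset of racks is the replica set of some range.
    """
    rack_ids = range(racks)
    for replica_racks in combinations(rack_ids, rf):
        for down in combinations(rack_ids, racks_down):
            alive = len(set(replica_racks) - set(down))
            if alive < quorum(rf):
                return False
    return True


for racks in (3, 6):
    for down in (1, 2):
        ok = quorum_survives(racks, rf=3, racks_down=down)
        print(f"racks={racks} RF=3 racks_down={down} quorum_ok={ok}")
```

Under this model, both 3 racks and 6 racks tolerate exactly one rack being down at QUORUM with RF=3. The difference is that one rack out of 3 is a third of the nodes, while one rack out of 6 is only a sixth, so maintenance proceeds in smaller batches.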
Re: Re: Re: tolerate how many nodes down in the cluster
On 2017-07-26 19:38 (-0700), "Peng Xiao" <2535...@qq.com> wrote:
> Kurt/All,
>
> why the # of racks should be equal to RF?
>
> For example, we have 2 DCs each 6 machines with RF=3, each machine virtualized to 8 vms,
> can we set 6 racs with RF3? I mean one machine one RAC to avoid hardware errors
> or only set 3 racs, 1 rac with 2 machines, which is better?

The guarantee you get from racks is that IF you have more racks than replicas, you won't have 2 replicas on the same rack. There's no requirement that # of racks >= # of replicas, you just leave yourself exposed to losing quorum if you have an outage while # racks < # replicas.

Yes, with a rack == a hypervisor, the snitch would avoid placing 2 replicas on the same physical machine, and would protect you against hardware errors. There's nothing to gain from having 3 racks instead of 6 in that case (in fact 6 is probably better, as you're less likely to have to skip a duplicate rack in getNaturalEndpoints()).

All of this said: BE REALLY CAREFUL WHEN USING RACKS. If you start with # of racks < RF, and you try to add another rack, you will probably be very unhappy (when you add that first node in the new rack, it'll take 1/RF of the ring instantly, which usually crashes everything). For that reason, a lot of people advise not to use racks unless you have > RF racks, or you REALLY know what you're doing.
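To make the warning about adding a rack concrete, here is a simplified Python sketch of rack-aware placement. It is an illustration only, not Cassandra's actual getNaturalEndpoints() code: it assumes a single DC, one token per node, equally sized ranges, and hypothetical helper names. Walking the ring clockwise, it prefers nodes on racks not yet used and falls back to skipped nodes once every rack is represented.

```python
def place_replicas(ring, start, rf):
    """Pick up to `rf` replicas for the range starting at ring index `start`.

    `ring` is a list of (node, rack) tuples in token order.
    """
    all_racks = {rack for _, rack in ring}
    replicas, seen_racks, skipped = [], set(), []
    for i in range(len(ring)):
        if len(replicas) == rf:
            break
        node, rack = ring[(start + i) % len(ring)]
        if rack not in seen_racks:
            replicas.append(node)
            seen_racks.add(rack)
            if seen_racks == all_racks:
                # Every rack is represented: fill up from skipped nodes.
                while skipped and len(replicas) < rf:
                    replicas.append(skipped.pop(0))
        elif seen_racks == all_racks:
            replicas.append(node)
        else:
            skipped.append(node)  # same rack as an earlier replica
    return replicas


def replica_share(ring, rf):
    """Fraction of ranges for which each node holds a replica."""
    counts = {node: 0 for node, _ in ring}
    for start in range(len(ring)):
        for node in place_replicas(ring, start, rf):
            counts[node] += 1
    return {node: count / len(ring) for node, count in counts.items()}


RF = 3

# Start with only 2 racks (fewer than RF), nodes alternating between them.
before = [("n1", "rack1"), ("n2", "rack2"), ("n3", "rack1"),
          ("n4", "rack2"), ("n5", "rack1"), ("n6", "rack2")]

# Add the first node of a third rack somewhere on the ring.
after = before[:3] + [("new", "rack3")] + before[3:]

print(replica_share(before, RF))
print(replica_share(after, RF))
```

In this model the single node in the new rack becomes a replica for every range, i.e. it holds one full copy of the data (1/RF of all replicas), which is the overload described above. With at least RF racks, each range's replicas land on distinct racks.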
Re: Re: tolerate how many nodes down in the cluster
As Brooke suggests, the number of RACs should be a multiple of RF. https://www.youtube.com/watch?v=QrP7G1eeQTI

If we have 6 machines with RF=3, we can set up either 6 RACs or 3 RACs; which would be better? Could you please advise further?

Many thanks
Re: Re: tolerate how many nodes down in the cluster
One more question: why should the # of racks be equal to RF? For example, we have 4 machines, each virtualized into 8 VMs; can we set up 4 RACs with RF=3? I mean one RAC per machine.

Thanks
Re: Re: tolerate how many nodes down in the cluster
Thanks for the reminder; we will set up a new DC as suggested.

-- Original message --
From: "kurt greaves" <k...@instaclustr.com>
Sent: Wednesday, 26 July 2017, 10:30 AM
To: "User" <user@cassandra.apache.org>
Cc: "anujw_2...@yahoo.co.in" <anujw_2...@yahoo.co.in>
Subject: Re: Re: tolerate how many nodes down in the cluster

Keep in mind that you shouldn't just enable multiple racks on an existing cluster (this will lead to massive inconsistencies). The best method is to migrate to a new DC as Brooke mentioned.