Re: Removing a disk from JBOD configuration
Excellent! Thank you Jeff.

On Mon, Jul 31, 2017 at 10:26 AM, Jeff Jirsa wrote:
> 3.10 has 6696 in it, so my understanding is you'll probably be fine just
> running repair.
Re: Removing a disk from JBOD configuration
3.10 has 6696 (CASSANDRA-6696) in it, so my understanding is you'll probably be fine just running repair.

Yes, the risks are the same if you swap drives - before 6696, you want to replace a whole node if any sstables are damaged or lost (if you do deletes, and if it hurts you when deleted data comes back to life).

-- Jeff Jirsa

On Jul 31, 2017, at 6:41 AM, Ioannis Zafiropoulos wrote:
> So I guess I will be alright after a repair (?)
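For reference, a minimal sketch of the version check and full repair on the affected node; the keyspace name is a placeholder, not something from this thread:

    # Confirm the node really reports 3.10, which includes the 6696 fix
    nodetool version

    # Run a full (non-incremental) repair on the node that lost the disk;
    # 'my_keyspace' is a hypothetical keyspace name
    nodetool repair --full my_keyspace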
Re: Removing a disk from JBOD configuration
I just want to add that we use vnodes=16, in case that helps with my questions.

On Mon, Jul 31, 2017 at 9:41 AM, Ioannis Zafiropoulos wrote:
> Follow-up questions:
> - It seems that the risks you're describing would be the same as if I had
>   replaced the drive with a fresh new one and run repair, is that correct?
> - Can I do the reverse procedure in the future, that is, add a new drive
>   with the same procedure I described?
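For context, the vnode count mentioned above corresponds to the num_tokens setting in cassandra.yaml:

    # cassandra.yaml, on each node
    num_tokens: 16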
Re: Removing a disk from JBOD configuration
Thank you Jeff for your answer,

I use RF=3 and our clients always connect with QUORUM, so I guess I will be alright after a repair (?)

Follow-up questions:
- It seems that the risks you're describing would be the same as if I had replaced the drive with a fresh new one and run repair, is that correct?
- Can I do the reverse procedure in the future, that is, add a new drive with the same procedure I described?

Thanks,
John

On Mon, Jul 31, 2017 at 5:42 AM, Jeff Jirsa wrote:
> It depends on what consistency level you use for reads/writes, and whether
> you do deletes.
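To make the RF=3 / QUORUM setup concrete, a hedged sketch - the keyspace, datacenter, and table names are made up, not from this thread:

    -- A keyspace replicated 3 ways across one datacenter
    CREATE KEYSPACE my_ks
      WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};

    -- In cqlsh: with RF=3, QUORUM means 2 of 3 replicas must respond,
    -- so QUORUM reads and writes always overlap on at least one replica
    CONSISTENCY QUORUM;
    SELECT * FROM my_ks.my_table WHERE id = 42;

That overlap is why, after one replica loses a disk, QUORUM queries should still see current data, and a repair can rebuild the missing copy - which is what Jeff confirms above.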
Re: Removing a disk from JBOD configuration
It depends on what consistency level you use for reads/writes, and whether you do deletes.

The real danger is that there may have been a tombstone on the drive that failed covering data on the disks that remain, where the delete happened longer ago than gc_grace - if you simply yank the disk, that data will come back to life. (It's also possible some data temporarily reverts to a previous state for some queries; the reversion can be fixed with nodetool repair, but the resurrection can't be undone.) If you don't do deletes, this is not a problem. If there's no danger to you if data comes back to life, then you're probably OK as well.

CASSANDRA-6696 dramatically lowers this risk, if you're using a new enough version of Cassandra.

-- Jeff Jirsa

On Jul 31, 2017, at 1:49 AM, Ioannis Zafiropoulos wrote:
> I mean, is it necessary to replace a failed disk instead of just removing it?
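A small CQL sketch of the tombstone mechanics Jeff describes - the keyspace, table, and values are hypothetical:

    -- gc_grace_seconds controls how long tombstones are kept;
    -- the default is 864000 seconds (10 days)
    CREATE TABLE my_ks.events (id int PRIMARY KEY, payload text)
      WITH gc_grace_seconds = 864000;

    -- A delete writes a tombstone that shadows the older data
    DELETE FROM my_ks.events WHERE id = 1;

    -- If the tombstone sat on the failed disk while the shadowed row sat on
    -- a surviving disk, and gc_grace has already expired on the other
    -- replicas (so their tombstones were compacted away), yanking the disk
    -- leaves nothing to suppress the old row - repair then spreads it back
    -- to all replicas, resurrecting the deleted data.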
Removing a disk from JBOD configuration
Hi All,

I have a 7-node cluster (version 3.10), each node with 5 disks in JBOD. A few hours ago I had a disk failure on a node. I am wondering if I can:

- stop Cassandra on that node
- remove the disk, physically and from cassandra.yaml
- start Cassandra on that node
- run repair

I mean, is it necessary to replace a failed disk instead of just removing it? (Assuming the remaining disks have enough free space.)

Thank you for your help,
John
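A hedged sketch of what these proposed steps could look like in practice - the file paths and the systemd service name are assumptions, not details from this thread:

    # 1. Stop Cassandra on the affected node (systemd install assumed)
    sudo systemctl stop cassandra

    # 2. In cassandra.yaml, drop the failed disk from data_file_directories:
    #      data_file_directories:
    #        - /data/disk1
    #        - /data/disk2
    #        # - /data/disk3   <- failed drive, removed from the list

    # 3. Start Cassandra again
    sudo systemctl start cassandra

    # 4. Finish with the full repair shown earlier in the thread
    nodetool repair --full my_keyspace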