Re: Cluster Maintenance Mishap

2016-10-24 Thread kurt Greaves
On 21 October 2016 at 15:15, Branton Davis 
wrote:

> For example, I forgot to mention until I read your comment that the
> instances showed as UN (up, normal) instead of UJ (up, joining) while they
> were apparently bootstrapping.


It's likely these nodes were configured as seed nodes, which means they
wouldn't have bootstrapped. In this case it shouldn't have been an issue
after you fixed up the data directories.
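
A quick way to confirm is to compare the node's own address against the
seed list in cassandra.yaml (the file location varies by install; the path
below is just a common default):

    # list the configured seeds
    grep -A 4 'seed_provider' /etc/cassandra/cassandra.yaml

    # and the node's own addresses, for comparison
    grep -E '^(listen_address|broadcast_address):' /etc/cassandra/cassandra.yaml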

Kurt Greaves
k...@instaclustr.com
www.instaclustr.com


Re: Cluster Maintenance Mishap

2016-10-21 Thread Branton Davis
Thanks.  Unfortunately, we lost our system logs during all of this
(had normal logs, but not system) due to an unrelated issue :/

Anyhow, as far as I can tell, we're doing okay.

On Thu, Oct 20, 2016 at 11:18 PM, Jeremiah D Jordan <
jeremiah.jor...@gmail.com> wrote:

> The easiest way to figure out what happened is to examine the system log.
> It will tell you what happened.  But I’m pretty sure your nodes got new
> tokens during that time.
>
> If you want to get back the data inserted during the 2 hours you could use
> sstableloader to send all the data from the 
> /var/data/cassandra_new/cassandra/*
> folders back into the cluster if you still have it.
>
> -Jeremiah
>
>
>
> On Oct 20, 2016, at 3:58 PM, Branton Davis 
> wrote:
>
> Howdy folks.  I asked some about this in IRC yesterday, but we're looking
> to hopefully confirm a couple of things for our sanity.
>
> Yesterday, I was performing an operation on a 21-node cluster (vnodes,
> replication factor 3, NetworkTopologyStrategy, and the nodes are balanced
> across 3 AZs on AWS EC2).  The plan was to swap each node's existing 1TB
> volume (where all cassandra data, including the commitlog, is stored) with
> a 2TB volume.  The plan for each node (one at a time) was basically:
>
>- rsync while the node is live (repeated until there were only minor
>differences from new data)
>- stop cassandra on the node
>- rsync again
>- replace the old volume with the new
>- start cassandra
>
> However, there was a bug in the rsync command.  Instead of copying the
> contents of /var/data/cassandra to /var/data/cassandra_new, it copied it to
> /var/data/cassandra_new/cassandra.  So, when cassandra was started after
> the volume swap, there was some behavior that was similar to bootstrapping
> a new node (data started streaming in from other nodes).  But there
> was also some behavior that was similar to a node replacement (nodetool
> status showed the same IP address, but a different host ID).  This
> happened with 3 nodes (one from each AZ).  The nodes had received 1.4GB,
> 1.2GB, and 0.6GB of data (whereas the normal load for a node is around
> 500-600GB).
>
> The cluster was in this state for about 2 hours, at which point cassandra
> was stopped on them.  Later, I moved the data from the original volumes
> back into place (so, should be the original state before the operation) and
> started cassandra back up.
>
> Finally, the questions.  We've accepted the potential loss of new data
> within the two hours, but our primary concern now is what was happening
> with the bootstrapping nodes.  Would they have taken on the token ranges
> of the original nodes or acted like new nodes and got new token ranges?  If
> the latter, is it possible that any data moved from the healthy nodes to
> the "new" nodes or would restarting them with the original data (and
> repairing) put the cluster's token ranges back into a normal state?
>
> Hopefully that was all clear.  Thanks in advance for any info!
>
>
>
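
For what it's worth, the directory mix-up described above is the classic
rsync trailing-slash behavior. A minimal illustration (same paths as in the
thread; flags are a sketch only):

    # without a trailing slash, rsync copies the source directory itself
    # into the destination, producing /var/data/cassandra_new/cassandra/
    rsync -a /var/data/cassandra /var/data/cassandra_new

    # with a trailing slash, rsync copies the contents of the source,
    # which is what the plan intended
    rsync -a /var/data/cassandra/ /var/data/cassandra_new/

    # a dry run shows the resulting layout before any data actually moves
    rsync -an --itemize-changes /var/data/cassandra/ /var/data/cassandra_new/ | head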


Re: Cluster Maintenance Mishap

2016-10-21 Thread Branton Davis
It mostly seems so.  The thing that bugs me is that some things acted as
if they weren't joining like normal new nodes.  For example, I forgot to
mention until I read your comment that the instances showed as UN
(up, normal) instead of UJ (up, joining) while they were
apparently bootstrapping.
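
(For reference, nodetool status prefixes each node line with a two-letter
state code such as UN, UJ, or DN and also shows the host ID, so a quick
filter makes joining nodes or changed host IDs easy to spot. The awk below
assumes the usual column layout, which can vary slightly by version.)

    # list any node that is not in the plain UN state
    nodetool status | grep -E '^(UJ|UL|UM|DN|DJ|DL|DM) '

    # print state, address, and host ID for each node
    nodetool status | awk '/^[UD][NJLM] / {print $1, $2, $(NF-1)}'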

Thanks for the assurance.  I'm thinking (hoping) that we're good.

On Thu, Oct 20, 2016 at 11:24 PM, kurt Greaves  wrote:

>
> On 20 October 2016 at 20:58, Branton Davis 
> wrote:
>
>> Would they have taken on the token ranges of the original nodes or acted
>> like new nodes and got new token ranges?  If the latter, is it possible
>> that any data moved from the healthy nodes to the "new" nodes or
>> would restarting them with the original data (and repairing) put
>> the cluster's token ranges back into a normal state?
>
>
> It sounds like you stopped them before they completed joining, so you
> should have nothing to worry about. If they had completed joining, you
> would now see them marked as DN from the other nodes in the cluster. Since
> you stopped them before they finished, they wouldn't have assumed the token
> ranges and you shouldn't have any issues.
>
> You can just copy the original data back (including system tables) and
> they should assume their own ranges again, and then you can repair to fix
> any missing replicas.
>
> Kurt Greaves
> k...@instaclustr.com
> www.instaclustr.com
>


Re: Cluster Maintenance Mishap

2016-10-20 Thread kurt Greaves
On 20 October 2016 at 20:58, Branton Davis 
wrote:

> Would they have taken on the token ranges of the original nodes or acted
> like new nodes and got new token ranges?  If the latter, is it possible
> that any data moved from the healthy nodes to the "new" nodes or
> would restarting them with the original data (and repairing) put
> the cluster's token ranges back into a normal state?


It sounds like you stopped them before they completed joining, so you
should have nothing to worry about. If they had completed joining, you would
now see them marked as DN from the other nodes in the cluster. Since you
stopped them before they finished, they wouldn't have assumed the token
ranges and you shouldn't have any issues.

You can just copy the original data back (including system tables) and they
should assume their own ranges again, and then you can repair to fix any
missing replicas.
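
Roughly, per affected node (paths and the service name are assumptions;
adjust to your setup):

    # stop cassandra and put the original data back in place
    sudo service cassandra stop
    sudo rsync -a /var/data/cassandra_old/ /var/data/cassandra/
    sudo service cassandra start

    # once the node is back up with its old host ID, repair its primary ranges
    nodetool repair -pr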

Kurt Greaves
k...@instaclustr.com
www.instaclustr.com


Re: Cluster Maintenance Mishap

2016-10-20 Thread Jeremiah D Jordan
The easiest way to figure out what happened is to examine the system log.  It 
will tell you what happened.  But I’m pretty sure your nodes got new tokens 
during that time.
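
A grep along these lines would usually show whether a node bootstrapped and
which tokens it took (exact log wording varies by version, and the path
assumes a package install):

    grep -iE 'bootstrap|joining|tokens' /var/log/cassandra/system.log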

If you want to get back the data inserted during the 2 hours you could use 
sstableloader to send all the data from the /var/data/cassandra_new/cassandra/* 
folders back into the cluster if you still have it.
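
A rough sketch of that reload (the keyspace name and target address are
placeholders; sstableloader is pointed at one table directory at a time, and
the exact layout under cassandra_new depends on your data_file_directories
setting):

    for table_dir in /var/data/cassandra_new/cassandra/data/my_keyspace/*/; do
        sstableloader -d 10.0.0.10 "$table_dir"
    done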

-Jeremiah


> On Oct 20, 2016, at 3:58 PM, Branton Davis  wrote:
> 
> Howdy folks.  I asked some about this in IRC yesterday, but we're looking to 
> hopefully confirm a couple of things for our sanity.
> 
> Yesterday, I was performing an operation on a 21-node cluster (vnodes, 
> replication factor 3, NetworkTopologyStrategy, and the nodes are balanced 
> across 3 AZs on AWS EC2).  The plan was to swap each node's existing 1TB 
> volume (where all cassandra data, including the commitlog, is stored) with a 
> 2TB volume.  The plan for each node (one at a time) was basically:
>    - rsync while the node is live (repeated until there were only minor
>      differences from new data)
>    - stop cassandra on the node
>    - rsync again
>    - replace the old volume with the new
>    - start cassandra
>
> However, there was a bug in the rsync command.  Instead of copying the 
> contents of /var/data/cassandra to /var/data/cassandra_new, it copied it to 
> /var/data/cassandra_new/cassandra.  So, when cassandra was started after the 
> volume swap, there was some behavior that was similar to bootstrapping a new 
> node (data started streaming in from other nodes).  But there was also some 
> behavior that was similar to a node replacement (nodetool status showed the 
> same IP address, but a different host ID).  This happened with 3 nodes (one 
> from each AZ).  The nodes had received 1.4GB, 1.2GB, and 0.6GB of data 
> (whereas the normal load for a node is around 500-600GB).
> 
> The cluster was in this state for about 2 hours, at which point cassandra was 
> stopped on them.  Later, I moved the data from the original volumes back into 
> place (so, should be the original state before the operation) and started 
> cassandra back up.
> 
> Finally, the questions.  We've accepted the potential loss of new data within 
> the two hours, but our primary concern now is what was happening with the 
> bootstrapping nodes.  Would they have taken on the token ranges of the 
> original nodes or acted like new nodes and got new token ranges?  If the 
> latter, is it possible that any data moved from the healthy nodes to the 
> "new" nodes or would restarting them with the original data (and repairing) 
> put the cluster's token ranges back into a normal state?
> 
> Hopefully that was all clear.  Thanks in advance for any info!



Re: Cluster Maintenance Mishap

2016-10-20 Thread Branton Davis
I guess I'm either not understanding how that answers the question
and/or I've just done a terrible job at asking it.  I'll sleep on it and
maybe I'll think of a better way to describe it tomorrow ;)

On Thu, Oct 20, 2016 at 8:45 PM, Yabin Meng  wrote:

> I believe you're using vnodes (a token range change doesn't make sense
> for a single-token setup unless you change it explicitly). If you
> bootstrap a new node with vnodes, I think the token ranges assigned to the
> node are chosen randomly (I'm not 100% sure here, but it should be so
> logically). If so, the ownership of the data that each node is responsible
> for will change. The part of the data that no longer belongs to a node
> under the new ownership, however, will still be kept on that node;
> Cassandra won't remove it automatically unless you run "nodetool cleanup".
> So to answer your question, I don't think the data has been moved away.
> More likely you have some extra duplicates here.
>
> Yabin
>
> On Thu, Oct 20, 2016 at 6:41 PM, Branton Davis wrote:
>
>> Thanks for the response, Yabin.  However, if there's an answer to my
>> question here, I'm apparently too dense to see it ;)
>>
>> I understand that, since the system keyspace data was not there, it
>> started bootstrapping.  What's not clear is if they took over the token
>> ranges of the previous nodes or got new token ranges.  I'm mainly
>> concerned about the latter.  We've got the nodes back in place with the
>> original data, but the fear is that some data may have been moved off of
>> other nodes.  I think that this is very unlikely, but I'm just looking for
>> confirmation.
>>
>>
>> On Thursday, October 20, 2016, Yabin Meng  wrote:
>>
>>> Most likely the issue is that when you moved the data, you moved the
>>> system keyspace data away as well. Because the data was copied into a
>>> different location than what C* expects, C* could not find the system
>>> metadata on startup and therefore tried to start as a fresh new node. If
>>> you keep the system keyspace data in the right place, you should see all
>>> the old info as expected.
>>>
>>> I've seen a few such occurrences from customers. As a best practice, I
>>> would always suggest keeping the Cassandra application data directory
>>> completely separate from the system keyspace directory (e.g. so they
>>> don't share a common parent folder).
>>>
>>> Regards,
>>>
>>> Yabin
>>>
>>> On Thu, Oct 20, 2016 at 4:58 PM, Branton Davis <
>>> branton.da...@spanning.com> wrote:
>>>
>>>> Howdy folks.  I asked some about this in IRC yesterday, but we're
>>>> looking to hopefully confirm a couple of things for our sanity.
>>>>
>>>> Yesterday, I was performing an operation on a 21-node cluster (vnodes,
>>>> replication factor 3, NetworkTopologyStrategy, and the nodes are balanced
>>>> across 3 AZs on AWS EC2).  The plan was to swap each node's existing
>>>> 1TB volume (where all cassandra data, including the commitlog, is stored)
>>>> with a 2TB volume.  The plan for each node (one at a time) was
>>>> basically:
>>>>
>>>>    - rsync while the node is live (repeated until there were
>>>>      only minor differences from new data)
>>>>    - stop cassandra on the node
>>>>    - rsync again
>>>>    - replace the old volume with the new
>>>>    - start cassandra
>>>>
>>>> However, there was a bug in the rsync command.  Instead of copying the
>>>> contents of /var/data/cassandra to /var/data/cassandra_new, it copied it to
>>>> /var/data/cassandra_new/cassandra.  So, when cassandra was started
>>>> after the volume swap, there was some behavior that was similar to
>>>> bootstrapping a new node (data started streaming in from other nodes).
>>>> But there was also some behavior that was similar to a node
>>>> replacement (nodetool status showed the same IP address, but a
>>>> different host ID).  This happened with 3 nodes (one from each AZ).  The
>>>> nodes had received 1.4GB, 1.2GB, and 0.6GB of data (whereas the normal load
>>>> for a node is around 500-600GB).
>>>>
>>>> The cluster was in this state for about 2 hours, at which
>>>> point cassandra was stopped on them.  Later, I moved the data from the
>>>> original volumes back into place (so, should be the original state before
>>>> the operation) and started cassandra back up.
>>>>
>>>> Finally, the questions.  We've accepted the potential loss of new data
>>>> within the two hours, but our primary concern now is what was happening
>>>> with the bootstrapping nodes.  Would they have taken on the token
>>>> ranges of the original nodes or acted like new nodes and got new token
>>>> ranges?  If the latter, is it possible that any data moved from the
>>>> healthy nodes to the "new" nodes or would restarting them with the original
>>>> data (and repairing) put the cluster's token ranges back into a normal
>>>> state?
>>>>
>>>> Hopefully that was all clear.  Thanks in advance for any info!


Re: Cluster Maintenance Mishap

2016-10-20 Thread Yabin Meng
I believe you're using vnodes (a token range change doesn't make sense
for a single-token setup unless you change it explicitly). If you
bootstrap a new node with vnodes, I think the token ranges assigned to the
node are chosen randomly (I'm not 100% sure here, but it should be so
logically). If so, the ownership of the data that each node is responsible
for will change. The part of the data that no longer belongs to a node
under the new ownership, however, will still be kept on that node;
Cassandra won't remove it automatically unless you run "nodetool cleanup".
So to answer your question, I don't think the data has been moved away.
More likely you have some extra duplicates here.
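
If that turns out to be the case, cleanup is the standard way to drop data a
node no longer owns (it rewrites SSTables, so run it one node at a time and
preferably off-peak; the keyspace name below is just a placeholder):

    nodetool cleanup               # all non-system keyspaces
    nodetool cleanup my_keyspace   # or limit it to a single keyspace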

Yabin

On Thu, Oct 20, 2016 at 6:41 PM, Branton Davis 
wrote:

> Thanks for the response, Yabin.  However, if there's an answer to my
> question here, I'm apparently too dense to see it ;)
>
> I understand that, since the system keyspace data was not there, it
> started bootstrapping.  What's not clear is if they took over the token
> ranges of the previous nodes or got new token ranges.  I'm mainly
> concerned about the latter.  We've got the nodes back in place with the
> original data, but the fear is that some data may have been moved off of
> other nodes.  I think that this is very unlikely, but I'm just looking for
> confirmation.
>
>
> On Thursday, October 20, 2016, Yabin Meng  wrote:
>
>> Most likely the issue is that when you moved the data, you moved the
>> system keyspace data away as well. Because the data was copied into a
>> different location than what C* expects, C* could not find the system
>> metadata on startup and therefore tried to start as a fresh new node. If
>> you keep the system keyspace data in the right place, you should see all
>> the old info as expected.
>>
>> I've seen a few such occurrences from customers. As a best practice, I
>> would always suggest keeping the Cassandra application data directory
>> completely separate from the system keyspace directory (e.g. so they
>> don't share a common parent folder).
>>
>> Regards,
>>
>> Yabin
>>
>> On Thu, Oct 20, 2016 at 4:58 PM, Branton Davis <
>> branton.da...@spanning.com> wrote:
>>
>>> Howdy folks.  I asked some about this in IRC yesterday, but we're
>>> looking to hopefully confirm a couple of things for our sanity.
>>>
>>> Yesterday, I was performing an operation on a 21-node cluster (vnodes,
>>> replication factor 3, NetworkTopologyStrategy, and the nodes are balanced
>>> across 3 AZs on AWS EC2).  The plan was to swap each node's existing
>>> 1TB volume (where all cassandra data, including the commitlog, is stored)
>>> with a 2TB volume.  The plan for each node (one at a time) was
>>> basically:
>>>
>>>- rsync while the node is live (repeated until there were only minor
>>>differences from new data)
>>>- stop cassandra on the node
>>>- rsync again
>>>- replace the old volume with the new
>>>- start cassandra
>>>
>>> However, there was a bug in the rsync command.  Instead of copying the
>>> contents of /var/data/cassandra to /var/data/cassandra_new, it copied it to
>>> /var/data/cassandra_new/cassandra.  So, when cassandra was started
>>> after the volume swap, there was some behavior that was similar to
>>> bootstrapping a new node (data started streaming in from other nodes).  But
>>> there was also some behavior that was similar to a node replacement
>>> (nodetool status showed the same IP address, but a different host ID).  This
>>> happened with 3 nodes (one from each AZ).  The nodes had received
>>> 1.4GB, 1.2GB, and 0.6GB of data (whereas the normal load for a node is
>>> around 500-600GB).
>>>
>>> The cluster was in this state for about 2 hours, at which
>>> point cassandra was stopped on them.  Later, I moved the data from the
>>> original volumes back into place (so, should be the original state before
>>> the operation) and started cassandra back up.
>>>
>>> Finally, the questions.  We've accepted the potential loss of new data
>>> within the two hours, but our primary concern now is what was happening
>>> with the bootstrapping nodes.  Would they have taken on the token
>>> ranges of the original nodes or acted like new nodes and got new token
>>> ranges?  If the latter, is it possible that any data moved from the
>>> healthy nodes to the "new" nodes or would restarting them with the original
>>> data (and repairing) put the cluster's token ranges back into a normal
>>> state?
>>>
>>> Hopefully that was all clear.  Thanks in advance for any info!
>>>
>>
>>


Re: Cluster Maintenance Mishap

2016-10-20 Thread Branton Davis
Thanks for the response, Yabin.  However, if there's an answer to my
question here, I'm apparently too dense to see it ;)

I understand that, since the system keyspace data was not there, it started
bootstrapping.  What's not clear is if they took over the token ranges of
the previous nodes or got new token ranges.  I'm mainly concerned about the
latter.  We've got the nodes back in place with the original data, but the
fear is that some data may have been moved off of other nodes.  I think
that this is very unlikely, but I'm just looking for confirmation.

On Thursday, October 20, 2016, Yabin Meng  wrote:

> Most likely the issue is that when you moved the data, you moved the
> system keyspace data away as well. Because the data was copied into a
> different location than what C* expects, C* could not find the system
> metadata on startup and therefore tried to start as a fresh new node. If
> you keep the system keyspace data in the right place, you should see all
> the old info as expected.
>
> I've seen a few such occurrences from customers. As a best practice, I
> would always suggest keeping the Cassandra application data directory
> completely separate from the system keyspace directory (e.g. so they
> don't share a common parent folder).
>
> Regards,
>
> Yabin
>
> On Thu, Oct 20, 2016 at 4:58 PM, Branton Davis wrote:
>
>> Howdy folks.  I asked some about this in IRC yesterday, but we're
>> looking to hopefully confirm a couple of things for our sanity.
>>
>> Yesterday, I was performing an operation on a 21-node cluster (vnodes,
>> replication factor 3, NetworkTopologyStrategy, and the nodes are balanced
>> across 3 AZs on AWS EC2).  The plan was to swap each node's existing 1TB
>> volume (where all cassandra data, including the commitlog, is stored) with
>> a 2TB volume.  The plan for each node (one at a time) was basically:
>>
>>- rsync while the node is live (repeated until there were only minor
>>differences from new data)
>>- stop cassandra on the node
>>- rsync again
>>- replace the old volume with the new
>>- start cassandra
>>
>> However, there was a bug in the rsync command.  Instead of copying the
>> contents of /var/data/cassandra to /var/data/cassandra_new, it copied it to
>> /var/data/cassandra_new/cassandra.  So, when cassandra was started after
>> the volume swap, there was some behavior that was similar to bootstrapping
>> a new node (data started streaming in from other nodes).  But there
>> was also some behavior that was similar to a node replacement (nodetool
>> status showed the same IP address, but a different host ID).  This
>> happened with 3 nodes (one from each AZ).  The nodes had received 1.4GB,
>> 1.2GB, and 0.6GB of data (whereas the normal load for a node is around
>> 500-600GB).
>>
>> The cluster was in this state for about 2 hours, at which point cassandra
>> was stopped on them.  Later, I moved the data from the original volumes
>> back into place (so, should be the original state before the operation) and
>> started cassandra back up.
>>
>> Finally, the questions.  We've accepted the potential loss of new data
>> within the two hours, but our primary concern now is what was happening
>> with the bootstrapping nodes.  Would they have taken on the token ranges
>> of the original nodes or acted like new nodes and got new token ranges?  If
>> the latter, is it possible that any data moved from the healthy nodes to
>> the "new" nodes or would restarting them with the original data (and
>> repairing) put the cluster's token ranges back into a normal state?
>>
>> Hopefully that was all clear.  Thanks in advance for any info!
>>
>
>


Re: Cluster Maintenance Mishap

2016-10-20 Thread Yabin Meng
Most likely the issue is that when you moved the data, you moved the
system keyspace data away as well. Because the data was copied into a
different location than what C* expects, C* could not find the system
metadata on startup and therefore tried to start as a fresh new node. If
you keep the system keyspace data in the right place, you should see all
the old info as expected.

I've seen a few such occurrences from customers. As a best practice, I
would always suggest keeping the Cassandra application data directory
completely separate from the system keyspace directory (e.g. so they
don't share a common parent folder).
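
(The system keyspaces live under the same configured data directory as the
application keyspaces, which is why moving the whole directory also moved
each node's identity, i.e. its tokens and host ID, along with it. The path
below is an assumption based on the thread; the listing is illustrative:)

    ls /var/data/cassandra/data/
    # system  system_auth  system_traces  my_keyspace  ...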

Regards,

Yabin

On Thu, Oct 20, 2016 at 4:58 PM, Branton Davis 
wrote:

> Howdy folks.  I asked some about this in IRC yesterday, but we're looking
> to hopefully confirm a couple of things for our sanity.
>
> Yesterday, I was performing an operation on a 21-node cluster (vnodes,
> replication factor 3, NetworkTopologyStrategy, and the nodes are balanced
> across 3 AZs on AWS EC2).  The plan was to swap each node's existing 1TB
> volume (where all cassandra data, including the commitlog, is stored) with
> a 2TB volume.  The plan for each node (one at a time) was basically:
>
>- rsync while the node is live (repeated until there were only minor
>differences from new data)
>- stop cassandra on the node
>- rsync again
>- replace the old volume with the new
>- start cassandra
>
> However, there was a bug in the rsync command.  Instead of copying the
> contents of /var/data/cassandra to /var/data/cassandra_new, it copied it to
> /var/data/cassandra_new/cassandra.  So, when cassandra was started after
> the volume swap, there was some behavior that was similar to bootstrapping
> a new node (data started streaming in from other nodes).  But there
> was also some behavior that was similar to a node replacement (nodetool
> status showed the same IP address, but a different host ID).  This
> happened with 3 nodes (one from each AZ).  The nodes had received 1.4GB,
> 1.2GB, and 0.6GB of data (whereas the normal load for a node is around
> 500-600GB).
>
> The cluster was in this state for about 2 hours, at which point cassandra
> was stopped on them.  Later, I moved the data from the original volumes
> back into place (so, should be the original state before the operation) and
> started cassandra back up.
>
> Finally, the questions.  We've accepted the potential loss of new data
> within the two hours, but our primary concern now is what was happening
> with the bootstrapping nodes.  Would they have taken on the token ranges
> of the original nodes or acted like new nodes and got new token ranges?  If
> the latter, is it possible that any data moved from the healthy nodes to
> the "new" nodes or would restarting them with the original data (and
> repairing) put the cluster's token ranges back into a normal state?
>
> Hopefully that was all clear.  Thanks in advance for any info!
>