RE: unable to restore data from copied data directory

2021-01-04 Thread Manu Chadha
Thanks for the tip on dsbulk. I’ll look into it. I agree that the folder-copy 
approach would work only if the topology is the same.


RE: unable to restore data from copied data directory

2021-01-04 Thread Durity, Sean R
This may not answer all your questions, but maybe it will help move you further 
along:
- You could copy the data (not system) folders *IF* the clusters match in 
topology. That includes the clusters having the same token range 
assignment(s). You would also have to copy the folders from each original node 
to the exact matching node in the second cluster. [To learn more, read about how 
Cassandra distributes data across the cluster. It will take effort to have 
exactly matching clusters.]
- If you cannot make an exact match in topology, investigate something like 
dsbulk for moving data in and out of clusters with whatever topology they have. 
This is a much more portable solution.
- I know that teams also take disk snapshots on cloud platforms as one backup 
solution. They can attach that disk snapshot to a new VM (configured the same 
as the previous one) as needed. I don’t know all the particulars of this 
approach, though.
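
As a rough illustration of the "exact matching node" requirement in the first 
point, here is a minimal Python sketch. It assumes you have already collected 
each node's token list yourself (for example from nodetool ring or from the 
tokens column of system.local); the function name and the dictionary layout 
are hypothetical, not part of any Cassandra tooling.

```python
from typing import Dict, Iterable


def match_nodes(source: Dict[str, Iterable[int]],
                target: Dict[str, Iterable[int]]) -> Dict[str, str]:
    """Map each source node to the target node that owns exactly the same tokens.

    Raises ValueError if any source node has no exact counterpart, which is
    the case where a plain folder copy should not be attempted.
    """
    # Index target nodes by their (order-independent) set of tokens.
    by_tokens = {frozenset(tokens): node for node, tokens in target.items()}
    mapping = {}
    for node, tokens in source.items():
        counterpart = by_tokens.get(frozenset(tokens))
        if counterpart is None:
            raise ValueError(f"no target node owns the same tokens as {node}")
        mapping[node] = counterpart
    return mapping
```

If the function raises, the clusters do not match and a tool such as dsbulk 
is the more portable route.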

Sean Durity – Staff Systems Engineer, Cassandra


RE: unable to restore data from copied data directory

2021-01-02 Thread Manu Chadha
Thanks. Should I copy only the system_schema folder? I tried copying all the 
folders and ran into the following issues:


  1.  C* didn’t start because the cluster name is Test Cluster by default, while 
the copied tables refer to the k8ssandra cluster: “Saved cluster name k8ssandra != 
configured name Test Cluster”
  2.  Then I got this error: “Cannot start node if snitch's data center 
(datacenter1) differs from previous data center (dc1). Please fix the snitch 
configuration, decommission and rebootstrap this node or use the flag 
-Dcassandra.ignore_dc=true.”
  3.  At one point I also got an error about the number of tokens (cannot change 
the number of tokens from 257 to 256).

It seems that simply copying the folders is not straightforward. Any advice, 
please?
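
All three errors above come from values saved in the copied system tables 
disagreeing with the new node's configuration. A minimal Python sketch of a 
pre-flight comparison follows. It assumes you have gathered both sets of values 
by hand: cluster_name and num_tokens are cassandra.yaml settings, while the 
"datacenter" key and the function name are hypothetical labels chosen for this 
example. The sample values are taken from the error messages quoted above.

```python
from typing import Dict, List

# Settings whose mismatch produced the three startup errors in this thread.
CHECKED_KEYS = ("cluster_name", "datacenter", "num_tokens")


def find_mismatches(saved: Dict[str, object],
                    configured: Dict[str, object]) -> List[str]:
    """Return one message per setting that differs between the two nodes."""
    problems = []
    for key in CHECKED_KEYS:
        if saved.get(key) != configured.get(key):
            problems.append(
                f"{key}: saved {saved.get(key)!r} != configured {configured.get(key)!r}"
            )
    return problems


if __name__ == "__main__":
    # Values as reported in the error messages above.
    saved = {"cluster_name": "k8ssandra", "datacenter": "dc1", "num_tokens": 257}
    configured = {"cluster_name": "Test Cluster", "datacenter": "datacenter1",
                  "num_tokens": 256}
    for line in find_mismatches(saved, configured):
        print(line)
```

An empty result only means these three settings agree; it says nothing about 
whether the token assignments themselves match.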



Re: unable to restore data from copied data directory

2021-01-02 Thread Jeff Jirsa



> On Jan 2, 2021, at 7:30 AM, Manu Chadha  wrote:
> Then I copied the contents of the backup folder (only the folders named after 
> my keyspaces, not all folders) into the data/data directory of a new Cassandra 
> installation, which wasn’t running. Then I restarted C*, but I can’t see the 
> data via cqlsh. I can’t see the keyspace either, probably because I should 
> also copy the system and system_* folders. But is it safe to do so? I tried it 
> but ran into several issues around cluster name, snitch, data center names, etc.

The schemas are stored in system_schema, so unless you copy that keyspace it’s 
not going to work.

Alternatively, you can issue the DDL / CREATE statements on your laptop. That 
will create new directories, and you can then copy the data files into those 
directories. This is your safest and easiest option most of the time.
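
A minimal Python sketch of that second option, under stated assumptions: the 
schema has already been recreated on the target, the target node is stopped, 
and table directories follow the <table>-<id> naming that Cassandra uses on 
disk (the id differs between the old and new cluster, so directories are 
matched on the table name only). The function names are hypothetical.

```python
import shutil
from pathlib import Path
from typing import List


def table_name(table_dir: Path) -> str:
    """Strip the trailing '-<id>' suffix from a table directory name."""
    return table_dir.name.rsplit("-", 1)[0]


def copy_sstables(backup_ks: Path, target_ks: Path) -> List[Path]:
    """Copy data files from each backed-up table directory into the
    same-named table directory of the recreated keyspace."""
    targets = {table_name(d): d for d in target_ks.iterdir() if d.is_dir()}
    copied = []
    for src_dir in sorted(backup_ks.iterdir()):
        if not src_dir.is_dir():
            continue
        dest_dir = targets.get(table_name(src_dir))
        if dest_dir is None:
            continue  # table was not recreated on the target; skip it
        for item in sorted(src_dir.iterdir()):
            # Only regular files are copied; subdirectories such as
            # snapshots/ or backups/ are left behind.
            if item.is_file():
                copied.append(Path(shutil.copy2(item, dest_dir)))
    return copied
```

After the copy, start the node (or, if it is running and JMX works, run 
nodetool refresh for each table) so that the new files are picked up.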



unable to restore data from copied data directory

2021-01-02 Thread Manu Chadha
Hi

Can I just copy the keyspace folders into a new Cassandra installation as a 
backup and restore strategy? I am trying to do that but it isn’t working.

I am using `K8ssandra` to run my single-node C* cluster and am experimenting 
with data backup and restore. Although K8ssandra uses Medusa for backup and 
restore, I couldn’t use it, so I thought I would test by simply copying the data 
directory. But I don’t see my data after the restore. There could be mistakes in 
my approach, so I am not sure where to look. For example:

  1.  K8ssandra uses Kubernetes Persistent Volume Claims. Does that mean that 
the data is actually stored somewhere else and not in the keyspace data 
directories?
  2.  Is there a way to look into the files in the keyspace data directories to 
check what data is there? Maybe the data wasn’t backed up properly.
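
For the second question, one low-tech check is to list which table directories 
in the backup actually contain SSTable data files and how large they are. The 
Python sketch below assumes the backup has the layout 
<root>/<keyspace>/<table dir>/ and that data files end in -Data.db, as 
Cassandra 3.x names them; the function name is hypothetical.

```python
from pathlib import Path
from typing import Dict


def sstable_sizes(data_root: Path) -> Dict[str, int]:
    """Return the total size in bytes of '*-Data.db' files per
    'keyspace/table_dir' found directly under data_root."""
    sizes: Dict[str, int] = {}
    # Depth is fixed at keyspace/table_dir/file, so files inside
    # snapshots/ or backups/ subdirectories are not counted.
    for data_file in data_root.glob("*/*/*-Data.db"):
        key = f"{data_file.parent.parent.name}/{data_file.parent.name}"
        sizes[key] = sizes.get(key, 0) + data_file.stat().st_size
    return sizes
```

A table missing from the result has no data files in the backup. Only data 
that had been flushed from memory to disk shows up this way, which is why 
nodetool flush before copying matters. To read the rows themselves, the 
Cassandra 3.x distribution also ships an sstabledump tool.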

The steps I took to copy the data are:

  1.  In the GKE cluster, under default-pool, I found the node running the 
k8ssandra-dc1-default-sts-0 container.
  2.  Under VM instances, I SSHed to that node.
  3.  Once SSHed, I ran “docker exec -it 
k8s_cassandra_k8ssandra-dc1-default-sts-0_default_00b0d72a-c124-4b04-b25d-9e0f17edc582_0 
/bin/bash”.

I noticed that the container has Cassandra:
/opt/cassandra
./opt/cassandra/bin/cassandra
./opt/cassandra/javadoc/org/apache/cassandra
./var/lib/cassandra
./var/log/cassandra

I changed to opt/cassandra/data/data, which had a directory for each keyspace. 
I assume that to take a backup we can copy this data directory, and that to 
restore we can simply copy it back into the new node’s data directory.

Note that I couldn’t run nodetool inside the container (nodetool flush or 
nodetool refresh) due to a JMX issue. I don’t know how important it is to run 
those commands. There is no traffic on the system, though.

I copied the data directory from OUTSIDE the container (from the node) using 
“docker cp container_name:src_path dest_path” (e.g. docker cp 
k8s_cassandra_k8ssandra-dc1-default-sts-0_default_00b0d72a-c124-4b04-b25d-9e0f17edc582_0:/opt/cassandra/data/data 
backup/).

Then, to transfer the backup directory to Cloud Shell (the console in the web 
browser), I used “gcloud compute scp --recurse 
gke-k8ssandra-cluster-default-pool-1b1cc22a-rd6t:~/backup/data 
~/K8ssandra_data_backup”.
Then I copied it from Cloud Shell to my laptop using the Cloud Shell editor, 
which downloaded a tar of the backup (via a download link).

Then I downloaded a fresh .gz of C* 3.11.6 on my laptop. After unpacking it, I 
noticed that it had no data directory. I ran C* and saw that only the default 
keyspaces were present and that the data directory had now been created. I then 
stopped C*.

Then I copied the contents of the backup folder (only the folders named after 
my keyspaces, not all folders) into the data/data directory of the new 
Cassandra installation, which wasn’t running. Then I restarted C*, but I can’t 
see the data via cqlsh. I can’t see the keyspace either, probably because I 
should also copy the system and system_* folders. But is it safe to do so? I 
tried it but ran into several issues around cluster name, snitch, data center 
names, etc.

Would the approach of just copying the folders work?

Thanks
Manu