KALYAN CHAKRAVARTHY KANCHARLA created CASSANDRA-14927:
---------------------------------------------------------

             Summary: During data migration from a 7-node to a 21-node 
cluster using sstableloader, new data is being populated in the tables on the 
new cluster and data is being duplicated in tables with user-defined types
                 Key: CASSANDRA-14927
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14927
             Project: Cassandra
          Issue Type: Task
            Reporter: KALYAN CHAKRAVARTHY KANCHARLA
             Fix For: 2.1.13


I'm trying to migrate data from a 7-node (single-DC) cluster to a 21-node 
(3-DC) cluster using sstableloader.

We have the same versions on both the old and new clusters:

*cqlsh 5.0.1* 

 *Cassandra 2.1.13* 

 *CQL spec 3.2.1* 

The old and new clusters are on different networks, so we opened the following 
ports between them (a quick reachability check is sketched after the list):

7000 - storage port
7001 - SSL storage port
7199 - JMX port
9042 - CQL client port
9160 - Thrift client port
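
A minimal sketch of the kind of reachability check that can be run from a 
source node (assuming netcat is available; 10.176.170.59 is one of the 
new-cluster seed nodes listed in the cassandra.yaml excerpt below):

{code}
# Minimal sketch: verify the required Cassandra ports are reachable from
# a source node. 10.176.170.59 is one of the new-cluster seed node IPs.
for port in 7000 7001 7199 9042 9160; do
  if nc -z -w 5 10.168.66.41 "$port"; then
    echo "port $port open"
  else
    echo "port $port CLOSED"
  fi
done
{code}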

We use vnodes in the clusters.

We made sure the cassandra.yaml file on the new cluster is set up correctly by 
changing the following options:

 

{code}
cluster_name: 'MyCassandraCluster'
num_tokens: 256
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.168.66.41,10.176.170.59"
listen_address: localhost
endpoint_snitch: GossipingPropertyFileSnitch
{code}

We also made changes in cassandra-rackdc.properties for each DC, specifying 
the respective DC and rack (a sketch follows below).
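
A minimal sketch of what this looks like on one node (the DC and rack names 
here are placeholders, not our actual values):

{code}
# Minimal sketch, run on each node: set the node's DC and rack for
# GossipingPropertyFileSnitch. DC1/RAC1 are placeholder names.
cat > /a/cassandra/conf/cassandra-rackdc.properties <<'EOF'
dc=DC1
rack=RAC1
EOF
{code}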

While creating the keyspaces, we changed the replication strategy to 
NetworkTopologyStrategy (example below).
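
A sketch of the kind of statement we used (the keyspace name, DC names and 
replication factors here are placeholders):

{code}
# Minimal sketch: create a keyspace replicated across the three DCs.
# Keyspace name, DC names and RFs are placeholders; the DC names must
# match those set in cassandra-rackdc.properties.
cqlsh -e "CREATE KEYSPACE IF NOT EXISTS my_keyspace
  WITH replication = {'class': 'NetworkTopologyStrategy',
                      'DC1': 3, 'DC2': 3, 'DC3': 3};"
{code}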

 

The cluster looks healthy; all the nodes are UP and NORMAL.

 

{color:#FF0000}*I was able to get the data from the old cluster to the new 
cluster. But, along with the data from the old cluster, I see some new rows 
being populated in the tables on the new cluster, and data is being duplicated 
in the tables with user-defined types.*{color}

{color:#333333}We used the following steps to migrate the data (a consolidated 
sketch of all five steps appears after the summary statistics below):{color}
 # Took snapshots of all the keyspaces that we want to migrate (9 keyspaces). 
Used the _nodetool snapshot_ command on the source nodes to take a snapshot of 
the required keyspace/table by specifying the _hostname_, _JMX port_ and 
_keyspace_:

{code}
/a/cassandra/bin/nodetool \
  -u $(sudo su - company -c "cat /a/cassandra/jmxremote.password" | awk '{print $1}') \
  -pw $(sudo su - company -c "cat /a/cassandra/jmxremote.password" | awk '{print $2}') \
  -h localhost -p 7199 snapshot keyspace_name
{code}

 # After taking the snapshots, move these snapshot directories from the source 
nodes to the target nodes.
→ Create a tar file on the source node for the snapshot directory that we want 
to move to the target node:
     tar -cvf file.tar snapshot_name
→ Move this file.tar from the source node to the local machine:
     scp -S gwsh [email protected]:/a/cassandra/data/file.tar .
→ Now move this file.tar from the local machine to a new directory (for 
example, test) on the target node:
     scp -S gwsh file.tar [email protected]:/a/cassandra/data/test/.
 # Now untar this file.tar in the test directory on the target node.
 # The path of the sstables must be the same on both source and target.
 # To bulk load these files using _sstableloader_, run sstableloader on the 
source node, indicate one or more nodes in the destination cluster with the -d 
flag (which accepts a comma-separated list of IP addresses or hostnames), and 
specify the path to the sstables on the source node:

/a/cassandra/bin/sstableloader -d host_IP path_to_sstables

          *Example:*

{code}
/a/cassandra/bin# sstableloader -d 192.168.58.41 -u popps -pw ******* \
  -tf org.apache.cassandra.thrift.SSLTransportFactory \
  -ts /a/cassandra/ssl/truststore.jks -tspw test123 \
  -ks /a/cassandra/ssl/keystore.jks -kspw test123 \
  -f /a/cassandra/conf/cassandra.yaml \
  /a/cassandra/data/app_properties/admins-58524140431511e8bbb6357f562e11ca/
{code}

Summary statistics:
 Connections per host: 1
 Total files transferred: 9
 Total bytes transferred: 1787893
 Total duration (ms): 2936
 Average transfer rate (MB/s): 0
 Peak transfer rate (MB/s): 0
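
For readability, here is a condensed sketch of steps 1-5 for a single keyspace 
(host names, the snapshot/table directory names and credentials are 
placeholders; the exact commands we ran are shown in the steps above):

{code}
# Condensed sketch of steps 1-5 for one keyspace. SRC/DST host names and
# the snapshot/table directory names below are placeholders.
KEYSPACE=keyspace_name
SRC=source_node          # a node in the old cluster
DST=target_node          # a node in the new cluster

# Step 1: snapshot the keyspace on the source node.
ssh "$SRC" "/a/cassandra/bin/nodetool -h localhost -p 7199 snapshot $KEYSPACE"

# Steps 2-4: tar the snapshot directory, hop it through the local machine
# to the target node, and unpack it there.
ssh "$SRC" "cd /a/cassandra/data && tar -cvf file.tar snapshot_name"
scp "$SRC":/a/cassandra/data/file.tar .
scp file.tar "$DST":/a/cassandra/data/test/
ssh "$DST" "cd /a/cassandra/data/test && tar -xvf file.tar"

# Step 5: stream the unpacked sstables into the new cluster; the path
# must end in a keyspace/table directory.
/a/cassandra/bin/sstableloader -d 192.168.58.41 \
  /a/cassandra/data/test/"$KEYSPACE"/table_name-tableid/
{code}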

 

We performed these steps for all the tables, and then checked the row counts 
in the old and new tables using cqlsh:

cqlsh> SELECT count(*) FROM keyspace.table;

Example for a single table:

count on new table: 341

count on old table: 303

 

We were also able to identify the differences between the tables by using the 
'sdiff' command, following these steps (a sketch follows the list):
 * created .txt/.csv files for the tables in the old and new clusters
 * compared them using the sdiff command
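
A minimal sketch of that comparison (host, keyspace and table names are 
placeholders):

{code}
# Minimal sketch: export the same table from both clusters with cqlsh
# COPY TO and compare the sorted rows. Host, keyspace and table names
# are placeholders.
cqlsh old_cluster_host -e "COPY keyspace.table TO 'old_table.csv';"
cqlsh new_cluster_host -e "COPY keyspace.table TO 'new_table.csv';"
sort old_table.csv > old_sorted.csv
sort new_table.csv > new_sorted.csv
sdiff -s old_sorted.csv new_sorted.csv   # print only rows that differ
{code}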

 

*So I would appreciate help in finding the cause behind the new data being 
populated in the tables on the new cluster.*

Please let me know if you need more info.

PS: After migrating the data for the first time and seeing these issues, we 
TRUNCATED all the tables, DROPPED the tables with user-defined types, and 
recreated the dropped tables. We then ran the same migration procedure again, 
and we still see the same issues.


