KALYAN CHAKRAVARTHY KANCHARLA created CASSANDRA-14927:
---------------------------------------------------------
Summary: During data migration from a 7-node to a 21-node cluster
using sstableloader, new rows are being populated in the tables on the new
cluster and data is being duplicated in tables with user-defined types
Key: CASSANDRA-14927
URL: https://issues.apache.org/jira/browse/CASSANDRA-14927
Project: Cassandra
Issue Type: Task
Reporter: KALYAN CHAKRAVARTHY KANCHARLA
Fix For: 2.1.13
I'm trying to migrate data from a 7-node (single-DC) cluster to a 21-node
(3-DC) cluster using sstableloader.
We run the same versions on both the old and new clusters:
*cqlsh 5.0.1*
*Cassandra 2.1.13*
*CQL spec 3.2.1*
The old and new clusters are on different networks, so we opened the following
ports between them:
7000 - storage port
7001 - SSL storage port
7199 - JMX port
9042 - client (native transport) port
9160 - Thrift client port
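(As a sanity check, reachability of each port from a source node can be
verified with something like netcat; the IP below is one of the seed nodes
from the cassandra.yaml snippet further down, and any of the opened ports can
be substituted.)
nc -zv 10.168.66.41 7000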
We use vnodes in the clusters.
We made sure the cassandra.yaml file on the new cluster is set correctly by
changing the following options:
{{cluster_name: 'MyCassandraCluster'}}
{{num_tokens: 256}}
{{seed_provider:}}
{{  - class_name: org.apache.cassandra.locator.SimpleSeedProvider}}
{{    parameters:}}
{{      - seeds: "10.168.66.41,10.176.170.59"}}
{{listen_address: localhost}}
{{endpoint_snitch: GossipingPropertyFileSnitch}}
We also updated cassandra-rackdc.properties on each node, specifying that
node's DC and rack.
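For example, on a node in the first data center the file might contain the
following (the DC and rack names here are placeholders, not our actual
values):
dc=DC1
rack=RAC1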
While creating the keyspaces, we changed the replication strategy to
NetworkTopologyStrategy.
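For illustration, a keyspace created on the new 3-DC cluster would look along
these lines (the keyspace name, DC names, and per-DC replication factors are
placeholders; with GossipingPropertyFileSnitch the DC names must match those
set in cassandra-rackdc.properties):
CREATE KEYSPACE keyspace_name
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3, 'DC3': 3};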
The cluster looks healthy; all the nodes are UP and NORMAL.
{color:#FF0000}*I was able to get the data from the old cluster to the new
cluster. However, along with the data from the old cluster, I see new rows
being populated in the tables on the new cluster, and data is being duplicated
in the tables with user-defined types.*{color}
We used the following steps to migrate the data:
# Took snapshots of all the keyspaces that we want to migrate (9 keyspaces).
Used the _nodetool snapshot_ command on the source nodes to take a snapshot of
the required keyspace/table, specifying the _hostname_, _JMX port_, and
_keyspace_:
/a/cassandra/bin/nodetool -u $(sudo su - company -c "cat
/a/cassandra/jmxremote.password" | awk '{print $1}') -pw $(sudo su - company
-c "cat /a/cassandra/jmxremote.password" | awk '{print $2}') -h localhost
-p 7199 snapshot keyspace_name
# After taking the snapshots, moved the snapshot directories from the source
nodes to the target nodes:
→ Create a tar file on the source node for the snapshot directory that we want
to move to the target node:
tar -cvf file.tar snapshot_name
→ Move this file.tar from the source node to the local machine:
scp -S gwsh [email protected]:/a/cassandra/data/file.tar .
→ Move this file.tar from the local machine to a new directory (example: test)
on the target node:
scp -S gwsh file.tar [email protected]:/a/cassandra/data/test/.
# Untar this file.tar in the test directory on the target node.
# The path to the sstables must be the same on both source and target.
# To bulk load these files using _sstableloader_, run sstableloader on the
source node, point it at one or more nodes in the destination cluster with the
-d flag (which accepts a comma-separated list of IP addresses or hostnames),
and specify the path to the sstables on the source node:
/a/cassandra/bin/sstableloader -d host_IP path_to_sstables
*Example:*
/a/cassandra/bin# sstableloader -d 192.168.58.41 -u popps -pw ******* -tf
org.apache.cassandra.thrift.SSLTransportFactory -ts
/a/cassandra/ssl/truststore.jks -tspw test123 -ks /a/cassandra/ssl/keystore.jks
-kspw test123 -f /a/cassandra/conf/cassandra.yaml
/a/cassandra/data/app_properties/admins-58524140431511e8bbb6357f562e11ca/
Summary statistics:
Connections per host: : 1
Total files transferred: : 9
Total bytes transferred: : 1787893
Total duration (ms): : 2936
Average transfer rate (MB/s): : 0
Peak transfer rate (MB/s): : 0
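Since the same load has to be repeated for every table, it can be scripted. A
minimal sketch looping over all table directories of one keyspace, reusing the
flags from the example above (the loop itself is illustrative, not what we
actually ran; the keyspace directory path and the redacted password are
placeholders):
for table_dir in /a/cassandra/data/test/keyspace_name/*/; do
  /a/cassandra/bin/sstableloader -d 192.168.58.41 -u popps -pw ******* \
    -tf org.apache.cassandra.thrift.SSLTransportFactory \
    -ts /a/cassandra/ssl/truststore.jks -tspw test123 \
    -ks /a/cassandra/ssl/keystore.jks -kspw test123 \
    -f /a/cassandra/conf/cassandra.yaml "$table_dir"
done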
Performed these steps on all the tables, then checked the row counts in the
old and new tables using cqlsh:
cqlsh> SELECT count(*) FROM keyspace.table;
Example for a single table:
count on new table: 341
count on old table: 303
We were also able to identify the differences between the tables using the
'sdiff' command, following these steps:
* Created .txt/.csv files for the tables in the old and new clusters.
* Compared them using the sdiff command.
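A minimal sketch of this comparison (the COPY export, host names, and file
names are illustrative; the actual export step may have been done
differently):
cqlsh old_host -e "COPY keyspace.table TO 'old_table.csv'"
cqlsh new_host -e "COPY keyspace.table TO 'new_table.csv'"
sort old_table.csv > old_sorted.csv
sort new_table.csv > new_sorted.csv
sdiff -s old_sorted.csv new_sorted.csv
Because sdiff compares line by line, sorting both exports first keeps matching
rows aligned, and -s suppresses identical lines so only the extra or
duplicated rows are shown.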
*So I would appreciate it if someone could help me understand the cause behind
the population of new data in the new tables.*
Please let me know if you need more info.
PS: After migrating the data for the first time and seeing these issues, we
TRUNCATED all the tables, DROPPED the tables with user-defined types, and
recreated the dropped tables. We then repeated the same migration procedure,
and we still see the same issues.