[
https://issues.apache.org/jira/browse/CASSANDRA-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
KALYAN CHAKRAVARTHY KANCHARLA updated CASSANDRA-14927:
------------------------------------------------------
Issue Type: Test (was: Task)
> During data migration from 7 node to 21 node cluster using sstableloader, new
> data is being populated on the new tables & data is being duplicated on user
> type tables
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-14927
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14927
> Project: Cassandra
> Issue Type: Test
> Reporter: KALYAN CHAKRAVARTHY KANCHARLA
> Priority: Major
> Labels: test
> Fix For: 2.1.13
>
>
> I'm trying to migrate data from a 7-node (single DC) cluster to a 21-node (3
> DC) cluster using sstableloader.
> We have the same versions on both the old and new clusters.
> *cqlsh 5.0.1*
> *Cassandra 2.1.13*
> *CQL spec 3.2.1*
> The old and new clusters are in different networks, so we opened the following
> ports between them:
> 7000- storage port
> 7001- ssl storage port
> 7199- JMX port
> 9042- client port
> 9160- Thrift client port
> We use vnodes in the clusters.
> We made sure the cassandra.yaml file on the new cluster is set correctly by
> changing the following options:
>
> {{cluster_name: 'MyCassandraCluster'}}
> {{num_tokens: 256}}
> {{seed_provider:}}
> {{    - class_name: org.apache.cassandra.locator.SimpleSeedProvider}}
> {{      parameters:}}
> {{          - seeds: "10.168.66.41,10.176.170.59"}}
> {{listen_address: localhost}}
> {{endpoint_snitch: GossipingPropertyFileSnitch}}
> We also changed cassandra-rackdc.properties for each DC by specifying the
> respective DC and rack.
> While creating the keyspaces, we changed the replication strategy to NetworkTopologyStrategy.
>
> The cluster looks healthy; all the nodes are UP and NORMAL.
>
> {color:#FF0000}*I was able to get the data from old cluster to new cluster.
> But, along with the data from old cluster, I see some new rows being
> populated in the tables on new cluster and data is being duplicated in the
> tables with user type*. {color}
> {color:#333333}We have used the following steps to migrate data:{color}
> # Took snapshots of all the keyspaces that we want to migrate (9
> keyspaces). Used the _nodetool snapshot_ command on the source nodes to take a
> snapshot of the required keyspace/table by specifying the _hostname, jmx port_ and
> _keyspace_:
> _/a/cassandra/bin/nodetool -u $(sudo su - company -c "cat
> /a/cassandra/jmxremote.password" | awk '{print $1}') -pw $(sudo su - company
> -c "cat /a/cassandra/jmxremote.password" | awk '{print $2}')_ *_-h
> localhost -p 7199 snapshot keyspace_name_*
> # After taking the snapshots, moved the snapshot directories from the source nodes to
> the target node.
>
> → Create a tar file on source node for the snapshot directory that we want to
> move on to target node.
> tar -cvf file.tar snapshot_name
> → Move this file.tar from source node to local machine.
> scp -S gwsh [email protected]:/a/cassandra/data/file.tar .
> → Now move this file.tar from the local machine to a new directory (example: test)
> on the target node.
> scp -S gwsh file.tar [email protected]:/a/cassandra/data/test/.
> # Now untar this file.tar in the test directory on the target node.
> # The path of the sstables must be the same on both source and target.
> # To bulk load these files using _sstableloader_, run sstableloader on the source
> node, indicate one or more nodes in the destination cluster with the -d flag,
> which accepts a comma-separated list of IP addresses or hostnames, and
> specify the path to the sstables on the source node.
> _/a/cassandra/bin/_ *_./sstableloader -d host_IP path_to_sstables_*
> *_Example:_*
> [/a/cassandra/bin#|mailto:[email protected]:/a/cassandra/bin]
> sstableloader -d 192.168.58.41 -u popps -pw ******* -tf
> org.apache.cassandra.thrift.SSLTransportFactory -ts
> /a/cassandra/ssl/truststore.jks -tspw test123 -ks
> /a/cassandra/ssl/keystore.jks -kspw test123 -f
> /a/cassandra/conf/cassandra.yaml
> /a/cassandra/data/app_properties/_admins-58524140431511e8bbb6357f562e11ca_/
> Summary statistics:
> Connections per host: : 1
> Total files transferred: : 9
> Total bytes transferred: : 1787893
> Total duration (ms): : 2936
> Average transfer rate (MB/s): : 0
> Peak transfer rate (MB/s): : 0
>
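Since the load has to be repeated for every table, the sstableloader call above could be wrapped in a loop. This is only a minimal sketch, assuming each table's sstables sit in their own directory under one keyspace directory. The names `load_keyspace` and `LOADER` are hypothetical and introduced here for illustration, and the authentication and SSL options from the example are left out for brevity.

```shell
# Hypothetical wrapper: run the loader once per table directory.
# LOADER defaults to sstableloader; override it to dry-run the loop.
load_keyspace() {
    target_host="$1"     # node in the destination cluster (passed to -d)
    keyspace_dir="$2"    # directory holding one sub-directory per table
    for table_dir in "$keyspace_dir"/*/; do
        # Skip the unexpanded glob when the keyspace directory is empty.
        [ -d "$table_dir" ] || continue
        # Stop at the first failed load so it is not silently skipped.
        "${LOADER:-sstableloader}" -d "$target_host" "$table_dir" || return 1
    done
}
```

With the paths from the example, this might be called as `load_keyspace 192.168.58.41 /a/cassandra/data/app_properties`. Stopping on the first failure is a deliberate choice here, so a partially loaded keyspace is noticed before the row counts are compared.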
> We performed these steps on all the tables and checked the row counts in the old and
> new tables using cqlsh:
> cqlsh> SELECT count(*) FROM keyspace.table;
> example for a single table:
> count on new table: 341
> count on old table: 303
>
> We were also able to identify the differences between the tables using the 'sdiff'
> command, with the following steps:
> * Created .txt/.csv files for the tables in the old and new clusters.
> * Compared them using the sdiff command.
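To narrow the comparison down to just the unexpected rows, the two exports could also be compared with `comm` instead of reading a side-by-side diff. This is a rough sketch, assuming each export holds one row per line with the columns in the same order in both files. The function name `rows_only_in_new` is hypothetical.

```shell
# Hypothetical helper: print rows that appear in the new export but not the old one.
# $1 = export from the old cluster, $2 = export from the new cluster.
rows_only_in_new() {
    old_sorted=$(mktemp) || return 1
    new_sorted=$(mktemp) || { rm -f "$old_sorted"; return 1; }
    # comm requires sorted input; LC_ALL=C keeps sort and comm in the same order.
    LC_ALL=C sort "$1" > "$old_sorted"
    LC_ALL=C sort "$2" > "$new_sorted"
    # -1 hides rows unique to the old export, -3 hides rows common to both.
    LC_ALL=C comm -13 "$old_sorted" "$new_sorted"
    rm -f "$old_sorted" "$new_sorted"
}
```

For a table where the new count is 341 and the old count is 303, the output would be expected to list the extra rows, which can then be inspected directly. A row that is exported twice in the new file but only once in the old one is also printed, because `comm` matches lines one-for-one.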
>
> *Could someone help me understand what is causing the new data to appear in
> the new tables?*
> Please let me know if you need more info.
> PS: After migrating the data the first time and seeing these issues, we
> TRUNCATED all the tables, DROPPED the tables with user 'type', and recreated
> the dropped tables. We then repeated the same migration procedure
> and still see the same issues.