Re: Multinode Cassandra and sstableloader

Serega Sheypak Thu, 02 Apr 2015 02:16:55 -0700

So, sstableloader streams only the portion of the data stored in that node's
/var/lib/cassandra/data/keyspace/table directory.
If we have 3 nodes and RF=1, then only about 1/3 of the data would be streamed
to the other cluster when loading from a single node.
Problem is solved.
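
A rough sketch of the arithmetic behind that conclusion, assuming a single datacenter with evenly balanced token ownership (both are assumptions, not something measured in this thread). The function name approx_share is made up for illustration:

```shell
#!/bin/sh
# approx_share NODES RF
# Prints the approximate percentage of a table's data that one node holds,
# assuming a single DC and evenly balanced token ownership (assumption).
approx_share() {
    nodes=$1
    rf=$2
    # A node stores at most one replica of each row, so cap RF at the node count.
    if [ "$rf" -gt "$nodes" ]; then
        rf=$nodes
    fi
    # Integer percentage: each node holds roughly RF/NODES of the data.
    echo $(( rf * 100 / nodes ))
}
```

For example, "approx_share 3 1" prints 33 (one node of three at RF=1 holds about a third), while "approx_share 3 3" prints 100 (every node holds a full copy).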



2015-04-01 12:05 GMT+02:00 Alain RODRIGUEZ:

> From Michael Laing - posted on the wrong thread :
>
> "We use Alain's solution as well to make major operational revisions.
>
> We have a "red team" and a "blue team" in each AWS region, so we just add
> and drop datacenters to get where we want to be.
>
> Pretty simple."
>
> 2015-03-31 15:50 GMT+02:00 Alain RODRIGUEZ:
>
>> IMHO, the most straightforward solution is to add cluster2 as a new DC
>> for mykeyspace and then drop the old DC.
>>
>> That's how we migrated to VPC (AWS) and we love this approach since you
>> don't have to mess with your existing cluster, plus the sync happens
>> automatically and you can then drop your old DC safely, when you are sure.
>>
>> I posted the steps on this ML a long time ago:
>> https://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201406.mbox/%3cca+vsrlopop7th8nx20aoz3as75g2jrjm3ryx119deklynhq...@mail.gmail.com%3E
>> Also Datastax docs:
>> https://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html
>>
>> "get data from cluster1,
>> put it to cluster2
>> wipe cluster1"
>>
>> I would definitely use this method to do this (I actually did already,
>> multiple times).
>>
>> Up to you. I once heard that there are almost as many ways of operating
>> Cassandra as there are operators :). You should go with the
>> method you are confident in. I can assure you the one I propose is quite
>> safe.
>>
>> C*heers,
>>
>> Alain
>>
>> 2015-03-31 15:32 GMT+02:00 Serega Sheypak:
>>
>>> >I have to ask you if you considered doing an Alter keyspace, change RF
>>> The idea is dead simple:
>>> get data from cluster1,
>>> put it to cluster2
>>> wipe cluster1
>>>
>>> I understand the drawbacks of the streaming sstableloader approach, but right
>>> now I need something easy. Later we will consider switching to Priam, since it
>>> does backup/restore the right way.
>>>
>>> 2015-03-31 14:45 GMT+02:00 Alain RODRIGUEZ:
>>>
>>>> Hi,
>>>>
>>>> Despite "I understand that it's not the best solution, I need it
>>>> for testing purposes", I have to ask whether you considered doing an Alter
>>>> keyspace (change RF > 1 for mykeyspace on cluster2) and a "nodetool rebuild"
>>>> to add a new DC (your cluster2)?
>>>>
>>>> If you go your way (sstableloader), I'd also advise you to take a
>>>> snapshot (instead of just flushing) to avoid failures due to compactions on
>>>> your active cluster1.
>>>>
>>>> To answer your question, sstableloader is supposed to distribute the
>>>> data correctly on the new cluster depending on your RF and topology.
>>>> Basically, if you run sstableloader only on the sstables of c1.node1, my guess
>>>> is that all the data present on c1.node1 will be stored on the new c2
>>>> (each piece of data on its corresponding node). So if you have RF=3 on c1,
>>>> you should have all the data on c2 just by running sstableloader from
>>>> c1.node1; if you are using RF=1 on c1, then you need to load data from every
>>>> c1 node. I suppose that the choice of cluster2.nodeXXX doesn't matter and
>>>> that it acts as a coordinator.
>>>>
>>>> I have never used the tool, but that's what would be "logical" imho. Wait
>>>> for a confirmation, as I wouldn't want to lead you to a failure of any kind.
>>>> Also, I don't know whether data is replicated directly by sstableloader
>>>> or whether you need to repair c2 after loading the data.
>>>>
>>>> C*heers,
>>>>
>>>> Alain
>>>>
>>>> 2015-03-31 13:21 GMT+02:00 Serega Sheypak:
>>>>
>>>>> Hi, I have a simple question and can't find related info in the docs.
>>>>>
>>>>> I have cluster1 with 3 nodes and cluster2 with 5 nodes. I want to
>>>>> transfer all the data of the keyspace named 'mykeyspace' from cluster1 to
>>>>> cluster2 using sstableloader. I understand that it's not the best solution;
>>>>> I need it for testing purposes.
>>>>>
>>>>> What I'm going to do:
>>>>>
>>>>>    1. Recreate keyspace schema on cluster2 using schema from cluster1
>>>>>    2. nodetool flush for mykeyspace.source_table being exported from
>>>>>    cluster1 to cluster2
>>>>>    3. Run sstableloader for each table on cluster1.node01:
>>>>>
>>>>>    sstableloader -d cluster2.nodeXXX.com /var/lib/cassandra/data/mykeyspace/source_table-83f369e0d6e511e4b3a6010e8d2b68af/
>>>>>
>>>>> What should I get as a result on cluster2?
>>>>>
>>>>> *ALL* data from source_table?
>>>>>
>>>>> or
>>>>>
>>>>> Just data stored in *partition of source_table*
>>>>>
>>>>> I'm confused. The docs say I just run this command to export a table from
>>>>> cluster1 to cluster2, but I specify the path to only a part of the
>>>>> source_table data, since the other parts of the table should be on other nodes.
>>>>>
>>>>
>>>>
>>>
>>
>
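
The steps from the original question could be sketched as a small dry-run helper, to be run on every cluster1 node when RF is lower than the node count. It only prints the commands instead of executing them. The function name print_load_plan and the variable names are made up for illustration, and the host, keyspace and data directory are simply the placeholder values used in this thread:

```shell
#!/bin/sh
# Placeholder values taken from the thread; adjust before use (assumptions).
KEYSPACE="mykeyspace"
TARGET_HOST="cluster2.nodeXXX.com"
DATA_DIR="/var/lib/cassandra/data"

# Print (do not run) the flush and load commands for the node this runs on.
print_load_plan() {
    # Flush memtables so recent writes are present in the sstables on disk.
    echo "nodetool flush $KEYSPACE"
    for table_dir in "$DATA_DIR/$KEYSPACE"/*/; do
        # Skip the unexpanded glob when the keyspace has no table directories.
        [ -d "$table_dir" ] || continue
        # One sstableloader run per table directory of this node.
        echo "sstableloader -d $TARGET_HOST $table_dir"
    done
}
```

Calling print_load_plan on a node lists one sstableloader command per table directory found there. Review the output before running anything, and consider taking a snapshot first, as suggested earlier in the thread.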
