Re: [EXTERNAL] Re: Migrate large volume of data from one table to another table within the same cluster when COPY is not an option.

2019-03-14 Thread Rahul Singh
Adding to Stefan's comment: there is a "scylladb" migrator, which uses the Spark connector from DataStax and can theoretically work against any Cassandra-compliant DB, so it should not be limited to Cassandra-to-Scylla migrations.

RE: [EXTERNAL] Re: Migrate large volume of data from one table to another table within the same cluster when COPY is not an option.

2019-03-14 Thread Durity, Sean R
Requiring a highly available way to do this adds more challenges. I would weigh the cost of a complex solution against the possibility of a maintenance window where you stop your app, move the data, then restart. For the straight copy of the data, I am currently enamored with

Re: [EXTERNAL] Re: Migrate large volume of data from one table to another table within the same cluster when COPY is not an option.

2019-03-13 Thread Stefan Miklosovic
Hi Leena, as already suggested in my previous email, you could use Apache Spark and the Cassandra Spark connector (1). I have checked TTLs and I believe you should especially read this section (2) about TTLs. It seems that's what you need to do: TTLs per row. The workflow would be that you read from
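The per-row TTL idea described above can be sketched in plain Python: when copying, each row's remaining TTL is its original TTL minus the time already elapsed since the write, which is what Cassandra's `TTL(column)` function returns per row. The function and row timings below are hypothetical, for illustration only:

```python
import math

def remaining_ttl(original_ttl_s: int, written_at_s: float, now_s: float) -> int:
    """Seconds of TTL left on a row that is copied at time `now_s`.

    Mirrors what Cassandra's TTL(column) reports: the original TTL minus
    the time elapsed since the write, never below zero.
    """
    elapsed = now_s - written_at_s
    return max(0, math.ceil(original_ttl_s - elapsed))

# A row written 1 day ago with a 7-day TTL has 6 days left on the copy.
left = remaining_ttl(7 * 86400, written_at_s=0.0, now_s=86400.0)
```

Writing the copy with this remaining value (rather than the original TTL) keeps both tables expiring at the same moment.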

Re: [EXTERNAL] Re: Migrate large volume of data from one table to another table within the same cluster when COPY is not an option.

2019-03-13 Thread Leena Ghatpande
Understood, a 2nd table would be a better approach. So what would be the best way to copy 70M rows from the current table to the 2nd table, with the TTL on each record matching the first table? From: Durity, Sean R Sent: Wednesday, March 13, 2019 8:17 AM To:
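One way to preserve per-row TTLs during such a copy is to SELECT each source row together with `TTL(column)`, then re-insert it with `USING TTL`. A minimal sketch of the statement construction in Python (keyspace, table, and column names are made up; a real run would page through the source table with a Cassandra driver and use prepared statements rather than string interpolation):

```python
def build_copy_insert(keyspace: str, table: str, row: dict, ttl: int) -> str:
    """Render an INSERT ... USING TTL statement for one row.

    `row` maps column names to already-CQL-formatted literal strings;
    `ttl` is the remaining TTL read via TTL(<column>) on the source row.
    """
    cols = ", ".join(row)
    vals = ", ".join(row.values())
    stmt = f"INSERT INTO {keyspace}.{table} ({cols}) VALUES ({vals})"
    if ttl > 0:
        stmt += f" USING TTL {ttl}"
    return stmt

stmt = build_copy_insert("my_keyspace", "my_table_v2",
                         {"id": "123", "created": "'2019-03-13'"}, ttl=86400)
```

At 70M rows, batching the writes and throttling the copy matters more than the statement shape, but the `USING TTL` clause is what carries the expiry over.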

RE: [EXTERNAL] Re: Migrate large volume of data from one table to another table within the same cluster when COPY is not an option.

2019-03-13 Thread Durity, Sean R
Correct, there is currently no such flag. I think there SHOULD be one. From: Dieudonné Madishon NGAYA Sent: Tuesday, March 12, 2019 7:17 PM To: user@cassandra.apache.org Subject: [EXTERNAL] Re: Migrate large volume of data from one table to another table within the same cluster when COPY is not an

Re: Migrate large volume of data from one table to another table within the same cluster when COPY is not an option.

2019-03-12 Thread Stefan Miklosovic
Hi Sean, for sure, the best approach would be to create another table that serves just that specific query. How do I set the flag in cassandra.yaml that disallows ALLOW FILTERING? I read the docs and there seems to be nothing about that. Regards On Wed, 13 Mar 2019 at 06:57, Durity, Sean

RE: Migrate large volume of data from one table to another table within the same cluster when COPY is not an option.

2019-03-12 Thread Durity, Sean R
If there are 2 access patterns, I would consider having 2 tables. The first one with the ID, which you say is the majority use case. Then have a second table that uses a time-bucket approach as others have suggested: (time bucket, id) as the primary key. Choose a time bucket (day, week, hour, month,
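The time bucket described above can be computed deterministically from the event timestamp; a day-granularity sketch in Python (the bucket string format is an arbitrary choice for illustration):

```python
from datetime import datetime, timezone

def day_bucket(ts: datetime) -> str:
    """Partition-key bucket for a (time_bucket, id) primary key, day granularity."""
    return ts.strftime("%Y-%m-%d")

# All events from the same UTC day land in one partition, so a
# date-range query touches a small, bounded set of partitions.
b = day_bucket(datetime(2019, 3, 12, 23, 59, tzinfo=timezone.utc))
```

The granularity is the design knob: finer buckets (hour) keep partitions small but mean a range query fans out over more partitions; coarser buckets (month) do the reverse.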

Re: Migrate large volume of data from one table to another table within the same cluster when COPY is not an option.

2019-03-12 Thread Leena Ghatpande
Our data model cannot be as recommended below, as the majority of reads need to select the data by the partition key (id) only, not by date. "You could remodel your data in such a way that you would make the primary key like this: ((date), hour-minute, id) or ((date, hour-minute), id)" By

Re: Migrate large volume of data from one table to another table within the same cluster when COPY is not an option.

2019-03-11 Thread Stefan Miklosovic
The query which does not work should be like this (I made a mistake there): cqlsh> SELECT * from my_keyspace.my_table where number > 2; InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable
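The error above is Cassandra refusing a predicate on a non-key column without an explicit opt-in; appending `ALLOW FILTERING` makes the query legal but turns it into a scan of every partition. A small illustration of the two forms as Python strings (table and column names taken from the snippet above):

```python
base = "SELECT * FROM my_keyspace.my_table WHERE number > 2"

# Rejected: `number` is neither a partition key nor a clustering column,
# so the server cannot bound which partitions it must read.
rejected = base

# Accepted, but performs a full cluster scan -- tolerable for ad-hoc use,
# dangerous at scale, which is why a purpose-built second table is preferred.
full_scan = base + " ALLOW FILTERING"
```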

Re: Migrate large volume of data from one table to another table within the same cluster when COPY is not an option.

2019-03-11 Thread Stefan Miklosovic
Hi Leena, "We are thinking of creating a new table with a date field as a clustering column to be able to query for date ranges, but partition key to clustering key will be 1-1. Is this a good approach?" If you want to select by some time range here, I am wondering how would making datetime a