Adding to Stefan's comment: there is the "scylladb" migrator, which uses the
Spark connector from DataStax and can, in theory, work with any
Cassandra-compliant DB; it should not be limited to Cassandra-to-Scylla.
A highly available way to do this brings more challenges. I would weigh the
cost of a complex solution against the possibility of a maintenance window in
which you stop your app to move the data, then restart.
For the straight copy of the data, I am currently enamored with
Hi Leena,
as already suggested in my previous email, you could use Apache Spark and the
Cassandra Spark connector (1). I have checked TTLs and I believe you should
especially read this section (2) about TTLs. It seems that is what you need
to do: TTLs per row. The workflow would be that you read from
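In plain CQL, the per-row TTL workflow amounts to reading each row together with its remaining TTL and re-inserting it with that TTL. A minimal sketch (keyspace, table, and column names here are placeholders, not from the thread):

```sql
-- Read each row plus the remaining TTL of a regular column
-- (TTL() cannot be applied to primary key columns):
SELECT id, created_at, payload, TTL(payload) AS remaining_ttl
FROM my_keyspace.source_table;

-- For each row, re-insert into the new table using that TTL
-- (bind remaining_ttl to the last marker):
INSERT INTO my_keyspace.target_table (time_bucket, id, created_at, payload)
VALUES (?, ?, ?, ?) USING TTL ?;
```

A Spark job (or the connector's write configuration) would drive this read-then-write loop at scale rather than doing it row by row in cqlsh.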
Understood, a second table would be a better approach. So what would be the
best way to copy 70M rows from the current table to the second table, with the
TTL on each record set as in the first table?
From: Durity, Sean R
Sent: Wednesday, March 13, 2019 8:17 AM
To:
Correct, there is no current flag. I think there SHOULD be one.
From: Dieudonné Madishon NGAYA
Sent: Tuesday, March 12, 2019 7:17 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Migrate large volume of data from one table to another
table within the same cluster when COPY is not an
Hi Sean,
for sure, the best approach would be to create another table which would
treat just that specific query.
How do I set the flag to disallow ALLOW FILTERING in cassandra.yaml? I read
the docs and there seems to be nothing about that.
Regards
On Wed, 13 Mar 2019 at 06:57, Durity, Sean
If there are 2 access patterns, I would consider having 2 tables. The first one
with the ID, which you say is the majority use case. Then have a second table
that uses a time-bucket approach as others have suggested:
(time bucket, id) as primary key
Choose a time bucket (day, week, hour, month,
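As a sketch, a daily-bucketed second table following the (time bucket, id) key above could look like this (keyspace, table, and column names are illustrative, not from the thread):

```sql
CREATE TABLE my_keyspace.my_table_by_day (
    day        date,       -- the time bucket: one partition per day
    id         uuid,
    created_at timestamp,
    payload    text,
    PRIMARY KEY ((day), id) -- partition by bucket, cluster by id
);

-- A date-range query then becomes one partition read per bucket:
SELECT * FROM my_keyspace.my_table_by_day WHERE day = '2019-03-13';
```

The bucket granularity (hour, day, week, month) is a sizing decision: smaller buckets mean more partitions to query for a range, larger buckets mean bigger partitions.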
Our data model cannot be like the one you recommended below, as the majority
of reads need to select the data by the partition key (id) only, not by date.
You could remodel your data in such a way that you would make the primary key
like this:
((date), hour-minute, id)
or
((date, hour-minute), id)
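Translated into DDL, the two key shapes above differ only in the partition key (all names here are placeholders):

```sql
-- ((date), hour-minute, id): one partition per date,
-- rows clustered by hour-minute, then id
CREATE TABLE my_keyspace.events_by_date (
    date        date,
    hour_minute time,
    id          uuid,
    PRIMARY KEY ((date), hour_minute, id)
);

-- ((date, hour-minute), id): smaller partitions,
-- one per (date, hour-minute) pair
CREATE TABLE my_keyspace.events_by_minute (
    date        date,
    hour_minute time,
    id          uuid,
    PRIMARY KEY ((date, hour_minute), id)
);
```

The first variant keeps a whole day in one partition (simpler range queries, but partitions can grow large); the second spreads the day across many small partitions at the cost of querying several of them per range.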
By
The query which does not work would be like this; I made a mistake there:
cqlsh> SELECT * from my_keyspace.my_table where number > 2;
InvalidRequest: Error from server: code=2200 [Invalid query]
message="Cannot execute this query as it might involve data filtering and
thus may have unpredictable
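For completeness, the same query does execute if you explicitly opt into the scan, which is exactly what the error message warns about and is usually a bad idea on large tables:

```sql
-- Forces Cassandra to filter across all partitions; fine on tiny
-- test tables, unpredictable on 70M rows:
SELECT * FROM my_keyspace.my_table WHERE number > 2 ALLOW FILTERING;
```

The proper fix is the one discussed in this thread: a second table (or index) whose partition key matches the query.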
Hi Leena,
"We are thinking of creating a new table with a date field as a clustering
column to be able to query for date ranges, but partition key to clustering
key will be 1-1. Is this a good approach?"
If you want to select by some time range here, I am wondering how would
making datetime a