Re: *URGENT* Migration across different Cassandra cluster few having same keyspace/table names

2020-01-17 Thread Erick Ramirez
>
> *Thanks but there’s no DSE License.*


FWIW it was announced just before Christmas that both DSBulk (DataStax Bulk
Loader) and the DataStax Apache Kafka connector are now freely available to
all developers and will work with open-source Apache Cassandra. For details,
see https://www.datastax.com/blog/2019/12/tools-for-apache-cassandra. Cheers!

On Sat, Jan 18, 2020 at 12:29 AM Ankit Gadhiya 
wrote:

> Thanks but there’s no DSE License.
> Wondering how sstableloader will help as some of the Keyspace and table
> names are the same. Also how do I sync a few system keyspaces.
>
>
> Thanks & Regards,
> Ankit
>
> On Fri, Jan 17, 2020 at 1:11 AM Vova Shelgunov  wrote:
>
>> Loader*
>>
>> https://www.datastax.com/blog/2018/05/introducing-datastax-bulk-loader
>>
>> On Fri, Jan 17, 2020, 09:09 Vova Shelgunov  wrote:
>>
>>> DataStax bulk loaded can be an option if data is large.
>>>
>>> On Fri, Jan 17, 2020, 07:33 Nitan Kainth  wrote:
>>>
 If the keyspace already exists, use the copy command or sstableloader to
 merge data. If the data volume is too big, consider Spark or a custom Java
 program


 Regards,

 Nitan

 Cell: 510 449 9629
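
A hedged sketch of the sstableloader route Nitan mentions (host names and the
snapshot path are placeholders; the directory passed in must end in
keyspace/table and contain the sstable files):

# stream a snapshotted table into the target cluster
sstableloader -d cluster_b_node1,cluster_b_node2 \
    /path/to/snapshot/my_ks/my_table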

 On Jan 16, 2020, at 10:26 PM, Ankit Gadhiya 
 wrote:

 
 Any leads on this ?

 — Ankit

 On Thu, Jan 16, 2020 at 8:51 PM Ankit Gadhiya 
 wrote:

> Hi Arvinder,
>
> Thanks for your response.
>
> Yes - Cluster B already has some data. Tables/KS names are identical;
> as for the data - I still haven't got clarity on whether it is identical or
> not - I am assuming not, since it's for different customers, but need
> confirmation.
>
> *Thanks & Regards,*
> *Ankit Gadhiya*
>
>
>
> On Thu, Jan 16, 2020 at 8:49 PM Arvinder Dhillon <
> dhillona...@gmail.com> wrote:
>
>> So as I understand, Cluster B already has some data and not an empty
>> cluster.
>>
>> When you say, clusters share same keyspace and table names, do you
>> mean both clusters have identical data on those ks/tables?
>>
>>
>> -Arvi
>>
>> On Thu, Jan 16, 2020, 5:27 PM Ankit Gadhiya 
>> wrote:
>>
>>> Hello Group,
>>>
>>> I have a requirement in one of the production systems where I need
>>> to be able to migrate entire dataset from Cluster A (Azure Region A) to
>>> Cluster B (Azure Region B).
>>>
>>> Each cluster has 3 Cassandra nodes (RF=3) used by different
>>> applications. A few of the applications are common to Cluster A and
>>> Cluster B, thereby sharing the same keyspace/table names.
>>> Need suggestions for the best possible migration strategy here
>>> considering - 1. No application code changes possible - minor
>>> config/infra changes can be considered. 2. Zero data loss. 3. No/minimal
>>> downtime.
>>>
>>> It'd be great to hear ideas from all of you based on your
>>> experiences.
>>>
>>> Cassandra Version - Cassandra 3.0.13 on both sides.
>>> Total Data size - Cluster A: 70 GB, Cluster B: 15 GB
>>>
>>> *Thanks & Regards,*
>>> *Ankit Gadhiya*
>>>
>>> --
 *Thanks & Regards,*
 *Ankit Gadhiya*

 --
> *Thanks & Regards,*
> *Ankit Gadhiya*
>
>


Re: [EXTERNAL] Re: *URGENT* Migration across different Cassandra cluster few having same keyspace/table names

2020-01-17 Thread Dor Laor
Another option instead of raw sstables is to use the Spark Migrator [1].
It reads a source cluster, can make some transformations (like
table/column naming) and
writes to a target cluster. It's a very convenient tool, OSS and free of charge.

[1] https://github.com/scylladb/scylla-migrator
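
A hedged sketch of how it is typically invoked (the class name, config key and
jar name are assumptions taken from the project README at [1], not verified
against a specific release):

# run the migrator as a Spark job; source/target clusters go in config.yaml
spark-submit --class com.scylladb.migrator.Migrator \
    --master spark://spark-master:7077 \
    --conf spark.scylla.config=config.yaml \
    scylla-migrator-assembly.jar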

On Fri, Jan 17, 2020 at 5:31 PM Erick Ramirez  wrote:
>>
>> In terms of speed, the sstableloader should be faster, correct?
>> Maybe the DSE BulkLoader finds application when you want a slice of the data 
>> and not the entire cake. Is it correct?
>
>
> There's no real direct comparison because DSBulk is designed for operating on 
> data in CSV or JSON as a replacement for the COPY command. Cheers!
>
> On Sat, Jan 18, 2020 at 6:29 AM Sergio  wrote:
>>
>> Hi everyone,
>>
>> Is the DSE BulkLoader faster than the sstableloader?
>>
>> Sometimes I need to take a cluster snapshot and replicate a Cluster A to a
>> Cluster B with fewer performance capabilities but the same data size.
>>
>> In terms of speed, the sstableloader should be faster, correct?
>>
>> Maybe the DSE BulkLoader finds application when you want a slice of the data 
>> and not the entire cake. Is it correct?
>>
>> Thanks,
>>
>> Sergio
>>
>>




Re: [EXTERNAL] Re: *URGENT* Migration across different Cassandra cluster few having same keyspace/table names

2020-01-17 Thread Erick Ramirez
>
>
> *In terms of speed, the sstableloader should be faster, correct? Maybe the
> DSE BulkLoader finds application when you want a slice of the data and not
> the entire cake. Is it correct?*


There's no real direct comparison because DSBulk is designed for operating
on data in CSV or JSON as a replacement for the COPY command. Cheers!
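
For illustration, the COPY-style round trip looks roughly like this (keyspace,
table and file path are placeholders, not taken from this thread):

./dsbulk unload -k my_ks -t my_table -url /tmp/my_table.csv   # like COPY TO
./dsbulk load -k my_ks -t my_table -url /tmp/my_table.csv     # like COPY FROM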

On Sat, Jan 18, 2020 at 6:29 AM Sergio  wrote:

> Hi everyone,
>
> Is the DSE BulkLoader faster than the sstableloader?
>
> Sometimes I need to take a cluster snapshot and replicate a Cluster A to a
> Cluster B with fewer performance capabilities but the same data size.
>
> In terms of speed, the sstableloader should be faster, correct?
>
> Maybe the DSE BulkLoader finds application when you want a slice of the
> data and not the entire cake. Is it correct?
>
> Thanks,
>
> Sergio
>
>
>


Re: [EXTERNAL] Re: *URGENT* Migration across different Cassandra cluster few having same keyspace/table names

2020-01-17 Thread Sergio
Hi everyone,

Is the DSE BulkLoader faster than the sstableloader?

Sometimes I need to take a cluster snapshot and replicate a Cluster A to a
Cluster B with fewer performance capabilities but the same data size.

In terms of speed, the sstableloader should be faster, correct?

Maybe the DSE BulkLoader finds application when you want a slice of the
data and not the entire cake. Is it correct?

Thanks,

Sergio


RE: [EXTERNAL] Re: COPY command with where condition

2020-01-17 Thread Durity, Sean R
sstablekeys (in the tools directory?) can extract the actual keys from your 
sstables. You have to run it on each node and then combine and de-dupe the 
final results, but I have used this technique with a query generator to extract 
data more efficiently.
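
A hedged sketch of that workflow (the data path, keyspace and table are
placeholders; run the loop on every node, then merge the outputs):

# dump the partition keys of every sstable of the table on this node
for f in /var/lib/cassandra/data/my_ks/my_table-*/*-Data.db; do
    sstablekeys "$f" >> keys_$(hostname).txt
done
# after collecting the per-node files on one machine, combine and de-dupe
sort -u keys_*.txt > all_keys.txt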


Sean Durity

From: Chris Splinter 
Sent: Friday, January 17, 2020 1:47 PM
To: adrien ruffie 
Cc: user@cassandra.apache.org; Erick Ramirez 
Subject: [EXTERNAL] Re: COPY command with where condition

Do you know your partition keys?

One option could be to enumerate that list of partition keys in separate cmds 
to make the individual operations less expensive for the cluster.

For example:
Say your partition key column is called id and the ids in your database are 
[1,2,3]

You could do
./dsbulk unload --dsbulk.schema.keyspace 'dev_keyspace' -query "SELECT * FROM 
probe_sensors WHERE id = 1 AND localisation_id = 208812" -url /home/dump
./dsbulk unload --dsbulk.schema.keyspace 'dev_keyspace' -query "SELECT * FROM 
probe_sensors WHERE id = 2 AND localisation_id = 208812" -url /home/dump
./dsbulk unload --dsbulk.schema.keyspace 'dev_keyspace' -query "SELECT * FROM 
probe_sensors WHERE id = 3 AND localisation_id = 208812" -url /home/dump


Does that option work for you?



On Fri, Jan 17, 2020 at 12:17 PM adrien ruffie  wrote:
I don't really know for the moment in the production environment, but in the
development environment the table contains more than 10,000,000 rows.
But we need just a subset of this table, not the entirety ...

From: Chris Splinter 
Sent: Friday, January 17, 2020 17:40
To: adrien ruffie 
Cc: user@cassandra.apache.org; Erick Ramirez 
Subject: Re: COPY command with where condition

What you are seeing there is a standard read timeout; how many rows do you
expect back from that query?

On Fri, Jan 17, 2020 at 9:50 AM adrien ruffie  wrote:
Thank you very much,

 so I did this request, for example -->

./dsbulk unload --dsbulk.schema.keyspace 'dev_keyspace' -query "SELECT * FROM 
probe_sensors WHERE localisation_id = 208812 ALLOW FILTERING" -url /home/dump


But I get the following error
com.datastax.dsbulk.executor.api.exception.BulkExecutionException: Statement 
execution failed: SELECT * FROM crt_sensors WHERE site_id = 208812 ALLOW 
FILTERING (Cassandra timeout during read query at consistency LOCAL_ONE (1 
responses were required but only 0 replica responded))

but I configured my driver with the following driver.conf and nothing works
correctly. Do you know what the problem is?

datastax-java-driver {
    basic {

        contact-points = ["data1com:9042","data2.com:9042"]

        request {
            timeout = "200"
            consistency = "LOCAL_ONE"
        }
    }
    advanced {
        auth-provider {
            class = PlainTextAuthProvider
            username = "superuser"
            password = "mypass"
        }
    }
}

From: Chris Splinter 
Sent: Friday, January 17, 2020 16:17
To: user@cassandra.apache.org
Cc: Erick Ramirez 
Subject: Re: COPY command with where condition

DSBulk has an option that lets you specify the query ( including a WHERE clause 
)

See Example 19 in this blog post for details: 
https://www.datastax.com/blog/2019/06/datastax-bulk-loader-unloading

On Fri, Jan 17, 2020 at 7:34 AM Jean Tremblay  wrote:
Did you think about using a Materialised View to generate what you want to 
keep, and then use DSBulk to extract the data?


On 17 Jan 2020, at 14:30, adrien ruffie  wrote:

Sorry I come back to a quick question about the bulk loader ...

https://www.datastax.com/blog/2018/05/introducing-datastax-bulk-loader

I read this : "Operations such as converting strings to lowercase, arithmetic 
on input columns, or filtering out rows based on some criteria, are not 
supported. "

Consequently, it's still not possible to use a WHERE clause with DSBulk, right ?

I don't really know how I can do it, so as not to export the whole of the
business data already stored, which doesn't need to be exported...

RE: [EXTERNAL] Re: *URGENT* Migration across different Cassandra cluster few having same keyspace/table names

2020-01-17 Thread Durity, Sean R
Not sure what you mean by “online” migration. You can load data into the same 
name table in cluster B. If the primary keys match, data will be overwritten 
(effectively, not actually on disk). I think you can pipe the output of a 
dsbulk unload to a dsbulk load and make the data transfer very quick. Your 
clusters are very small, so this probably wouldn’t take long.
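
A hedged sketch of that pipe (hosts, keyspace and table are placeholders;
dsbulk writes CSV to stdout when -url is omitted on unload, and reads stdin
by default on load):

dsbulk unload -h cluster_a_node -k my_ks -t my_table | \
    dsbulk load -h cluster_b_node -k my_ks -t my_table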

How you get the client apps to connect to the correct cluster/stop running/etc. 
is beyond the scope of Cassandra.



Sean Durity – Staff Systems Engineer, Cassandra

From: Ankit Gadhiya 
Sent: Friday, January 17, 2020 1:05 PM
To: user@cassandra.apache.org
Subject: Re: [EXTERNAL] Re: *URGENT* Migration across different Cassandra 
cluster few having same keyspace/table names

Hi Sean,

You got all valid points.

Please see my answers below -

1. Reason we want to move from 'A' to 'B' is to get rid of 'A' Azure region 
completely.

2. Cluster names in 'A' and 'B' are different.

3. DSbulk - Is there any way I can do online migration? - I still need to get
clarity on whether data for same keyspace/table names can be merged between A
and B. So 2 cases - 1. If merge is not an issue - I guess DSBulk or
sstableloader would be an option? 2. If merge is an issue - I am guessing
without app code change - this won't be possible, right?


Thanks & Regards,
Ankit Gadhiya


On Fri, Jan 17, 2020 at 9:40 AM Durity, Sean R  wrote:
A couple things to consider:

  *   A separation of apps into their own clusters is typically a better model 
to avoid later entanglements
  *   Dsbulk (1.4.1) is now available for open-source-only clusters. It is a
great tool for unloading/loading
  *   What data problem are you trying to solve with Cassandra and this move to 
another cluster? If it is high-availability, then trying to get to 2 DCs would 
be important. However, I think you will need at least a new keyspace if you 
can’t combine the data from the clusters. Whether this requires a code or 
config change depends on how configurable the developers made the connection 
and query details. (As a side rant: why is it that developers will write all 
kinds of new code, but don’t want to touch existing code?)
  *   Your migration requirements are quite stringent (“we don’t want to change 
anything, lose anything, or stop anything. Make it happen!”). There may be a 
solution, but you may end up with something even more fragile afterwards. I 
would push back to see what is negotiable.



Sean Durity – Staff Systems Engineer, Cassandra

From: Ankit Gadhiya mailto:ankitgadh...@gmail.com>>
Sent: Friday, January 17, 2020 8:50 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: *URGENT* Migration across different Cassandra cluster 
few having same keyspace/table names

Hi Upasana,

Thanks for your response. I’d love to do that as a first strategy but since
they are both separate clusters, how would I do that? Keyspaces already have
NetworkTopologyStrategy with RF=3.


— Ankit

On Fri, Jan 17, 2020 at 8:45 AM Upasana Sharma 
<028upasana...@gmail.com> wrote:
Hi,

Did you consider adding Cassandra nodes from cluster B into cluster A as a
different data center?

Your keyspace would then be on NetworkTopologyStrategy.

In this case, all data can be synced between both data centers by Cassandra
using streaming (nodetool rebuild).


At the client/application level you will have to ensure local quorum/local
consistency so that there is no impact on latencies.

Once you have moved the applications to the new cluster, you can then remove the
old data center (cluster A), and cluster B would have fresh data.




On Fri, Jan 17, 2020, 6:59 PM Ankit Gadhiya  wrote:
Thanks but there’s no DSE License.
Wondering how sstableloader will help as some of the Keyspace and table names
are the same. Also how do I sync a few system keyspaces.


Thanks & Regards,
Ankit

On Fri, Jan 17, 2020 at 1:11 AM Vova Shelgunov  wrote:
Loader*

https://www.datastax.com/blog/2018/05/introducing-datastax-bulk-loader

On Fri, Jan 17, 2020, 09:09 Vova Shelgunov  wrote:
DataStax bulk loaded can be an option if data is large.

On Fri, Jan 17, 2020, 07:33 Nitan Kainth  wrote:
If the keyspace already exists, use the copy command or sstableloader to merge
data. If the data volume is too big, consider Spark or a custom Java program

Regards,
Nitan
Cell: 510 449 9629

On Jan 16, 2020, at 10:26 PM, Ankit Gadhiya  wrote:

Any leads on this ?

— Ankit

On Thu, Jan 16, 2020 at 8:51 PM Ankit Gadhiya  wrote:
Hi Arvinder,

Thanks for your response.

Yes - Cluster B 

Re: COPY command with where condition

2020-01-17 Thread Chris Splinter
Do you know your partition keys?

One option could be to enumerate that list of partition keys in separate
cmds to make the individual operations less expensive for the cluster.

For example:
Say your partition key column is called id and the ids in your database are
[1,2,3]

You could do
./dsbulk unload --dsbulk.schema.keyspace 'dev_keyspace' -query "SELECT *
FROM probe_sensors WHERE id = 1 AND localisation_id = 208812" -url
/home/dump
./dsbulk unload --dsbulk.schema.keyspace 'dev_keyspace' -query "SELECT *
FROM probe_sensors WHERE id = 2 AND localisation_id = 208812" -url
/home/dump
./dsbulk unload --dsbulk.schema.keyspace 'dev_keyspace' -query "SELECT *
FROM probe_sensors WHERE id = 3 AND localisation_id = 208812" -url
/home/dump
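
The same idea as a small loop if the id list is long (bash; the same
placeholder names as above, with one output directory per id so the dumps
don't overwrite each other):

for id in 1 2 3; do
    ./dsbulk unload --dsbulk.schema.keyspace 'dev_keyspace' \
        -query "SELECT * FROM probe_sensors WHERE id = $id AND localisation_id = 208812" \
        -url /home/dump/$id
done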


Does that option work for you?



On Fri, Jan 17, 2020 at 12:17 PM adrien ruffie 
wrote:

> I don't really know for the moment in the production environment, but in the
> development environment the table contains more than 10,000,000 rows.
> But we need just a subset of this table, not the entirety ...
> --
> *From:* Chris Splinter 
> *Sent:* Friday, January 17, 2020 17:40
> *To:* adrien ruffie 
> *Cc:* user@cassandra.apache.org ; Erick
> Ramirez 
> *Subject:* Re: COPY command with where condition
>
> What you are seeing there is a standard read timeout; how many rows do you
> expect back from that query?
>
> On Fri, Jan 17, 2020 at 9:50 AM adrien ruffie 
> wrote:
>
> Thank you very much,
>
> so I did this request, for example -->
>
> ./dsbulk unload --dsbulk.schema.keyspace 'dev_keyspace' -query "SELECT *
> FROM probe_sensors WHERE localisation_id = 208812 ALLOW FILTERING" -url
> /home/dump
>
>
> But I get the following error
> com.datastax.dsbulk.executor.api.exception.BulkExecutionException:
> Statement execution failed: SELECT * FROM crt_sensors WHERE site_id =
> 208812 ALLOW FILTERING (Cassandra timeout during read query at consistency
> LOCAL_ONE (1 responses were required but only 0 replica responded))
>
> but I configured my driver with the following driver.conf and nothing works
> correctly. Do you know what the problem is?
>
> datastax-java-driver {
>     basic {
>
>         contact-points = ["data1com:9042","data2.com:9042"]
>
>         request {
>             timeout = "200"
>             consistency = "LOCAL_ONE"
>         }
>     }
>     advanced {
>         auth-provider {
>             class = PlainTextAuthProvider
>             username = "superuser"
>             password = "mypass"
>         }
>     }
> }
> --
> *From:* Chris Splinter 
> *Sent:* Friday, January 17, 2020 16:17
> *To:* user@cassandra.apache.org 
> *Cc:* Erick Ramirez 
> *Subject:* Re: COPY command with where condition
>
> DSBulk has an option that lets you specify the query ( including a WHERE
> clause )
>
> See Example 19 in this blog post for details:
> https://www.datastax.com/blog/2019/06/datastax-bulk-loader-unloading
>
> On Fri, Jan 17, 2020 at 7:34 AM Jean Tremblay <
> jean.tremb...@zen-innovations.com> wrote:
>
> Did you think about using a Materialised View to generate what you want to
> keep, and then use DSBulk to extract the data?
>
> On 17 Jan 2020, at 14:30 , adrien ruffie 
> wrote:
>
> Sorry I come back to a quick question about the bulk loader ...
>
> https://www.datastax.com/blog/2018/05/introducing-datastax-bulk-loader
>
> I read this : "Operations such as converting strings to lowercase,
> arithmetic on input columns, or filtering out rows based on some criteria,
> are not supported. "
>
> Consequently, it's still not possible to use a WHERE clause with DSBulk,
> right ?
>
> I don't really know how I can do it, so as not to export the whole of the
> business data already stored, which doesn't need to be exported...
>
>
>
> --
> *From:* adrien ruffie 
> *Sent:* Friday, January 17, 2020 11:39
> *To:* Erick Ramirez ; user@cassandra.apache.org <
> user@cassandra.apache.org>
> *Subject:* RE: COPY command with where condition
>
> Thanks a lot!
> It's good news for DSBulk! I will take a look at this solution.
>
> best regards,
> Adrian
> --
> *From:* Erick Ramirez 
> *Sent:* Friday, January 17, 2020 10:02
> *To:* user@cassandra.apache.org 
> *Subject:* Re: COPY command with where condition
>
> The COPY command doesn't support filtering and it doesn't perform well for
> large tables.
>
> Have you considered the DSBulk tool from DataStax? Previously, it only
> worked with DataStax Enterprise but a few weeks ago, it was made free and
> works with open-source Apache Cassandra. For details, see this blogpost
> .
> Cheers!
>
> On Fri, Jan 17, 2020 at 6:57 PM adrien ruffie 
> wrote:
>
> Hello all,
>
> In my company we want to export a big dataset from our Cassandra ring.
> We looked at using the COPY command but I can't find whether and how a WHERE
> condition can be used.
>
> Because we need to 

RE: COPY command with where condition

2020-01-17 Thread adrien ruffie
I don't really know for the moment in the production environment, but in the
development environment the table contains more than 10,000,000 rows.
But we need just a subset of this table, not the entirety ...

From: Chris Splinter 
Sent: Friday, January 17, 2020 17:40
To: adrien ruffie 
Cc: user@cassandra.apache.org ; Erick Ramirez 

Subject: Re: COPY command with where condition

What you are seeing there is a standard read timeout; how many rows do you
expect back from that query?

On Fri, Jan 17, 2020 at 9:50 AM adrien ruffie  wrote:
Thank you very much,

 so I did this request, for example -->

./dsbulk unload --dsbulk.schema.keyspace 'dev_keyspace' -query "SELECT * FROM 
probe_sensors WHERE localisation_id = 208812 ALLOW FILTERING" -url /home/dump


But I get the following error
com.datastax.dsbulk.executor.api.exception.BulkExecutionException: Statement 
execution failed: SELECT * FROM crt_sensors WHERE site_id = 208812 ALLOW 
FILTERING (Cassandra timeout during read query at consistency LOCAL_ONE (1 
responses were required but only 0 replica responded))

but I configured my driver with the following driver.conf and nothing works
correctly. Do you know what the problem is?

datastax-java-driver {
    basic {

        contact-points = ["data1com:9042","data2.com:9042"]

        request {
            timeout = "200"
            consistency = "LOCAL_ONE"
        }
    }
    advanced {
        auth-provider {
            class = PlainTextAuthProvider
            username = "superuser"
            password = "mypass"
        }
    }
}

From: Chris Splinter 
Sent: Friday, January 17, 2020 16:17
To: user@cassandra.apache.org
Cc: Erick Ramirez 
Subject: Re: COPY command with where condition

DSBulk has an option that lets you specify the query ( including a WHERE clause 
)

See Example 19 in this blog post for details: 
https://www.datastax.com/blog/2019/06/datastax-bulk-loader-unloading

On Fri, Jan 17, 2020 at 7:34 AM Jean Tremblay  wrote:
Did you think about using a Materialised View to generate what you want to 
keep, and then use DSBulk to extract the data?

On 17 Jan 2020, at 14:30, adrien ruffie  wrote:

Sorry I come back to a quick question about the bulk loader ...

https://www.datastax.com/blog/2018/05/introducing-datastax-bulk-loader

I read this : "Operations such as converting strings to lowercase, arithmetic 
on input columns, or filtering out rows based on some criteria, are not 
supported. "

Consequently, it's still not possible to use a WHERE clause with DSBulk, right ?

I don't really know how I can do it, so as not to export the whole of the
business data already stored, which doesn't need to be exported...




From: adrien ruffie 
Sent: Friday, January 17, 2020 11:39
To: Erick Ramirez ; user@cassandra.apache.org
Subject: RE: COPY command with where condition

Thanks a lot!
It's good news for DSBulk! I will take a look at this solution.

best regards,
Adrian

From: Erick Ramirez 
Sent: Friday, January 17, 2020 10:02
To: user@cassandra.apache.org
Subject: Re: COPY command with where condition

The COPY command doesn't support filtering and it doesn't perform well for 
large tables.

Have you considered the DSBulk tool from DataStax? Previously, it only worked 
with DataStax Enterprise but a few weeks ago, it was made free and works with 
open-source Apache Cassandra. For details, see this 
blogpost. 
Cheers!

On Fri, Jan 17, 2020 at 6:57 PM adrien ruffie  wrote:
Hello all,

In my company we want to export a big dataset from our Cassandra ring.
We looked at using the COPY command but I can't find whether and how a WHERE
condition can be used.

Because we need to export only some of the data, which must be returned by a
WHERE clause, unfortunately with ALLOW FILTERING due to several old tables
which were poorly designed...

Do you know a way to do that, please?

Thank all and best regards

Adrian



Re: [EXTERNAL] Re: *URGENT* Migration across different Cassandra cluster few having same keyspace/table names

2020-01-17 Thread Ankit Gadhiya
Hi Sean,

You got all valid points.

Please see my answers below -

1. Reason we want to move from 'A' to 'B' is to get rid of 'A' Azure region
completely.

2. Cluster names in 'A' and 'B' are different.

3. DSbulk - Is there any way I can do online migration? - I still need to
get clarity on whether data for same keyspace/table names can be merged
between A and B. So 2 cases - 1. If merge is not an issue - I guess DSBulk
or sstableloader would be an option? 2. If merge is an issue - I am
guessing without app code change - this won't be possible, right?


*Thanks & Regards,*
*Ankit Gadhiya*



On Fri, Jan 17, 2020 at 9:40 AM Durity, Sean R 
wrote:

> A couple things to consider:
>
>- A separation of apps into their own clusters is typically a better
>model to avoid later entanglements
>- Dsbulk (1.4.1) is now available for open-source-only clusters. It is
>a great tool for unloading/loading
>- What data problem are you trying to solve with Cassandra and this
>move to another cluster? If it is high-availability, then trying to get to
>2 DCs would be important. However, I think you will need at least a new
>keyspace if you can’t combine the data from the clusters. Whether this
>requires a code or config change depends on how configurable the developers
>made the connection and query details. (As a side rant: why is it that
>developers will write all kinds of new code, but don’t want to touch
>existing code?)
>- Your migration requirements are quite stringent (“we don’t want to
>change anything, lose anything, or stop anything. Make it happen!”). There
>may be a solution, but you may end up with something even more fragile
>afterwards. I would push back to see what is negotiable.
>
>
>
>
>
>
>
> Sean Durity – Staff Systems Engineer, Cassandra
>
>
>
> *From:* Ankit Gadhiya 
> *Sent:* Friday, January 17, 2020 8:50 AM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] Re: *URGENT* Migration across different Cassandra
> cluster few having same keyspace/table names
>
>
>
> Hi Upasana,
>
>
>
> Thanks for your response. I’d love to do that as a first strategy but
> since they are both separate clusters, how would I do that? Keyspaces
> already have NetworkTopologyStrategy with RF=3.
>
>
>
>
>
> — Ankit
>
>
>
> On Fri, Jan 17, 2020 at 8:45 AM Upasana Sharma <028upasana...@gmail.com>
> wrote:
>
> Hi,
>
>
>
> Did you consider adding Cassandra nodes from cluster B into cluster A as
> a different data center?
>
>
>
> Your keyspace would then be on NetworkTopologyStrategy.
>
>
>
> In this case, all data can be synced between both data centers by
> Cassandra using streaming (nodetool rebuild).
>
>
>
>
>
> At the client/application level you will have to ensure local quorum/local
> consistency so that there is no impact on latencies.
>
>
>
> Once you have moved the applications to the new cluster, you can then remove
> the old data center (cluster A), and cluster B would have fresh data.
>
>
>
>
>
>
>
>
>
> On Fri, Jan 17, 2020, 6:59 PM Ankit Gadhiya 
> wrote:
>
> Thanks but there’s no DSE License.
>
> Wondering how sstableloader will help as some of the Keyspace and table
> names are the same. Also how do I sync a few system keyspaces.
>
>
>
>
>
> Thanks & Regards,
>
> Ankit
>
>
>
> On Fri, Jan 17, 2020 at 1:11 AM Vova Shelgunov  wrote:
>
> Loader*
>
>
>
> https://www.datastax.com/blog/2018/05/introducing-datastax-bulk-loader
> 
>
>
>
> On Fri, Jan 17, 2020, 09:09 Vova Shelgunov  wrote:
>
> DataStax bulk loaded can be an option if data is large.
>
>
>
> On Fri, Jan 17, 2020, 07:33 Nitan Kainth  wrote:
>
> If the keyspace already exists, use the copy command or sstableloader to merge
> data. If the data volume is too big, consider Spark or a custom Java program
>
>
>
> Regards,
>
> Nitan
>
> Cell: 510 449 9629
>
>
>
> On Jan 16, 2020, at 10:26 PM, Ankit Gadhiya 
> wrote:
>
> 
>
> Any leads on this ?
>
>
>
> — Ankit
>
>
>
> On Thu, Jan 16, 2020 at 8:51 PM Ankit Gadhiya 
> wrote:
>
> Hi Arvinder,
>
>
>
> Thanks for your response.
>
>
>
> Yes - Cluster B already has some data. Tables/KS names are identical; as for
> the data - I still haven't got clarity on whether it is identical or not - I
> am assuming not, since it's for different customers, but need confirmation.
>
>
>
> *Thanks & Regards,*
>
> *Ankit Gadhiya*
>
>
>
>
>
> On Thu, Jan 16, 2020 at 8:49 PM Arvinder Dhillon 
> wrote:
>
> So as I understand, Cluster B already has some data and not an empty
> cluster.
>
>
>
> When you say, clusters share same keyspace and table names, do you mean
> both clusters have identical data on those ks/tables?
>
>
>
> -Arvi
>
>
>
> On Thu, Jan 16, 2020, 5:27 PM Ankit Gadhiya 
> wrote:
>
> Hello Group,
>
>
>
> I have a requirement in one of the production systems where I need to be
> able to 

Re: COPY command with where condition

2020-01-17 Thread Michael Shuler

On 1/17/20 9:50 AM, adrien ruffie wrote:

Thank you very much,

  so I did this request, for example -->

./dsbulk unload --dsbulk.schema.keyspace 'dev_keyspace' -query "SELECT * 
FROM probe_sensors WHERE localisation_id = 208812 ALLOW FILTERING" -url 
/home/dump



But I get the following error
com.datastax.dsbulk.executor.api.exception.BulkExecutionException: 
Statement execution failed: SELECT * FROM crt_sensors WHERE site_id = 
208812 ALLOW FILTERING (Cassandra timeout during read query at 
consistency LOCAL_ONE (1 responses were required but only 0 replica 
responded))


but I configured my driver with the following driver.conf and nothing works 
correctly. Do you know what the problem is?


datastax-java-driver {
     basic {


         contact-points = ["data1com:9042","data2.com:9042"]


typo?

mshuler@hana:~$ echo "QUIT" | nc -w 10 data2.com 9042
data2.com [35.208.148.117] 9042 (?) : Connection timed out



         request {
             timeout = "200"
             consistency = "LOCAL_ONE"

         }
     }
     advanced {

         auth-provider {
             class = PlainTextAuthProvider
             username = "superuser"
             password = "mypass"

         }
     }
}
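
Presumably the first contact point was meant to be "data1.com:9042"; the
missing dot is exactly what the connection test above points at:

contact-points = ["data1.com:9042","data2.com:9042"]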


Kind regards,
Michael




Re: COPY command with where condition

2020-01-17 Thread Chris Splinter
What you are seeing there is a standard read timeout; how many rows do you
expect back from that query?
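
If the row count is large, one hedged workaround is to raise the driver's
request timeout for the unload (the flag spelling below assumes DSBulk 1.4's
driver-settings passthrough; check the dsbulk help for your version):

./dsbulk unload --dsbulk.schema.keyspace 'dev_keyspace' \
    -query "SELECT * FROM probe_sensors WHERE localisation_id = 208812 ALLOW FILTERING" \
    -url /home/dump \
    --datastax-java-driver.basic.request.timeout "5 minutes"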

On Fri, Jan 17, 2020 at 9:50 AM adrien ruffie 
wrote:

> Thank you very much,
>
> so I did this request, for example -->
>
> ./dsbulk unload --dsbulk.schema.keyspace 'dev_keyspace' -query "SELECT *
> FROM probe_sensors WHERE localisation_id = 208812 ALLOW FILTERING" -url
> /home/dump
>
>
> But I get the following error
> com.datastax.dsbulk.executor.api.exception.BulkExecutionException:
> Statement execution failed: SELECT * FROM crt_sensors WHERE site_id =
> 208812 ALLOW FILTERING (Cassandra timeout during read query at consistency
> LOCAL_ONE (1 responses were required but only 0 replica responded))
>
> but I configured my driver with the following driver.conf and nothing works
> correctly. Do you know what the problem is?
>
> datastax-java-driver {
>     basic {
>
>         contact-points = ["data1com:9042","data2.com:9042"]
>
>         request {
>             timeout = "200"
>             consistency = "LOCAL_ONE"
>         }
>     }
>     advanced {
>         auth-provider {
>             class = PlainTextAuthProvider
>             username = "superuser"
>             password = "mypass"
>         }
>     }
> }
> --
> *From:* Chris Splinter 
> *Sent:* Friday, January 17, 2020 16:17
> *To:* user@cassandra.apache.org 
> *Cc:* Erick Ramirez 
> *Subject:* Re: COPY command with where condition
>
> DSBulk has an option that lets you specify the query ( including a WHERE
> clause )
>
> See Example 19 in this blog post for details:
> https://www.datastax.com/blog/2019/06/datastax-bulk-loader-unloading
>
> On Fri, Jan 17, 2020 at 7:34 AM Jean Tremblay <
> jean.tremb...@zen-innovations.com> wrote:
>
> Did you think about using a Materialised View to generate what you want to
> keep, and then use DSBulk to extract the data?
>
> On 17 Jan 2020, at 14:30 , adrien ruffie 
> wrote:
>
> Sorry I come back to a quick question about the bulk loader ...
>
> https://www.datastax.com/blog/2018/05/introducing-datastax-bulk-loader
>
> I read this : "Operations such as converting strings to lowercase,
> arithmetic on input columns, or filtering out rows based on some criteria,
> are not supported. "
>
> Consequently, it's still not possible to use a WHERE clause with DSBulk,
> right ?
>
> I don't really know how I can do it, so as not to export the whole of the
> business data already stored, which doesn't need to be exported...
>
>
>
> --
> *From:* adrien ruffie 
> *Sent:* Friday, January 17, 2020 11:39
> *To:* Erick Ramirez ; user@cassandra.apache.org <
> user@cassandra.apache.org>
> *Subject:* RE: COPY command with where condition
>
> Thanks a lot!
> It's good news for DSBulk! I will take a look at this solution.
>
> best regards,
> Adrian
> --
> *De :* Erick Ramirez 
> *Envoyé :* vendredi 17 janvier 2020 10:02
> *À :* user@cassandra.apache.org 
> *Objet :* Re: COPY command with where condition
>
> The COPY command doesn't support filtering and it doesn't perform well for
> large tables.
>
> Have you considered the DSBulk tool from DataStax? Previously, it only
> worked with DataStax Enterprise but a few weeks ago, it was made free and
> works with open-source Apache Cassandra. For details, see this blogpost
> .
> Cheers!
>
> On Fri, Jan 17, 2020 at 6:57 PM adrien ruffie 
> wrote:
>
> Hello all,
>
> In my company we want to export a big dataset from our Cassandra ring.
> We looked at using the COPY command but I can't find whether and how a WHERE
> condition can be used.
>
> Because we need to export only some of the data, which must be returned by a
> WHERE clause, unfortunately with ALLOW FILTERING due to several old tables
> which were poorly designed...
>
> Do you know a way to do that, please?
>
> Thank all and best regards
>
> Adrian
>
>
>


RE: COPY command with where condition

2020-01-17 Thread adrien ruffie
Thank you very much,

 so I did this request, for example -->

./dsbulk unload --dsbulk.schema.keyspace 'dev_keyspace' -query "SELECT * FROM 
probe_sensors WHERE localisation_id = 208812 ALLOW FILTERING" -url /home/dump


But I get the following error
com.datastax.dsbulk.executor.api.exception.BulkExecutionException: Statement 
execution failed: SELECT * FROM crt_sensors WHERE site_id = 208812 ALLOW 
FILTERING (Cassandra timeout during read query at consistency LOCAL_ONE (1 
responses were required but only 0 replica responded))

but I configured my driver with the following driver.conf and nothing works
correctly. Do you know what the problem is?

datastax-java-driver {
    basic {

        contact-points = ["data1com:9042","data2.com:9042"]

        request {
            timeout = "200"
            consistency = "LOCAL_ONE"
        }
    }
    advanced {
        auth-provider {
            class = PlainTextAuthProvider
            username = "superuser"
            password = "mypass"
        }
    }
}

From: Chris Splinter 
Sent: Friday, January 17, 2020 16:17
To: user@cassandra.apache.org 
Cc: Erick Ramirez 
Subject: Re: COPY command with where condition

DSBulk has an option that lets you specify the query ( including a WHERE clause 
)

See Example 19 in this blog post for details: 
https://www.datastax.com/blog/2019/06/datastax-bulk-loader-unloading

On Fri, Jan 17, 2020 at 7:34 AM Jean Tremblay  wrote:
Did you think about using a Materialised View to generate what you want to 
keep, and then use DSBulk to extract the data?

On 17 Jan 2020, at 14:30, adrien ruffie  wrote:

Sorry I come back to a quick question about the bulk loader ...

https://www.datastax.com/blog/2018/05/introducing-datastax-bulk-loader

I read this : "Operations such as converting strings to lowercase, arithmetic 
on input columns, or filtering out rows based on some criteria, are not 
supported. "

Consequently, it's still not possible to use a WHERE clause with DSBulk, right ?

I don't really know how I can do it, so as not to export the whole of the
business data already stored, which doesn't need to be exported...




From: adrien ruffie 
Sent: Friday, January 17, 2020 11:39
To: Erick Ramirez ; user@cassandra.apache.org
Subject: RE: COPY command with where condition

Thanks a lot!
It's good news for DSBulk! I will take a look at this solution.

best regards,
Adrian

From: Erick Ramirez 
Sent: Friday, January 17, 2020 10:02
To: user@cassandra.apache.org
Subject: Re: COPY command with where condition

The COPY command doesn't support filtering and it doesn't perform well for 
large tables.

Have you considered the DSBulk tool from DataStax? Previously, it only worked 
with DataStax Enterprise but a few weeks ago, it was made free and works with 
open-source Apache Cassandra. For details, see this 
blogpost. 
Cheers!

On Fri, Jan 17, 2020 at 6:57 PM adrien ruffie  wrote:
Hello all,

In my company we want to export a big dataset from our Cassandra ring.
We looked at using the COPY command but I can't find whether and how a WHERE
condition can be used.

Because we need to export only some of the data, which must be returned by a
WHERE clause, unfortunately with ALLOW FILTERING due to several old tables
which were poorly designed...

Do you know a way to do that, please?

Thank all and best regards

Adrian



Re: [EXTERNAL] Re: *URGENT* Migration across different Cassandra cluster few having same keyspace/table names

2020-01-17 Thread Jeff Jirsa
The migration requirements are impossible given the current state of the 
database

You probably can’t join two distinct clusters without app changes and without 
downtime unless you’re very lucky (same cluster name, app using quorum but not 
local quorum, both clusters using NetworkTopologyStrategy, neither app using 
serial reads or writes), and trying to do it with conflicting keyspace and 
table names makes it impossible 

Would just assume this isn’t possible and look for alternate plans, like 
downtime or code changes. 


> On Jan 17, 2020, at 6:40 AM, Durity, Sean R  
> wrote:
> 
> 
> A couple things to consider:
> A separation of apps into their own clusters is typically a better model to 
> avoid later entanglements
> Dsbulk (1.4.1) is now available for open-source-only clusters. It is a great
> tool for unloading/loading
> What data problem are you trying to solve with Cassandra and this move to 
> another cluster? If it is high-availability, then trying to get to 2 DCs 
> would be important. However, I think you will need at least a new keyspace if 
> you can’t combine the data from the clusters. Whether this requires a code or 
> config change depends on how configurable the developers made the connection 
> and query details. (As a side rant: why is it that developers will write all 
> kinds of new code, but don’t want to touch existing code?)
> Your migration requirements are quite stringent (“we don’t want to change 
> anything, lose anything, or stop anything. Make it happen!”). There may be a 
> solution, but you may end up with something even more fragile afterwards. I 
> would push back to see what is negotiable.
>  
>  
>  
> Sean Durity – Staff Systems Engineer, Cassandra
>  
> From: Ankit Gadhiya  
> Sent: Friday, January 17, 2020 8:50 AM
> To: user@cassandra.apache.org
> Subject: [EXTERNAL] Re: *URGENT* Migration across different Cassandra cluster 
> few having same keyspace/table names
>  
> Hi Upasana,
>  
> Thanks for your response. I’d love to do that as a first strategy but since
> they are both separate clusters, how would I do that? Keyspaces already have
> NetworkTopologyStrategy with RF=3.
>  
>  
> — Ankit
>  
> On Fri, Jan 17, 2020 at 8:45 AM Upasana Sharma <028upasana...@gmail.com> 
> wrote:
> Hi,
>  
> Did you consider adding Cassandra nodes from cluster B into cluster A as a
> different data center?
>
> Your keyspace would then be on NetworkTopologyStrategy.
>
> In this case, all data can be synced between both data centers by Cassandra
> using streaming (nodetool rebuild).
>
>
> At the client/application level you will have to ensure local quorum/local
> consistency so that there is no impact on latencies.
>
> Once you have moved the applications to the new cluster, you can then remove
> the old data center (cluster A), and cluster B would have fresh data.
>  
>  
>  
>  
> On Fri, Jan 17, 2020, 6:59 PM Ankit Gadhiya  wrote:
> Thanks but there’s no DSE License.
> Wondering how sstableloader will help as some of the Keyspace and table
> names are the same. Also how do I sync a few system keyspaces.
>  
>  
> Thanks & Regards,
> Ankit
>  
> On Fri, Jan 17, 2020 at 1:11 AM Vova Shelgunov  wrote:
> Loader*
>  
> https://www.datastax.com/blog/2018/05/introducing-datastax-bulk-loader
>  
> On Fri, Jan 17, 2020, 09:09 Vova Shelgunov  wrote:
> DataStax bulk loaded can be an option if data is large. 
>  
> On Fri, Jan 17, 2020, 07:33 Nitan Kainth  wrote:
> If the keyspace already exists, use the copy command or sstableloader to merge
> data. If the data volume is too big, consider Spark or a custom Java program
> 
>  
> Regards,
> Nitan
> Cell: 510 449 9629
> 
> 
> On Jan 16, 2020, at 10:26 PM, Ankit Gadhiya  wrote:
> 
> 
> Any leads on this ?
>  
> — Ankit
>  
> On Thu, Jan 16, 2020 at 8:51 PM Ankit Gadhiya  wrote:
> Hi Arvinder,
>  
> Thanks for your response.
>  
> Yes - Cluster B already has some data. Tables/KS names are identical; as for
> the data - I still haven't got clarity on whether it is identical or not - I am
> assuming not, since it's for different customers, but need confirmation.
>  
> Thanks & Regards,
> Ankit Gadhiya
> 
>  
>  
> On Thu, Jan 16, 2020 at 8:49 PM Arvinder Dhillon  
> wrote:
> So as I understand, Cluster B already has some data and not an empty cluster.
>  
> When you say, clusters share same keyspace and table names, do you mean both 
> clusters have identical data on those ks/tables?
>  
> 
> -Arvi
>  
> On Thu, Jan 16, 2020, 5:27 PM Ankit Gadhiya  wrote:
> Hello Group,
>  
> I have a requirement in one of the production systems where I need to be able 
> to migrate entire dataset from Cluster A (Azure Region A) to Cluster B (Azure 
> Region B). 
>  
> Each cluster has 3 Cassandra nodes (RF=3) used by different
> applications. A few of the applications are common to Cluster A and Cluster B,
> thereby sharing the same keyspace/table names.
> Need suggestions for the best possible migration strategy here considering -
> 

Re: COPY command with where condition

2020-01-17 Thread Chris Splinter
DSBulk has an option that lets you specify the query ( including a WHERE
clause )

See Example 19 in this blog post for details:
https://www.datastax.com/blog/2019/06/datastax-bulk-loader-unloading

On Fri, Jan 17, 2020 at 7:34 AM Jean Tremblay <
jean.tremb...@zen-innovations.com> wrote:

> Did you think about using a Materialised View to generate what you want to
> keep, and then use DSBulk to extract the data?
>
> On 17 Jan 2020, at 14:30 , adrien ruffie 
> wrote:
>
> Sorry I come back to a quick question about the bulk loader ...
>
> https://www.datastax.com/blog/2018/05/introducing-datastax-bulk-loader
>
> I read this : "Operations such as converting strings to lowercase,
> arithmetic on input columns, or filtering out rows based on some criteria,
> are not supported. "
>
> Consequently, it's still not possible to use a WHERE clause with DSBulk,
> right ?
>
> I don't really know how I can do it, so as not to export the whole of the
> business data already stored, which doesn't need to be exported...
>
>
>
> --
> *From:* adrien ruffie 
> *Sent:* Friday, January 17, 2020 11:39
> *To:* Erick Ramirez ; user@cassandra.apache.org <
> user@cassandra.apache.org>
> *Subject:* RE: COPY command with where condition
>
> Thanks a lot!
> It's good news for DSBulk! I will take a look at this solution.
>
> best regards,
> Adrian
> --
> *From:* Erick Ramirez 
> *Sent:* Friday, January 17, 2020 10:02
> *To:* user@cassandra.apache.org 
> *Subject:* Re: COPY command with where condition
>
> The COPY command doesn't support filtering and it doesn't perform well for
> large tables.
>
> Have you considered the DSBulk tool from DataStax? Previously, it only
> worked with DataStax Enterprise but a few weeks ago, it was made free and
> works with open-source Apache Cassandra. For details, see this blogpost
> .
> Cheers!
>
> On Fri, Jan 17, 2020 at 6:57 PM adrien ruffie 
> wrote:
>
> Hello all,
>
> In my company we want to export a big dataset from our Cassandra ring.
> We looked at using the COPY command but I can't find whether and how a WHERE
> condition can be used.
>
> Because we need to export only some of the data, which must be returned by a
> WHERE clause, unfortunately with ALLOW FILTERING due to several old tables
> which were poorly designed...
>
> Do you know a way to do that, please?
>
> Thank all and best regards
>
> Adrian
>
>
>


RE: [EXTERNAL] Re: *URGENT* Migration across different Cassandra cluster few having same keyspace/table names

2020-01-17 Thread Durity, Sean R
A couple things to consider:

  *   A separation of apps into their own clusters is typically a better model 
to avoid later entanglements
  *   Dsbulk (1.4.1) is now available for open-source-only clusters. It is a
great tool for unloading/loading
  *   What data problem are you trying to solve with Cassandra and this move to 
another cluster? If it is high-availability, then trying to get to 2 DCs would 
be important. However, I think you will need at least a new keyspace if you 
can’t combine the data from the clusters. Whether this requires a code or 
config change depends on how configurable the developers made the connection 
and query details. (As a side rant: why is it that developers will write all 
kinds of new code, but don’t want to touch existing code?)
  *   Your migration requirements are quite stringent (“we don’t want to change 
anything, lose anything, or stop anything. Make it happen!”). There may be a 
solution, but you may end up with something even more fragile afterwards. I 
would push back to see what is negotiable.



Sean Durity – Staff Systems Engineer, Cassandra

From: Ankit Gadhiya 
Sent: Friday, January 17, 2020 8:50 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: *URGENT* Migration across different Cassandra cluster 
few having same keyspace/table names

Hi Upasana,

Thanks for your response. I’d love to do that as a first strategy but since
they are both separate clusters, how would I do that? Keyspaces already have
NetworkTopologyStrategy with RF=3.


— Ankit

On Fri, Jan 17, 2020 at 8:45 AM Upasana Sharma 
<028upasana...@gmail.com> wrote:
Hi,

Did you consider adding Cassandra nodes from cluster B into cluster A as a
different data center?

Your keyspace would then be on NetworkTopologyStrategy.

In this case, all data can be synced between both data centers by Cassandra
using streaming (nodetool rebuild).


At the client/application level you will have to ensure local quorum/local
consistency so that there is no impact on latencies.

Once you have moved the applications to the new cluster, you can then remove the
old data center (cluster A), and cluster B would have fresh data.




On Fri, Jan 17, 2020, 6:59 PM Ankit Gadhiya  wrote:
Thanks but there’s no DSE License.
Wondering how sstableloader will help as some of the Keyspace and table names
are the same. Also how do I sync a few system keyspaces.


Thanks & Regards,
Ankit

On Fri, Jan 17, 2020 at 1:11 AM Vova Shelgunov  wrote:
Loader*

https://www.datastax.com/blog/2018/05/introducing-datastax-bulk-loader

On Fri, Jan 17, 2020, 09:09 Vova Shelgunov  wrote:
DataStax bulk loaded can be an option if data is large.

On Fri, Jan 17, 2020, 07:33 Nitan Kainth  wrote:
If the keyspace already exists, use the copy command or sstableloader to merge
data. If the data volume is too big, consider Spark or a custom Java program

Regards,
Nitan
Cell: 510 449 9629


On Jan 16, 2020, at 10:26 PM, Ankit Gadhiya  wrote:

Any leads on this ?

— Ankit

On Thu, Jan 16, 2020 at 8:51 PM Ankit Gadhiya  wrote:
Hi Arvinder,

Thanks for your response.

Yes - Cluster B already has some data. Tables/KS names are identical; as for the
data - I still haven't got clarity on whether it is identical or not - I am
assuming not, since it's for different customers, but need confirmation.

Thanks & Regards,
Ankit Gadhiya


On Thu, Jan 16, 2020 at 8:49 PM Arvinder Dhillon  wrote:
So as I understand, Cluster B already has some data and not an empty cluster.

When you say, clusters share same keyspace and table names, do you mean both 
clusters have identical data on those ks/tables?

-Arvi

On Thu, Jan 16, 2020, 5:27 PM Ankit Gadhiya  wrote:
Hello Group,

I have a requirement in one of the production systems where I need to be able 
to migrate entire dataset from Cluster A (Azure Region A) to Cluster B (Azure 
Region B).

Each cluster has 3 Cassandra nodes (RF=3) used by different
applications. A few of the applications are common to Cluster A and Cluster B,
thereby sharing the same keyspace/table names.
Need suggestions for the best possible migration strategy here considering - 1.
No application code changes possible - minor config/infra changes can be
considered. 2. Zero data loss. 3. No/minimal downtime.

It'd be great to hear ideas from all of you based on your experiences.

Cassandra Version - Cassandra 3.0.13 on both sides.
Total Data size - Cluster A: 70 GB, Cluster B: 15 GB

Thanks & Regards,
Ankit Gadhiya
--
Thanks & Regards,
Ankit Gadhiya
--
Thanks & Regards,
Ankit Gadhiya
--

Re: *URGENT* Migration across different Cassandra cluster few having same keyspace/table names

2020-01-17 Thread Ankit Gadhiya
Hi Upasana,

Thanks for your response. I’d love to do that as a first strategy but since
they are both separate clusters, how would I do that? Keyspaces already
have NetworkTopologyStrategy with RF=3.


— Ankit

On Fri, Jan 17, 2020 at 8:45 AM Upasana Sharma <028upasana...@gmail.com>
wrote:

> Hi,
>
> Did you consider adding Cassandra nodes from cluster B into cluster A as
> a different data center?
>
> Your keyspace would then be on NetworkTopologyStrategy.
>
> In this case, all data can be synced between both data centers by
> Cassandra using streaming (nodetool rebuild).
>
>
> At the client/application level you will have to ensure local quorum/local
> consistency so that there is no impact on latencies.
>
> Once you have moved the applications to the new cluster, you can then remove
> the old data center (cluster A), and cluster B would have fresh data.
>
>
>
>
> On Fri, Jan 17, 2020, 6:59 PM Ankit Gadhiya 
> wrote:
>
>> Thanks but there’s no DSE License.
>> Wondering how sstableloader will help as some of the Keyspace and table
>> names are the same. Also how do I sync a few system keyspaces.
>>
>>
>> Thanks & Regards,
>> Ankit
>>
>> On Fri, Jan 17, 2020 at 1:11 AM Vova Shelgunov  wrote:
>>
>>> Loader*
>>>
>>> https://www.datastax.com/blog/2018/05/introducing-datastax-bulk-loader
>>>
>>> On Fri, Jan 17, 2020, 09:09 Vova Shelgunov  wrote:
>>>
 DataStax bulk loaded can be an option if data is large.

 On Fri, Jan 17, 2020, 07:33 Nitan Kainth  wrote:

> If the keyspace already exists, use the copy command or sstableloader to
> merge data. If the data volume is too big, consider Spark or a custom Java
> program
>
>
> Regards,
>
> Nitan
>
> Cell: 510 449 9629
>
> On Jan 16, 2020, at 10:26 PM, Ankit Gadhiya 
> wrote:
>
> 
> Any leads on this ?
>
> — Ankit
>
> On Thu, Jan 16, 2020 at 8:51 PM Ankit Gadhiya 
> wrote:
>
>> Hi Arvinder,
>>
>> Thanks for your response.
>>
>> Yes - Cluster B already has some data. Tables/KS names are identical;
>> as for the data - I still haven't got clarity on whether it is identical
>> or not - I am assuming not, since it's for different customers, but need
>> confirmation.
>>
>> *Thanks & Regards,*
>> *Ankit Gadhiya*
>>
>>
>>
>> On Thu, Jan 16, 2020 at 8:49 PM Arvinder Dhillon <
>> dhillona...@gmail.com> wrote:
>>
>>> So as I understand, Cluster B already has some data and not an empty
>>> cluster.
>>>
>>> When you say, clusters share same keyspace and table names, do you
>>> mean both clusters have identical data on those ks/tables?
>>>
>>>
>>> -Arvi
>>>
>>> On Thu, Jan 16, 2020, 5:27 PM Ankit Gadhiya 
>>> wrote:
>>>
 Hello Group,

 I have a requirement in one of the production systems where I need
 to be able to migrate entire dataset from Cluster A (Azure Region A) to
 Cluster B (Azure Region B).

 Each cluster has 3 Cassandra nodes (RF=3) used by
 different applications. A few of the applications are common to Cluster
 A and
 Cluster B, thereby sharing the same keyspace/table names.
 Need suggestions for the best possible migration strategy here
 considering - 1. No application code changes possible - minor
 config/infra
 changes can be considered. 2. Zero data loss. 3. No/minimal downtime.

 It'd be great to hear ideas from all of you based on your
 experiences.

 Cassandra Version - Cassandra 3.0.13 on both sides.
 Total Data size - Cluster A: 70 GB, Cluster B: 15 GB

 *Thanks & Regards,*
 *Ankit Gadhiya*

 --
> *Thanks & Regards,*
> *Ankit Gadhiya*
>
> --
>> *Thanks & Regards,*
>> *Ankit Gadhiya*
>>
>> --
*Thanks & Regards,*
*Ankit Gadhiya*


Re: *URGENT* Migration across different Cassandra cluster few having same keyspace/table names

2020-01-17 Thread Upasana Sharma
Hi,

Did you consider adding Cassandra nodes from cluster B into cluster A as
a different data center?

Your keyspace would then be on NetworkTopologyStrategy.

In this case, all data can be synced between both data centers by Cassandra
using streaming (nodetool rebuild).


At the client/application level you will have to ensure local quorum/local
consistency so that there is no impact on latencies.

Once you have moved the applications to the new cluster, you can then remove
the old data center (cluster A), and cluster B would have fresh data.
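
A hedged sketch of the mechanics (keyspace and data-center names are
placeholders; the full procedure also involves snitch, seed and
cassandra-rackdc.properties changes that are not shown):

cqlsh -e "ALTER KEYSPACE my_ks WITH replication =
  {'class': 'NetworkTopologyStrategy', 'dc_A': 3, 'dc_B': 3};"
# then, on each node of the new data center, stream the existing data:
nodetool rebuild -- dc_A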




On Fri, Jan 17, 2020, 6:59 PM Ankit Gadhiya  wrote:

> Thanks but there’s no DSE License.
> Wondering how sstableloader will help as some of the Keyspace and table
> names are the same. Also how do I sync a few system keyspaces.
>
>
> Thanks & Regards,
> Ankit
>
> On Fri, Jan 17, 2020 at 1:11 AM Vova Shelgunov  wrote:
>
>> Loader*
>>
>> https://www.datastax.com/blog/2018/05/introducing-datastax-bulk-loader
>>
>> On Fri, Jan 17, 2020, 09:09 Vova Shelgunov  wrote:
>>
>>> DataStax bulk loaded can be an option if data is large.
>>>
>>> On Fri, Jan 17, 2020, 07:33 Nitan Kainth  wrote:
>>>
 If the keyspace already exists, use the copy command or sstableloader to
 merge data. If the data volume is too big, consider Spark or a custom Java
 program


 Regards,

 Nitan

 Cell: 510 449 9629

 On Jan 16, 2020, at 10:26 PM, Ankit Gadhiya 
 wrote:

 
 Any leads on this ?

 — Ankit

 On Thu, Jan 16, 2020 at 8:51 PM Ankit Gadhiya 
 wrote:

> Hi Arvinder,
>
> Thanks for your response.
>
> Yes - Cluster B already has some data. Tables/KS names are identical;
> as for the data - I still haven't got clarity on whether it is identical
> or not - I am assuming not, since it's for different customers, but need
> confirmation.
>
> *Thanks & Regards,*
> *Ankit Gadhiya*
>
>
>
> On Thu, Jan 16, 2020 at 8:49 PM Arvinder Dhillon <
> dhillona...@gmail.com> wrote:
>
>> So as I understand, Cluster B already has some data and not an empty
>> cluster.
>>
>> When you say, clusters share same keyspace and table names, do you
>> mean both clusters have identical data on those ks/tables?
>>
>>
>> -Arvi
>>
>> On Thu, Jan 16, 2020, 5:27 PM Ankit Gadhiya 
>> wrote:
>>
>>> Hello Group,
>>>
>>> I have a requirement in one of the production systems where I need
>>> to be able to migrate entire dataset from Cluster A (Azure Region A) to
>>> Cluster B (Azure Region B).
>>>
>>> Each cluster has 3 Cassandra nodes (RF=3) used by different
>>> applications. A few of the applications are common to Cluster A and
>>> Cluster B, thereby sharing the same keyspace/table names.
>>> Need suggestions for the best possible migration strategy here
>>> considering - 1. No application code changes possible - minor
>>> config/infra changes can be considered. 2. Zero data loss. 3. No/minimal
>>> downtime.
>>>
>>> It'd be great to hear ideas from all of you based on your
>>> experiences.
>>>
>>> Cassandra Version - Cassandra 3.0.13 on both sides.
>>> Total Data size - Cluster A: 70 GB, Cluster B: 15 GB
>>>
>>> *Thanks & Regards,*
>>> *Ankit Gadhiya*
>>>
>>> --
 *Thanks & Regards,*
 *Ankit Gadhiya*

 --
> *Thanks & Regards,*
> *Ankit Gadhiya*
>
>


Re: COPY command with where condition

2020-01-17 Thread Jean Tremblay
Did you think about using a Materialised View to generate what you want to 
keep, and then use DSBulk to extract the data?
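
As a rough sketch, assuming a hypothetical table ks.orders that you want to
filter on a status column (and keeping in mind that materialized views in
3.x come with operational caveats), it could look like:

    -- The base table's full primary key must appear in the view's key:
    CREATE MATERIALIZED VIEW ks.orders_by_status AS
        SELECT * FROM ks.orders
        WHERE status IS NOT NULL AND order_id IS NOT NULL
        PRIMARY KEY (status, order_id);

    # Then export just the slice you need; -query assumes a DSBulk version
    # that supports a custom schema.query:
    dsbulk unload \
        -query "SELECT * FROM ks.orders_by_status WHERE status = 'shipped'" \
        -url ./export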

> On 17 Jan 2020, at 14:30, adrien ruffie wrote:
> 
> Sorry, coming back to a quick question about the bulk loader...
> 
> https://www.datastax.com/blog/2018/05/introducing-datastax-bulk-loader 
> 
> 
> I read this: "Operations such as converting strings to lowercase, arithmetic
> on input columns, or filtering out rows based on some criteria, are not
> supported."
> 
> Consequently, it's still not possible to use a WHERE clause with DSBulk,
> right?
> 
> I don't really know how to do this in a way that avoids exporting the whole
> of the business data already stored, most of which doesn't need to be
> exported...
> 
> 
> 
> From: adrien ruffie 
> Sent: Friday, 17 January 2020 11:39
> To: Erick Ramirez ; user@cassandra.apache.org 
> 
> Subject: RE: COPY command with where condition
>  
> Thanks a lot!
> It's good news about DSBulk! I will take a look at this solution.
> 
> best regards,
> Adrian
> From: Erick Ramirez 
> Sent: Friday, 17 January 2020 10:02
> To: user@cassandra.apache.org 
> Subject: Re: COPY command with where condition
>  
> The COPY command doesn't support filtering and it doesn't perform well for 
> large tables.
> 
> Have you considered the DSBulk tool from DataStax? Previously, it only worked 
> with DataStax Enterprise but a few weeks ago, it was made free and works with 
> open-source Apache Cassandra. For details, see this blogpost. Cheers!
> 
> On Fri, Jan 17, 2020 at 6:57 PM adrien ruffie wrote:
> Hello all,
> 
> In my company we want to export a big dataset from our Cassandra ring.
> We are looking to use the COPY command, but I can't find whether and how a
> WHERE condition can be used.
> 
> This is because we need to export only some of the data, which must be
> returned by a WHERE clause, unfortunately with ALLOW FILTERING, due to
> several old tables which were poorly designed...
> 
> Do you know of a way to do that, please?
> 
> Thanks all and best regards
> 
> Adrian   





RE: COPY command with where condition

2020-01-17 Thread adrien ruffie
Sorry, coming back to a quick question about the bulk loader...

https://www.datastax.com/blog/2018/05/introducing-datastax-bulk-loader

I read this: "Operations such as converting strings to lowercase, arithmetic
on input columns, or filtering out rows based on some criteria, are not
supported."

Consequently, it's still not possible to use a WHERE clause with DSBulk, right?

I don't really know how to do this in a way that avoids exporting the whole of
the business data already stored, most of which doesn't need to be exported...




From: adrien ruffie 
Sent: Friday, 17 January 2020 11:39
To: Erick Ramirez ; user@cassandra.apache.org 

Subject: RE: COPY command with where condition

Thanks a lot!
It's good news about DSBulk! I will take a look at this solution.

best regards,
Adrian

From: Erick Ramirez 
Sent: Friday, 17 January 2020 10:02
To: user@cassandra.apache.org 
Subject: Re: COPY command with where condition

The COPY command doesn't support filtering and it doesn't perform well for 
large tables.

Have you considered the DSBulk tool from DataStax? Previously, it only worked 
with DataStax Enterprise but a few weeks ago, it was made free and works with 
open-source Apache Cassandra. For details, see this blogpost. Cheers!

On Fri, Jan 17, 2020 at 6:57 PM adrien ruffie <adriennolar...@hotmail.fr> wrote:
Hello all,

In my company we want to export a big dataset from our Cassandra ring.
We are looking to use the COPY command, but I can't find whether and how a
WHERE condition can be used.

This is because we need to export only some of the data, which must be
returned by a WHERE clause, unfortunately with ALLOW FILTERING, due to
several old tables which were poorly designed...

Do you know of a way to do that, please?

Thanks all and best regards

Adrian


Re: *URGENT* Migration across different Cassandra cluster few having same keyspace/table names

2020-01-17 Thread Ankit Gadhiya
Thanks but there’s no DSE License.
Wondering how sstableloader will help as some of the keyspace and table
names are the same. Also, how do I sync a few system keyspaces?


Thanks & Regards,
Ankit

On Fri, Jan 17, 2020 at 1:11 AM Vova Shelgunov  wrote:

> Loader*
>
> https://www.datastax.com/blog/2018/05/introducing-datastax-bulk-loader
>
> On Fri, Jan 17, 2020, 09:09 Vova Shelgunov  wrote:
>
>> DataStax bulk loaded can be an option if data is large.
>>
>> On Fri, Jan 17, 2020, 07:33 Nitan Kainth  wrote:
>>
>>> If the keyspace already exists, use the COPY command or sstableloader to
>>> merge data. If the data volume is too big, consider Spark or a custom Java
>>> program.
>>>
>>>
>>> Regards,
>>>
>>> Nitan
>>>
>>> Cell: 510 449 9629
>>>
>>> On Jan 16, 2020, at 10:26 PM, Ankit Gadhiya 
>>> wrote:
>>>
>>> 
>>> Any leads on this ?
>>>
>>> — Ankit
>>>
>>> On Thu, Jan 16, 2020 at 8:51 PM Ankit Gadhiya 
>>> wrote:
>>>
 Hi Arvinder,

 Thanks for your response.

 Yes - Cluster B already has some data. Tables/KS names are identical;
 as for the data - I still haven't got clarity on whether it is identical -
 I am assuming not, since it's for different customers, but I need
 confirmation.

 *Thanks & Regards,*
 *Ankit Gadhiya*



 On Thu, Jan 16, 2020 at 8:49 PM Arvinder Dhillon 
 wrote:

> So as I understand, Cluster B already has some data and not an empty
> cluster.
>
> When you say, clusters share same keyspace and table names, do you
> mean both clusters have identical data on those ks/tables?
>
>
> -Arvi
>
> On Thu, Jan 16, 2020, 5:27 PM Ankit Gadhiya 
> wrote:
>
>> Hello Group,
>>
>> I have a requirement in one of the production systems where I need to
>> be able to migrate the entire dataset from Cluster A (Azure Region A) to
>> Cluster B (Azure Region B).
>>
>> Each cluster has 3 Cassandra nodes (RF=3) used by different
>> applications. A few of the applications are common to Cluster A and
>> Cluster B, thereby sharing the same keyspace/table names.
>> Need suggestions for the best possible migration strategy here,
>> considering: 1. No application code changes possible - minor config/infra
>> changes can be considered. 2. Zero data loss. 3. No/minimal downtime.
>>
>> It'd be great to hear ideas from all of you based on your experiences.
>>
>> Cassandra Version - Cassandra 3.0.13 on both sides.
>> Total Data size - Cluster A: 70 GB, Cluster B: 15 GB
>>
>> *Thanks & Regards,*
>> *Ankit Gadhiya*
>>
>> --
>>> *Thanks & Regards,*
>>> *Ankit Gadhiya*
>>>
>>> --
*Thanks & Regards,*
*Ankit Gadhiya*


RE: COPY command with where condition

2020-01-17 Thread adrien ruffie
Thanks a lot!
It's good news about DSBulk! I will take a look at this solution.

best regards,
Adrian

From: Erick Ramirez 
Sent: Friday, 17 January 2020 10:02
To: user@cassandra.apache.org 
Subject: Re: COPY command with where condition

The COPY command doesn't support filtering and it doesn't perform well for 
large tables.

Have you considered the DSBulk tool from DataStax? Previously, it only worked 
with DataStax Enterprise but a few weeks ago, it was made free and works with 
open-source Apache Cassandra. For details, see this blogpost. Cheers!

On Fri, Jan 17, 2020 at 6:57 PM adrien ruffie <adriennolar...@hotmail.fr> wrote:
Hello all,

In my company we want to export a big dataset from our Cassandra ring.
We are looking to use the COPY command, but I can't find whether and how a
WHERE condition can be used.

This is because we need to export only some of the data, which must be
returned by a WHERE clause, unfortunately with ALLOW FILTERING, due to
several old tables which were poorly designed...

Do you know of a way to do that, please?

Thanks all and best regards

Adrian


Re: Cassandra failing with "Local host name unknown" even when specifying IP's for listen and rpc addresses

2020-01-17 Thread Erick Ramirez
FWIW there was a long discussion on ASF Slack about this topic earlier this
week (starting here
) with
driftx, exlt & myself and the recommendation was to make the hostname
resolve locally as best practice. Cheers!
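
For reference, a minimal way to make it resolve on the affected node (the
hostname below is taken from the reporter's log; substitute your own):

    # /etc/hosts - map the node's own hostname to a local address:
    127.0.0.1   localhost cass-cluster1-844788cc8c-p6lb7

    # Verify that it now resolves:
    getent hosts cass-cluster1-844788cc8c-p6lb7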

On Wed, Jan 15, 2020 at 8:18 AM rammohan ganapavarapu <
rammohanga...@gmail.com> wrote:

> Hi,
>
> I am getting the below error when I try to start Cassandra processes, even
> when I specify rpc_address and listen_address.
>
> CompilerOracle: inline org/apache/cassandra/utils/FastByteOperations$UnsafeOperations.compareTo (Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)I
> CompilerOracle: inline org/apache/cassandra/utils/vint/VIntCoding.encodeVInt (JI)[B
> Error: Exception thrown by the agent : java.net.MalformedURLException: Local host name unknown: java.net.UnknownHostException: cass-cluster1-844788cc8c-p6lb7: cass-cluster1-844788cc8c-p6lb7: Name does not resolve
> sun.management.AgentConfigurationError: java.net.MalformedURLException: Local host name unknown: java.net.UnknownHostException: cass-cluster1-844788cc8c-p6lb7: cass-cluster1-844788cc8c-p6lb7: Name does not resolve
>   at sun.management.jmxremote.ConnectorBootstrap.startRemoteConnectorServer(ConnectorBootstrap.java:480)
>   at sun.management.Agent.startAgent(Agent.java:262)
>   at sun.management.Agent.startAgent(Agent.java:452)
> Caused by: java.net.MalformedURLException: Local host name unknown: java.net.UnknownHostException: cass-cluster1-844788cc8c-p6lb7: cass-cluster1-844788cc8c-p6lb7: Name does not resolve
>   at javax.management.remote.JMXServiceURL.<init>(JMXServiceURL.java:289)
>   at javax.management.remote.JMXServiceURL.<init>(JMXServiceURL.java:253)
>   at sun.management.jmxremote.ConnectorBootstrap.exportMBeanServer(ConnectorBootstrap.java:739)
>   at sun.management.jmxremote.ConnectorBootstrap.startRemoteConnectorServer(ConnectorBootstrap.java:468)
>
> Any idea why the C* process is still trying to resolve the hostname even after
> giving IPs in the config?
>
> It works if I add a static host entry in the /etc/hosts file, but I wanted to
> understand why it needs to resolve the hostname even after specifying rpc and
> listen IP addresses.
>
>
> Thanks,
>
> Ram
>
>


Re: COPY command with where condition

2020-01-17 Thread Erick Ramirez
The COPY command doesn't support filtering and it doesn't perform well for
large tables.

Have you considered the DSBulk tool from DataStax? Previously, it only
worked with DataStax Enterprise but a few weeks ago, it was made free and
works with open-source Apache Cassandra. For details, see this blogpost. Cheers!
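
For a straight full-table export, the basic invocation looks roughly like
this (keyspace, table, host, and path are placeholders):

    # Unload a whole table to CSV files under /tmp/export:
    dsbulk unload -k my_ks -t my_table -url /tmp/export

    # Load the same files into another cluster:
    dsbulk load -k my_ks -t my_table -url /tmp/export -h target_host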

On Fri, Jan 17, 2020 at 6:57 PM adrien ruffie 
wrote:

> Hello all,
>
> In my company we want to export a big dataset from our Cassandra ring.
> We are looking to use the COPY command, but I can't find whether and how a
> WHERE condition can be used.
>
> This is because we need to export only some of the data, which must be
> returned by a WHERE clause, unfortunately with ALLOW FILTERING, due to
> several old tables which were poorly designed...
>
> Do you know of a way to do that, please?
>
> Thanks all and best regards
>
> Adrian
>