Re: Performance drop of current Java drivers

2020-05-01 Thread Chris Splinter
Hi Matthias,

I have forwarded this to the developers that work on the Java driver and
they will be looking into this first thing next week.

Will circle back here with findings,

Chris

On Fri, May 1, 2020 at 12:28 AM Erick Ramirez 
wrote:

> Matthias, I don't have an answer to your question but I just wanted to
> note that I don't believe the driver contributors actively watch this
> mailing list (I'm happy to be corrected), so I'd recommend you
> cross-post in the Java driver channels as well. Cheers!
>


Re: COPY command with where condition

2020-01-17 Thread Chris Splinter
Do you know your partition keys?

One option could be to enumerate that list of partition keys in separate
commands to make the individual operations less expensive for the cluster.

For example:
Say your partition key column is called id and the ids in your database are
[1,2,3]

You could do
./dsbulk unload --dsbulk.schema.keyspace 'dev_keyspace' \
  -query "SELECT * FROM probe_sensors WHERE id = 1 AND localisation_id = 208812" \
  -url /home/dump
./dsbulk unload --dsbulk.schema.keyspace 'dev_keyspace' \
  -query "SELECT * FROM probe_sensors WHERE id = 2 AND localisation_id = 208812" \
  -url /home/dump
./dsbulk unload --dsbulk.schema.keyspace 'dev_keyspace' \
  -query "SELECT * FROM probe_sensors WHERE id = 3 AND localisation_id = 208812" \
  -url /home/dump
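
If the list of ids is long, the same commands can be generated from a small
script. A minimal Java sketch (the id list and the per-id output directories
are assumptions for illustration; dsbulk writes its output files into -url,
so giving each run its own directory avoids collisions):

import java.util.List;

public class UnloadPerPartition {
    public static void main(String[] args) throws Exception {
        List<Integer> ids = List.of(1, 2, 3); // the known partition keys
        for (int id : ids) {
            // One dsbulk unload per partition key keeps each read cheap
            // for the cluster.
            Process p = new ProcessBuilder(
                    "./dsbulk", "unload",
                    "--dsbulk.schema.keyspace", "dev_keyspace",
                    "-query", "SELECT * FROM probe_sensors WHERE id = " + id
                            + " AND localisation_id = 208812",
                    "-url", "/home/dump/id_" + id) // one directory per id
                    .inheritIO()
                    .start();
            if (p.waitFor() != 0) {
                throw new IllegalStateException("dsbulk failed for id " + id);
            }
        }
    }
}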


Does that option work for you?



On Fri, Jan 17, 2020 at 12:17 PM adrien ruffie 
wrote:

> I don't really know yet for the production environment, but in the
> development environment the table contains more than 10,000,000 rows.
> We only need a subset of this table, not the entirety ...
> ------
> *From:* Chris Splinter
> *Sent:* Friday, January 17, 2020 17:40
> *To:* adrien ruffie
> *Cc:* user@cassandra.apache.org; Erick Ramirez
> *Subject:* Re: COPY command with where condition
>
> What you are seeing there is a standard read timeout. How many rows do you
> expect back from that query?
>
> On Fri, Jan 17, 2020 at 9:50 AM adrien ruffie 
> wrote:
>
> Thank you very much,
>
> So I make this request, for example:
>
> ./dsbulk unload --dsbulk.schema.keyspace 'dev_keyspace' \
>   -query "SELECT * FROM probe_sensors WHERE localisation_id = 208812 ALLOW FILTERING" \
>   -url /home/dump
>
>
> But I get the following error:
> com.datastax.dsbulk.executor.api.exception.BulkExecutionException:
> Statement execution failed: SELECT * FROM crt_sensors WHERE site_id =
> 208812 ALLOW FILTERING (Cassandra timeout during read query at consistency
> LOCAL_ONE (1 responses were required but only 0 replica responded))
>
> I configured my driver with the following driver.conf, but nothing works
> correctly. Do you know what the problem is?
>
> datastax-java-driver {
>   basic {
>     contact-points = ["data1com:9042","data2.com:9042"]
>     request {
>       timeout = "200"
>       consistency = "LOCAL_ONE"
>     }
>   }
>   advanced {
>     auth-provider {
>       class = PlainTextAuthProvider
>       username = "superuser"
>       password = "mypass"
>     }
>   }
> }
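
One thing that stands out in that config: in the HOCON format the driver
uses, a duration given as a bare number ("200") is interpreted as
milliseconds, so this sets a 200 ms request timeout, which by itself could
explain the read timeout above. A minimal sketch of the programmatic
equivalent with a longer timeout (Java driver 4.x; the datacenter name here
is an assumption, adjust to the real one):

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.config.DefaultDriverOption;
import com.datastax.oss.driver.api.core.config.DriverConfigLoader;
import java.net.InetSocketAddress;
import java.time.Duration;

public class TimeoutExample {
    public static void main(String[] args) {
        // 30 seconds instead of "200", which is parsed as 200 milliseconds
        DriverConfigLoader loader = DriverConfigLoader.programmaticBuilder()
                .withDuration(DefaultDriverOption.REQUEST_TIMEOUT, Duration.ofSeconds(30))
                .withString(DefaultDriverOption.REQUEST_CONSISTENCY, "LOCAL_ONE")
                .build();
        try (CqlSession session = CqlSession.builder()
                .addContactPoint(new InetSocketAddress("data2.com", 9042))
                .withLocalDatacenter("dc1") // assumption: use the real DC name
                .withAuthCredentials("superuser", "mypass")
                .withConfigLoader(loader)
                .build()) {
            System.out.println("Connected: " + session.getName());
        }
    }
}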
> --
> *From:* Chris Splinter
> *Sent:* Friday, January 17, 2020 16:17
> *To:* user@cassandra.apache.org
> *Cc:* Erick Ramirez
> *Subject:* Re: COPY command with where condition
>
> DSBulk has an option that lets you specify the query (including a WHERE
> clause).
>
> See Example 19 in this blog post for details:
> https://www.datastax.com/blog/2019/06/datastax-bulk-loader-unloading
>
> On Fri, Jan 17, 2020 at 7:34 AM Jean Tremblay <
> jean.tremb...@zen-innovations.com> wrote:
>
> Did you think about using a Materialised View to generate what you want to
> keep, and then use DSBulk to extract the data?
>
> On 17 Jan 2020, at 14:30, adrien ruffie
> wrote:
>
> Sorry, I'm coming back with a quick question about the bulk loader ...
>
> https://www.datastax.com/blog/2018/05/introducing-datastax-bulk-loader
>
> I read this: "Operations such as converting strings to lowercase,
> arithmetic on input columns, or filtering out rows based on some criteria,
> are not supported."
>
> Consequently, it's still not possible to use a WHERE clause with DSBulk,
> right?
>
> I don't really see how else to do it; we don't want to export the whole
> of the business data already stored, most of which doesn't need to be
> exported...
>
>
>
> --
> *From:* adrien ruffie
> *Sent:* Friday, January 17, 2020 11:39
> *To:* Erick Ramirez; user@cassandra.apache.org
> *Subject:* RE: COPY command with where condition
>
> Thanks a lot!
> That's good news for DSBulk! I will take a look at this solution.
>
> Best regards,
> Adrian
> --
> *From:* Erick Ramirez
> *Sent:* Friday, January 17, 2020 10:02
> *To:* user@cassandra.apache.org
> *Subject:* Re: COPY command with where condition
>
> The COPY command doesn't support filtering and it doesn't perform well for
> large tables.
>
> Have you considered the DSBulk tool from DataStax? Previously, it only
> worked with DataStax Enterprise but a few weeks ago, it was made free and
> works with open-source Apache Cassandra. For details, see this blogpost
> <https://www.datastax.com/blog/2019/12/tools-for-apache-cassandra>.
> Cheers!
>
> On Fri, Jan 17, 2020 at 6:57 PM adrien ruffie 
> wrote:
>
> Hello all,
>
> In my company we want to export a big dataset from our Cassandra ring.
> We looked at the COPY command, but I can't find whether and how a WHERE
> condition can be used.
>
> We need to export only some of the data, selected by a WHERE clause,
> unfortunately with ALLOW FILTERING due to several old tables that were
> poorly designed...
>
> Do you know a way to do that?
>
> Thanks and best regards,
>
> Adrian
>


Unified DataStax drivers

2020-01-16 Thread Chris Splinter
Hi all,

Last September, Jonathan Ellis announced at ApacheCon NA that DataStax was
going to unify the drivers we develop for Apache Cassandra and DataStax
Enterprise into a single open-source, Apache v2.0-licensed driver.
Yesterday, we released this new version across our C++, C#, Java, Node.js
and Python drivers. See the blog post for links to the source code and
documentation.

With this unified driver, we are committing to developing all of our new
functionality in this single driver going forward, available for all
Cassandra users and not just DataStax customers. This means that the
following are now available for all users:


Java: Spring Boot Starter

This starter is currently available in DataStax Labs; our goal is to get it
into the Spring Boot project. Also of note, Mark Paluch and the team that
works on Spring Data Cassandra recently completed their upgrade to the 4.x
line of the Java Driver (DATACASS-656).

Java: Built-in support for Reactive programming

This new version of the Java Driver (v4.4.0) has an executeReactive method
on CqlSession for those working with Reactive Streams. See the
documentation for details.
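
For illustration, a minimal sketch of consuming executeReactive with
Reactor (reactor-core on the classpath is an assumption; the query is just
an example):

import com.datastax.oss.driver.api.core.CqlSession;
import reactor.core.publisher.Flux;

public class ReactiveExample {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder().build()) {
            // executeReactive returns a ReactiveResultSet, which implements
            // org.reactivestreams.Publisher<ReactiveRow>, so any Reactive
            // Streams library can subscribe to it.
            Flux.from(session.executeReactive(
                        "SELECT release_version FROM system.local"))
                .map(row -> row.getString("release_version"))
                .doOnNext(System.out::println)
                .blockLast(); // blocking here only to keep the demo simple
        }
    }
}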

Java, Node.js: New Load Balancing Policy

The Java and Node.js drivers now have a new load balancing policy that uses
the in-flight request count for each node to drive the Power of Two Choices
decision, and takes into account the dequeuing rate of in-flight requests to
avoid slow nodes. In addition, the amount of time that a node has been UP is
considered when creating the query plan, so that requests are only sent to
nodes when they are ready. We are also working to get this into the C++, C#
and Python drivers soon.

Python: Pre-Built Wheels

Previously, we only had pre-built wheels for the DSE driver, but now they
are available for everyone in this new version of the driver (v3.21.0).


Along with the bulk loader and Kafka connector that we made available for
use with Apache Cassandra in December last year, we hope that this helps
simplify the picture for those who use our drivers.

Best,

Chris


Re: Replication system_distributed

2020-01-10 Thread Chris Splinter
Hi Marcel,

The RF for that keyspace is currently hardcoded; see CASSANDRA-11098. I am
not sure why your RF switched from 1 back to 3 after you restarted the
cluster; I tried the same and it remained at 1 for me.

The tables in that keyspace are used to store history about repair
operations, and having it at RF=3 shouldn't affect the performance of
repairs. See CASSANDRA-5839 for when and why it was introduced.

Changing the replication from 3 to 1 for system_distributed is not a good
idea, for the same reason that changing the replication of *any* keyspace
to 1 is not a good idea: you lose the ability to query that data if a
single node goes down.
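
If you want to verify what the cluster actually reports after a restart,
the replication settings are visible through the driver metadata. A minimal
Java driver 4.x sketch (contact points coming from the default config is an
assumption):

import com.datastax.oss.driver.api.core.CqlSession;

public class CheckReplication {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder().build()) {
            // Print the replication options of system_distributed as the
            // cluster currently reports them.
            session.getMetadata()
                   .getKeyspace("system_distributed")
                   .ifPresent(ks -> System.out.println(ks.getReplication()));
        }
    }
}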

Hope this helps,

Chris

On Wed, Jan 8, 2020 at 1:23 AM Marcel Jakobi  wrote:

> Hi,
>
> The default definition of the keyspace system_distributed is:
>
> CREATE KEYSPACE system_distributed WITH replication = {'class':
> 'SimpleStrategy', 'replication_factor': '3'} AND durable_writes = true;
>
> If I understand correctly, all repair information will be replicated on
> three servers in the cluster. I have changed the RF to '1'. Once I stop
> and restart the entire cluster, the replication factor changes back to 3.
> It seems Cassandra wants it to be 3.
>
> Doesn't that reduce the performance of repair operations?
> Why does the RF change back after restarting the cluster?
> Are there reasons why you shouldn't change the replication factor to 1?
>
> Thanks,
>
> Marcel
>