Re: Error when running nodetool cleanup after adding a new node to a cluster

2017-02-08 Thread Srinath Reddy
Yes, I ran the nodetool cleanup on the other nodes and got the error.

Thanks.

> On 09-Feb-2017, at 11:12 AM, Harikrishnan Pillai  
> wrote:
> 
> The cleanup has to run on other nodes
> 
> Sent from my iPhone
> 
> On Feb 8, 2017, at 9:14 PM, Srinath Reddy  > wrote:
> 
>> Hi,
>> 
>> Trying to re-balance a Cassandra cluster after adding a new node, and I'm 
>> getting this error when running nodetool cleanup. The Cassandra cluster is 
>> running in a Kubernetes cluster.
>> 
>> Cassandra version is 2.2.8
>> 
>> nodetool cleanup
>> error: io.k8s.cassandra.KubernetesSeedProvider
>> Fatal configuration error; unable to start server.  See log for stacktrace.
>> -- StackTrace --
>> org.apache.cassandra.exceptions.ConfigurationException: 
>> io.k8s.cassandra.KubernetesSeedProvider
>> Fatal configuration error; unable to start server.  See log for stacktrace.
>> at 
>> org.apache.cassandra.config.DatabaseDescriptor.applyConfig(DatabaseDescriptor.java:676)
>> at 
>> org.apache.cassandra.config.DatabaseDescriptor.(DatabaseDescriptor.java:119)
>> at org.apache.cassandra.tools.NodeProbe.checkJobs(NodeProbe.java:256)
>> at 
>> org.apache.cassandra.tools.NodeProbe.forceKeyspaceCleanup(NodeProbe.java:262)
>> at org.apache.cassandra.tools.nodetool.Cleanup.execute(Cleanup.java:55)
>> at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:244)
>> at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:158)
>> 
>> nodetool status
>> Datacenter: datacenter1
>> ===
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address  Load   Tokens   Owns (effective)  Host ID   
>> Rack
>> UN  10.244.3.4   6.91 GB256  60.8% 
>> bad1c6c6-8c2e-4f0c-9aea-0d63b451e7a1  rack1
>> UN  10.244.0.3   6.22 GB256  60.2% 
>> 936cb0c0-d14f-4ddd-bfde-3865b922e267  rack1
>> UN  10.244.1.3   6.12 GB256  59.4% 
>> 0cb43711-b155-449c-83ba-00ed2a97affe  rack1
>> UN  10.244.4.3   632.43 MB  256  57.8% 
>> 55095c75-26df-4180-9004-9fabf88faacc  rack1
>> UN  10.244.2.10  6.08 GB256  61.8% 
>> 32e32bd2-364f-4b6f-b13a-8814164ed160  rack1
>> 
>> 
>> Any suggestions on what is needed to re-balance the cluster after adding the 
>> new node? I have run nodetool repair but am not able to run nodetool cleanup.
>> 
>> Thanks.
>> 
>> 





Re: Error when running nodetool cleanup after adding a new node to a cluster

2017-02-08 Thread Harikrishnan Pillai
The cleanup has to run on other nodes

Sent from my iPhone

On Feb 8, 2017, at 9:14 PM, Srinath Reddy 
> wrote:

Hi,

Trying to re-balance a Cassandra cluster after adding a new node, and I'm 
getting this error when running nodetool cleanup. The Cassandra cluster is 
running in a Kubernetes cluster.

Cassandra version is 2.2.8

nodetool cleanup
error: io.k8s.cassandra.KubernetesSeedProvider
Fatal configuration error; unable to start server.  See log for stacktrace.
-- StackTrace --
org.apache.cassandra.exceptions.ConfigurationException: 
io.k8s.cassandra.KubernetesSeedProvider
Fatal configuration error; unable to start server.  See log for stacktrace.
at 
org.apache.cassandra.config.DatabaseDescriptor.applyConfig(DatabaseDescriptor.java:676)
at 
org.apache.cassandra.config.DatabaseDescriptor.(DatabaseDescriptor.java:119)
at org.apache.cassandra.tools.NodeProbe.checkJobs(NodeProbe.java:256)
at org.apache.cassandra.tools.NodeProbe.forceKeyspaceCleanup(NodeProbe.java:262)
at org.apache.cassandra.tools.nodetool.Cleanup.execute(Cleanup.java:55)
at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:244)
at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:158)

nodetool status
Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address  Load   Tokens   Owns (effective)  Host ID  
 Rack
UN  10.244.3.4   6.91 GB256  60.8% 
bad1c6c6-8c2e-4f0c-9aea-0d63b451e7a1  rack1
UN  10.244.0.3   6.22 GB256  60.2% 
936cb0c0-d14f-4ddd-bfde-3865b922e267  rack1
UN  10.244.1.3   6.12 GB256  59.4% 
0cb43711-b155-449c-83ba-00ed2a97affe  rack1
UN  10.244.4.3   632.43 MB  256  57.8% 
55095c75-26df-4180-9004-9fabf88faacc  rack1
UN  10.244.2.10  6.08 GB256  61.8% 
32e32bd2-364f-4b6f-b13a-8814164ed160  rack1


Any suggestions on what is needed to re-balance the cluster after adding the 
new node? I have run nodetool repair but am not able to run nodetool cleanup.

Thanks.




Error when running nodetool cleanup after adding a new node to a cluster

2017-02-08 Thread Srinath Reddy
Hi,

Trying to re-balance a Cassandra cluster after adding a new node, and I'm 
getting this error when running nodetool cleanup. The Cassandra cluster is 
running in a Kubernetes cluster.

Cassandra version is 2.2.8

nodetool cleanup
error: io.k8s.cassandra.KubernetesSeedProvider
Fatal configuration error; unable to start server.  See log for stacktrace.
-- StackTrace --
org.apache.cassandra.exceptions.ConfigurationException: 
io.k8s.cassandra.KubernetesSeedProvider
Fatal configuration error; unable to start server.  See log for stacktrace.
at 
org.apache.cassandra.config.DatabaseDescriptor.applyConfig(DatabaseDescriptor.java:676)
at 
org.apache.cassandra.config.DatabaseDescriptor.(DatabaseDescriptor.java:119)
at org.apache.cassandra.tools.NodeProbe.checkJobs(NodeProbe.java:256)
at 
org.apache.cassandra.tools.NodeProbe.forceKeyspaceCleanup(NodeProbe.java:262)
at org.apache.cassandra.tools.nodetool.Cleanup.execute(Cleanup.java:55)
at 
org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:244)
at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:158)

nodetool status
Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address  Load   Tokens   Owns (effective)  Host ID  
 Rack
UN  10.244.3.4   6.91 GB256  60.8% 
bad1c6c6-8c2e-4f0c-9aea-0d63b451e7a1  rack1
UN  10.244.0.3   6.22 GB256  60.2% 
936cb0c0-d14f-4ddd-bfde-3865b922e267  rack1
UN  10.244.1.3   6.12 GB256  59.4% 
0cb43711-b155-449c-83ba-00ed2a97affe  rack1
UN  10.244.4.3   632.43 MB  256  57.8% 
55095c75-26df-4180-9004-9fabf88faacc  rack1
UN  10.244.2.10  6.08 GB256  61.8% 
32e32bd2-364f-4b6f-b13a-8814164ed160  rack1


Any suggestions on what is needed to re-balance the cluster after adding the 
new node? I have run nodetool repair but am not able to run nodetool cleanup.

Thanks.






Re: Current data density limits with Open Source Cassandra

2017-02-08 Thread daemeon reiydelle
Your mileage may vary. Think of that storage limit as fairly reasonable for active data
likely to be tombstoned. Add more for older/historic data. Then think about
the time to recover a node.


...

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Wed, Feb 8, 2017 at 2:14 PM, Ben Slater 
wrote:

> The major issue we’ve seen with very high density (we generally say <2TB
> per node is best) is manageability - if you need to replace a node or add a
> node then restreaming data takes a *long* time and there is a fairly high
> chance of a glitch in the universe meaning you have to start again before
> it’s done.
>
> Also, if you’re using STCS you can end up with gigantic compactions which
> also take a long time and can cause issues.
>
> Heap limitations are mainly related to partition size rather than node
> density in my experience.
>
> Cheers
> Ben
>
> On Thu, 9 Feb 2017 at 08:20 Hannu Kröger  wrote:
>
>> Hello,
>>
>> Back in the day it was recommended that the max disk density per node for
>> Cassandra 1.2 was around 3-5TB of uncompressed data.
>>
>> IIRC it was mostly because of heap memory limitations? Now that off-heap
>> support is there for certain data and 3.x has a different data storage
>> format, is that 3-5TB still a valid limit?
>>
>> Does anyone have experience running Cassandra with 3-5TB of compressed
>> data?
>>
>> Cheers,
>> Hannu
>
> --
> 
> Ben Slater
> Chief Product Officer
> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
> +61 437 929 798 <+61%20437%20929%20798>
>


Re: Current data density limits with Open Source Cassandra

2017-02-08 Thread Ben Slater
The major issue we’ve seen with very high density (we generally say <2TB
per node is best) is manageability - if you need to replace a node or add a
node then restreaming data takes a *long* time and there is a fairly high
chance of a glitch in the universe meaning you have to start again before
it’s done.

Also, if you’re using STCS you can end up with gigantic compactions which
also take a long time and can cause issues.

Heap limitations are mainly related to partition size rather than node
density in my experience.

Cheers
Ben

On Thu, 9 Feb 2017 at 08:20 Hannu Kröger  wrote:

> Hello,
>
> Back in the day it was recommended that the max disk density per node for
> Cassandra 1.2 was around 3-5TB of uncompressed data.
>
> IIRC it was mostly because of heap memory limitations? Now that off-heap
> support is there for certain data and 3.x has a different data storage
> format, is that 3-5TB still a valid limit?
>
> Does anyone have experience running Cassandra with 3-5TB of compressed
> data?
>
> Cheers,
> Hannu

-- 

Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798


Re: Cluster scaling

2017-02-08 Thread Branislav Janosik -T (bjanosik - AAP3 INC at Cisco)
Hi Jan,

Yes, you are right about the batches; I am working on a correction of the way 
we use batches, just like you mentioned. I monitored all those stats and it seems 
that hardware is not the bottleneck.
Thank you for the response and advice!

Cheers,
Branislav

From: Jan Kesten 
Reply-To: "user@cassandra.apache.org" 
Date: Wednesday, February 8, 2017 at 8:20 AM
To: "user@cassandra.apache.org" 
Cc: "j.kes...@enercast.de" 
Subject: Re: Cluster scaling


Hi Branislav,

what is it you would expect?

Some thoughts:

Batches are often misunderstood: they work well only if they contain only one 
partition key - think of a batch of different sensor data for one key. If you 
group batches with many partition keys and/or do large batches, this puts high 
load on the coordinator node, which then itself needs to talk to the nodes 
holding the partitions. This could explain the scaling you see in your second 
try without batches. Keep in mind that the driver supports executeAsync and 
ResultSetFutures.
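
For reference, a minimal sketch of the executeAsync pattern mentioned above, using the 
DataStax Java driver 3.x. The contact point, keyspace, table and column names are 
placeholders and are not taken from the benchmark code:

    import java.util.ArrayList;
    import java.util.List;
    import com.datastax.driver.core.BoundStatement;
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.ResultSetFuture;
    import com.datastax.driver.core.Session;

    public class AsyncInsertSketch {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect("bench")) {              // placeholder keyspace
                PreparedStatement insert = session.prepare(
                        "INSERT INTO resources (id, name) VALUES (?, ?)");  // placeholder table
                List<ResultSetFuture> futures = new ArrayList<>();
                for (int i = 0; i < 1000; i++) {
                    BoundStatement bound = insert.bind("id-" + i, "resource-" + i);
                    // Each statement targets a single partition; no coordinator-side batching.
                    futures.add(session.executeAsync(bound));
                }
                // Block until all writes complete; real code should cap in-flight futures,
                // for example with a semaphore, to bound client memory and server pressure.
                for (ResultSetFuture f : futures) {
                    f.getUninterruptibly();
                }
            }
        }
    }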

Second, put the commitlog and data directories on separate disks when using 
spindles.

Third, have you monitored iostats and cpustats while running your tests?

Cheers,

Jan
Am 08.02.2017 um 16:39 schrieb Branislav Janosik -T (bjanosik - AAP3 INC at 
Cisco):

Hi all,



I have a cluster of three nodes and would like to ask some questions about the 
performance.

I wrote a small benchmarking tool in java that mirrors (read, write) operations 
that we do in the real project.

Problem is that it is not scaling like it should. The program runs two tests: 
one using batch statement and one without using the batch.

The operation sequence is: optional select, insert, update, insert. I run the 
tool on my server with 128 threads (# of threads has no influence on the 
performance),

creating usually 100K resources for testing purposes.



The average results (operations per second) with the use of batch statement are:



Replication Factor = 1   with reading   without reading
  1-node cluster         37K            46K
  2-node cluster         37K            47K
  3-node cluster         39K            70K

Replication Factor = 2   with reading   without reading
  2-node cluster         21K            40K
  3-node cluster         30K            48K


The average results (operations per second) without the use of batch statement 
are:

Replication Factor = 1   with reading   without reading
  1-node cluster         31K            20K
  2-node cluster         38K            39K
  3-node cluster         45K            87K

Replication Factor = 2   with reading   without reading
  2-node cluster         19K            22K
  3-node cluster         26K            36K



The Cassandra VM specs are: 16 CPUs, 16GB and two 32GB of RAM, at least 30GB 
of disk space for each node. Non-SSD; each VM is on a separate physical server.



The code is available here https://github.com/bjanosik/CassandraBenchTool.git . 
It can be built with Maven and then you can use jar in target directory with 
java -jar target/cassandra-test-1.0-SNAPSHOT-jar-with-dependencies.jar .

Thank you for any help.




--

Jan Kesten, mailto:j.kes...@enercast.de

Tel.: +49 561/4739664-0 FAX: -9 Mobil: +49 160 / 90 98 41 68

enercast GmbH Universitätsplatz 12 D-34127 Kassel HRB15471

http://www.enercast.de Online-Prognosen für erneuerbare Energien

Geschäftsführung: Thomas Landgraf (CEO), Bernd Kratz (CTO), Philipp Rinder (CSO)






This e-mail and any attachment may contain confidential and/or privileged 
information. If you are not the named addressee or if this transmission has 
been addressed to you in error, please notify us immediately by reply e-mail 
and then delete this e-mail and any attachment from your system. Please 
understand that you must not copy this e-mail or any attachment or disclose the 
contents to any other person. Thank you for your cooperation.


Current data density limits with Open Source Cassandra

2017-02-08 Thread Hannu Kröger
Hello,

Back in the day it was recommended that the max disk density per node for Cassandra 
1.2 was around 3-5TB of uncompressed data. 

IIRC it was mostly because of heap memory limitations? Now that off-heap 
support is there for certain data and 3.x has a different data storage format, is 
that 3-5TB still a valid limit?

Does anyone have experience running Cassandra with 3-5TB of compressed data?

Cheers,
Hannu

Re: Extract big data to file

2017-02-08 Thread Justin Cameron
Actually using BEGINTOKEN and ENDTOKEN will only give you what you want if
you're using ByteOrderedPartitioner (not with the default murmur3). It also
looks like *datetimestamp* is a clustering column so that suggestion
probably wouldn't have applied anyway.

On Wed, 8 Feb 2017 at 13:04 Justin Cameron  wrote:

> Ideally you would have the program/Spark job that receives the data from
> Kafka write it to a text file as it writes each row to Cassandra - that way
> you don't need to query Cassandra at all.
>
> If you need to dump this data ad-hoc, rather than on a regular schedule,
> your best bet is to write some code to do it as Kiril mentioned. A short
> python script would do the job, and you get the added bonus over CQLSH of
> being able to throttle the export if it is very large and likely to affect
> your cluster's performance.
>
> Alternatively, if *datetimestamp* is part of the table's partition key
> you could also use the BEGINTOKEN and ENDTOKEN options of CQLSH's COPY TO
> command to achieve what you want.
>
>
> On Wed, 8 Feb 2017 at 11:40 Kiril Menshikov  wrote:
>
> Did you try to retrieve the data through code? cqlsh is probably not the right
> tool to fetch 360G.
>
>
>
> On Feb 8, 2017, at 12:34, Cogumelos Maravilha 
> wrote:
>
> Hi list,
>
> My database stores data from Kafka. Using C* 3.0.10
>
> In my cluster I'm using:
> AND compression = {'sstable_compression':
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>
> The result of extracting one day of data uncompressed is around 360G.
>
> I've found these approaches:
>
> echo "SELECT kafka from red where datetimestamp >= '2017-02-02 00:00:00'
> and datetimestamp < '2017-02-02 15:00:01';" | cqlsh 100.100.221.146 9042 >
> result.txt
> Here by default I get 100 rows.
>
> Using CAPTURE result.csv with paging off I always get an out-of-memory
> error. With paging on I need to put something heavy on top of the
> Enter key. Crazy thing, needing to enable paging to get rid of out-of-memory errors!
> I've taken a look at the result file and it is empty; perhaps it builds the
> result in memory and only writes it to disk at the end.
>
> Is there another approach like this on ACID databases:
> copy (select kafka from red where datetimestamp >= '2017-02-02 00:00:00'
> and datetimestamp < '2017-02-02 15:00:01') to 'result.csv' WITH CSV HEADER;
>
> Thanks in advance.
>
>
> --
>
> Justin Cameron
>
> Senior Software Engineer | Instaclustr
>
>
>
>
> This email has been sent on behalf of Instaclustr Pty Ltd (Australia) and
> Instaclustr Inc (USA).
>
> This email and any attachments may contain confidential and legally
> privileged information.  If you are not the intended recipient, do not copy
> or disclose its content, but please reply to this email immediately and
> highlight the error to the sender and then immediately delete the message.
>
> --

Justin Cameron

Senior Software Engineer | Instaclustr




This email has been sent on behalf of Instaclustr Pty Ltd (Australia) and
Instaclustr Inc (USA).

This email and any attachments may contain confidential and legally
privileged information.  If you are not the intended recipient, do not copy
or disclose its content, but please reply to this email immediately and
highlight the error to the sender and then immediately delete the message.


Re: Extract big data to file

2017-02-08 Thread Justin Cameron
Ideally you would have the program/Spark job that receives the data from
Kafka write it to a text file as it writes each row to Cassandra - that way
you don't need to query Cassandra at all.

If you need to dump this data ad-hoc, rather than on a regular schedule,
your best bet is to write some code to do it as Kiril mentioned. A short
python script would do the job, and you get the added bonus over CQLSH of
being able to throttle the export if it is very large and likely to affect
your cluster's performance.
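
For what it's worth, here is the same idea sketched with the Java driver rather than a 
Python script, to stay consistent with the other Java examples in this digest. The query 
text comes from the original mail; the keyspace name, output path and throttling values 
are placeholders:

    import java.io.PrintWriter;
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.SimpleStatement;

    public class ExportSketch {
        public static void main(String[] args) throws Exception {
            try (Cluster cluster = Cluster.builder().addContactPoint("100.100.221.146").build();
                 Session session = cluster.connect("mykeyspace");           // placeholder keyspace
                 PrintWriter out = new PrintWriter("result.csv")) {
                SimpleStatement stmt = new SimpleStatement(
                        "SELECT kafka FROM red WHERE datetimestamp >= '2017-02-02 00:00:00' "
                      + "AND datetimestamp < '2017-02-02 15:00:01'");
                stmt.setFetchSize(1000);                 // page size keeps client memory bounded
                long n = 0;
                for (Row row : session.execute(stmt)) {  // iterating fetches further pages on demand
                    out.println(row.getString("kafka"));
                    if (++n % 100_000 == 0) {
                        Thread.sleep(100);               // crude throttle; tune or remove
                    }
                }
            }
        }
    }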

Alternatively, if *datetimestamp* is part of the table's partition key you
could also use the BEGINTOKEN and ENDTOKEN options of CQLSH's COPY TO
command to achieve what you want.


On Wed, 8 Feb 2017 at 11:40 Kiril Menshikov  wrote:

> Did you try to retrieve the data through code? cqlsh is probably not the right
> tool to fetch 360G.
>
>
>
> On Feb 8, 2017, at 12:34, Cogumelos Maravilha 
> wrote:
>
> Hi list,
>
> My database stores data from Kafka. Using C* 3.0.10
>
> In my cluster I'm using:
> AND compression = {'sstable_compression':
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>
> The result of extracting one day of data uncompressed is around 360G.
>
> I've found these approaches:
>
> echo "SELECT kafka from red where datetimestamp >= '2017-02-02 00:00:00'
> and datetimestamp < '2017-02-02 15:00:01';" | cqlsh 100.100.221.146 9042 >
> result.txt
> Here by default I get 100 rows.
>
> Using CAPTURE result.csv with paging off I always get an out-of-memory
> error. With paging on I need to put something heavy on top of the
> Enter key. Crazy thing, needing to enable paging to get rid of out-of-memory errors!
> I've taken a look at the result file and it is empty; perhaps it builds the
> result in memory and only writes it to disk at the end.
>
> Is there another approach like this on ACID databases:
> copy (select kafka from red where datetimestamp >= '2017-02-02 00:00:00'
> and datetimestamp < '2017-02-02 15:00:01') to 'result.csv' WITH CSV HEADER;
>
> Thanks in advance.
>
>
> --

Justin Cameron

Senior Software Engineer | Instaclustr




This email has been sent on behalf of Instaclustr Pty Ltd (Australia) and
Instaclustr Inc (USA).

This email and any attachments may contain confidential and legally
privileged information.  If you are not the intended recipient, do not copy
or disclose its content, but please reply to this email immediately and
highlight the error to the sender and then immediately delete the message.


Re: Extract big data to file

2017-02-08 Thread Kiril Menshikov
Did you try to retrieve the data through code? cqlsh is probably not the right tool 
to fetch 360G.


> On Feb 8, 2017, at 12:34, Cogumelos Maravilha  
> wrote:
> 
> Hi list,
> 
> My database stores data from Kafka. Using C* 3.0.10
> 
> In my cluster I'm using:
> AND compression = {'sstable_compression': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> 
> The result of extracting one day of data uncompressed is around 360G.
> 
> I've found these approaches:
> 
> echo "SELECT kafka from red where datetimestamp >= '2017-02-02 00:00:00' and 
> datetimestamp < '2017-02-02 15:00:01';" | cqlsh 100.100.221.146 9042 > 
> result.txt
> Here by default I get 100 rows.
> 
> Using CAPTURE result.csv with paging off I always get an out-of-memory 
> error. With paging on I need to put something heavy on top of the Enter 
> key. Crazy thing, needing to enable paging to get rid of out-of-memory errors! I've 
> taken a look at the result file and it is empty; perhaps it builds the result in 
> memory and only writes it to disk at the end.
> 
> Is there another approach like this on ACID databases:
> copy (select kafka from red where datetimestamp >= '2017-02-02 00:00:00' and 
> datetimestamp < '2017-02-02 15:00:01') to 'result.csv' WITH CSV HEADER;
> 
> Thanks in advance.
> 



Composite partition key token

2017-02-08 Thread Branislav Janosik -T (bjanosik - AAP3 INC at Cisco)
Hi,

I would like to ask how to calculate the token for a composite partition key using 
the Java API.
For a partition key made of one column I use 
cluster.getMetadata().newToken(newBuffer);
But what if my key looks like this: PRIMARY KEY ((parentResourceId, timeRT), 
childName)?
I read that ":" is a separator but it doesn't seem to be the case.
How can I create a ByteBuffer with multiple values so that the token is 
actually correct?

Thank you,
Branislav
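
The thread leaves this unanswered, so here is a minimal sketch rather than an 
authoritative answer. It assumes the composite partition key serialization Cassandra 
uses (for each component: a 2-byte big-endian length, the component bytes, then a 
0x00 byte) and the same newToken call shown above, assuming it hashes exactly the 
serialized partition key bytes it is given. Column types and values are placeholders; 
swap the codecs to match the real schema:

    import java.nio.ByteBuffer;
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ProtocolVersion;
    import com.datastax.driver.core.Token;
    import com.datastax.driver.core.TypeCodec;

    public class CompositeTokenSketch {

        // Serialize a composite partition key: for each component a 2-byte
        // big-endian length, the component bytes, then a 0x00 end-of-component byte.
        static ByteBuffer compose(ByteBuffer... components) {
            int size = 0;
            for (ByteBuffer c : components) {
                size += 2 + c.remaining() + 1;
            }
            ByteBuffer out = ByteBuffer.allocate(size);
            for (ByteBuffer c : components) {
                out.putShort((short) c.remaining());
                out.put(c.duplicate());
                out.put((byte) 0);
            }
            out.flip();
            return out;
        }

        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build()) {
                ProtocolVersion pv = ProtocolVersion.NEWEST_SUPPORTED;
                // Assumes parentResourceId is text and timeRT is a bigint (placeholders).
                ByteBuffer key = compose(
                        TypeCodec.varchar().serialize("parent-1", pv),
                        TypeCodec.bigint().serialize(1486500000000L, pv));
                Token token = cluster.getMetadata().newToken(key);
                System.out.println("token = " + token);
            }
        }
    }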


Re: Cluster scaling

2017-02-08 Thread Anuj Wadehra
Hi Branislav,
I quickly went through the code and noticed that you are updating RF from code 
and expecting that Cassandra would automatically distribute replicas as per the 
new RF. I think this is not how it works. After updating the RF, you need to 
run repair on all the nodes to make sure that data replicas are as per the new 
RF. Please refer to 
https://docs.datastax.com/en/cql/3.1/cql/cql_using/update_ks_rf_t.html . This 
would give you reliable results.
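
A minimal sketch of that procedure with the Java driver, in case it helps; the keyspace 
name, strategy and replication factor are placeholders, and the repair step happens 
outside the driver:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class UpdateReplicationSketch {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect()) {
                // Raise the replication factor of a placeholder keyspace.
                session.execute("ALTER KEYSPACE bench WITH replication = "
                        + "{'class': 'SimpleStrategy', 'replication_factor': 2}");
                // The schema change alone does not move existing data; afterwards run
                //   nodetool repair bench
                // on every node so existing rows are replicated according to the new RF.
            }
        }
    }
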
It would be good if you explained the exact purpose of your exercise. The tests seem 
to be more of academic interest. You are adding several variables to your tests, but 
each of these params has an entirely different purpose:
1. Batch/No batch depends on business atomicity needs.
2. Read/No read depends on business requirements.
3. RF depends on the fault tolerance needed.

Thanks
Anuj


On Wed, 8 Feb, 2017 at 9:09 PM, Branislav Janosik -T (bjanosik - AAP3 INC at 
Cisco) wrote:
Hi all,
 
 
 
I have a cluster of three nodes and would like to ask some questions about the 
performance.
 
I wrote a small benchmarking tool in java that mirrors (read, write) operations 
that we do in the real project.
 
Problem is that it is not scaling like it should. The program runs two tests: 
one using batch statement and one without using the batch.
 
The operation sequence is: optional select, insert, update, insert. I run the 
tool on my server with 128 threads (# of threads has no influence on the 
performance),
 
creating usually 100K resources for testing purposes.
 
 
 
The average results (operations per second) with the use of batch statement are:
 
 
 
Replication Factor = 1   with reading   without reading
  1-node cluster         37K            46K
  2-node cluster         37K            47K
  3-node cluster         39K            70K

Replication Factor = 2   with reading   without reading
  2-node cluster         21K            40K
  3-node cluster         30K            48K


The average results (operations per second) without the use of batch statement 
are:

Replication Factor = 1   with reading   without reading
  1-node cluster         31K            20K
  2-node cluster         38K            39K
  3-node cluster         45K            87K

Replication Factor = 2   with reading   without reading
  2-node cluster         19K            22K
  3-node cluster         26K            36K
 
 
 
The Cassandra VM specs are: 16 CPUs, 16GB and two 32GB of RAM, at least 30GB 
of disk space for each node. Non-SSD; each VM is on a separate physical server.
 
 
 
The code is available here: https://github.com/bjanosik/CassandraBenchTool.git . 
It can be built with Maven, and then you can use the jar in the target directory 
with java -jar target/cassandra-test-1.0-SNAPSHOT-jar-with-dependencies.jar.
 
Thank you for any help.
 
  
   


Re: Time series data model and tombstones

2017-02-08 Thread DuyHai Doan
Thanks for the update. Good to know that TWCS gives you more stability.

On Wed, Feb 8, 2017 at 6:20 PM, John Sanda  wrote:

> I wanted to provide a quick update. I was able to patch one of the
> environments that is hitting the tombstone problem. It has been running
> TWCS for five days now, and things are stable so far. I also had a patch to
> the application code to implement date partitioning ready to go, but I
> wanted to see how things went with only making the compaction changes.
>
> On Sun, Jan 29, 2017 at 4:05 PM, DuyHai Doan  wrote:
>
>> In theory, you're right and Cassandra should possibly skip reading cells
>> having time < 50. But it's all theory, in practice Cassandra read chunks of
>> xxx kilobytes worth of data (don't remember the exact value of xxx, maybe
>> 64k or far less) so you may end up reading tombstones.
>>
>> On Sun, Jan 29, 2017 at 9:24 PM, John Sanda  wrote:
>>
>>> Thanks for the clarification. Let's say I have a partition in an SSTable
>>> where the values of time range from 100 to 10 and everything < 50 is
>>> expired. If I do a query with time < 100 and time >= 50, are there
>>> scenarios in which Cassandra will have to read cells where time < 50? In
>> particular I am wondering if compression might have any effect.
>>>
>>> On Sun, Jan 29, 2017 at 3:01 PM DuyHai Doan 
>>> wrote:
>>>
 "Should the data be sorted by my time column regardless of the
 compaction strategy" --> It does

 What I mean is that an old "chunk" of expired data in SSTABLE-12 may be
 compacted together with a new chunk of SSTABLE-2 containing fresh data so
 in the new resulting SSTable will contain tombstones AND fresh data inside
 the same partition, but of course sorted by clustering column "time".

 On Sun, Jan 29, 2017 at 8:55 PM, John Sanda 
 wrote:

 Since STCS does not sort data based on timestamp, your wide partition
 may span over multiple SSTables and inside each SSTable, old data (+
 tombstones) may sit on the same partition as newer data.


 Should the data be sorted by my time column regardless of the
 compaction strategy? I didn't think that the column timestamp came into
 play with respect to sorting. I have been able to review some SSTables with
 sstablemetadata and I can see that old/expired data is definitely living
 with live data.


 On Sun, Jan 29, 2017 at 2:38 PM, DuyHai Doan 
 wrote:

 Ok so give it a try with TWCS. Since STCS does not sort data based on
 timestamp, your wide partition may span over multiple SSTables and inside
 each SSTable, old data (+ tombstones) may sit on the same partition as
 newer data.

 When reading by slice, even if you request fresh data, Cassandra
 has to scan over a lot of tombstones to fetch the correct range of data, thus
 your issue.

 On Sun, Jan 29, 2017 at 8:19 PM, John Sanda 
 wrote:

 It was with STCS. It was on a 2.x version before TWCS was available.

 On Sun, Jan 29, 2017 at 10:58 AM DuyHai Doan 
 wrote:

 Did you get this overwhelming tombstone behavior with STCS or with
 TWCS?

 If you're using DTCS, beware of its weird behavior and tricky
 configuration.

 On Sun, Jan 29, 2017 at 3:52 PM, John Sanda 
 wrote:

 Your partitioning key is text. If you have multiple entries per id you
 are likely hitting older cells that have expired. Descending only affects
 how the data is stored on disk, if you have to read the whole partition to
 find whichever time you are querying for you could potentially hit
 tombstones in other SSTables that contain the same "id". As mentioned
 previously, you need to add a time bucket to your partitioning key and
 definitely use DTCS/TWCS.


 As I mentioned previously, the UI only queries recent data, e.g., the
 past hour, past two hours, past day, past week. The UI does not query for
 anything older than the TTL which is 7 days. My understanding and
 expectation was that Cassandra would only scan live cells. The UI is a
 separate application that I do not maintain, so I am not 100% certain about
 the queries. I have been told that it does not query for anything older
 than 7 days.

 On Sun, Jan 29, 2017 at 4:14 AM, kurt greaves 
 wrote:


 Your partitioning key is text. If you have multiple entries per id you
 are likely hitting older cells that have expired. Descending only affects
 how the data is stored on disk, if you have to read the whole partition to
 find whichever time you are querying for you could potentially hit
 tombstones in other SSTables that contain the same "id". As mentioned

Re: Time series data model and tombstones

2017-02-08 Thread John Sanda
I wanted to provide a quick update. I was able to patch one of the
environments that is hitting the tombstone problem. It has been running
TWCS for five days now, and things are stable so far. I also had a patch to
the application code to implement date partitioning ready to go, but I
wanted to see how things went with only making the compaction changes.
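
For readers following along, a minimal sketch of the kind of compaction change described 
above, issued through the Java driver. It assumes a Cassandra version where TWCS is 
available (or the back-ported jar is installed), and the keyspace/table names and the 
1-day window are placeholders; pick a window that matches your TTL:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class SwitchToTwcsSketch {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect("metrics")) {   // placeholder keyspace
                // Placeholder table; 1-day windows as an example sizing.
                session.execute("ALTER TABLE raw_data WITH compaction = {"
                        + "'class': 'TimeWindowCompactionStrategy', "
                        + "'compaction_window_unit': 'DAYS', "
                        + "'compaction_window_size': '1'}");
            }
        }
    }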

On Sun, Jan 29, 2017 at 4:05 PM, DuyHai Doan  wrote:

> In theory, you're right and Cassandra should possibly skip reading cells
> having time < 50. But it's all theory, in practice Cassandra read chunks of
> xxx kilobytes worth of data (don't remember the exact value of xxx, maybe
> 64k or far less) so you may end up reading tombstones.
>
> On Sun, Jan 29, 2017 at 9:24 PM, John Sanda  wrote:
>
>> Thanks for the clarification. Let's say I have a partition in an SSTable
>> where the values of time range from 100 to 10 and everything < 50 is
>> expired. If I do a query with time < 100 and time >= 50, are there
>> scenarios in which Cassandra will have to read cells where time < 50? In
>> particular I am wondering if compression might have any effect.
>>
>> On Sun, Jan 29, 2017 at 3:01 PM DuyHai Doan  wrote:
>>
>>> "Should the data be sorted by my time column regardless of the
>>> compaction strategy" --> It does
>>>
>>> What I mean is that an old "chunk" of expired data in SSTABLE-12 may be
>>> compacted together with a new chunk of SSTABLE-2 containing fresh data so
>>> in the new resulting SSTable will contain tombstones AND fresh data inside
>>> the same partition, but of course sorted by clustering column "time".
>>>
>>> On Sun, Jan 29, 2017 at 8:55 PM, John Sanda 
>>> wrote:
>>>
>>> Since STCS does not sort data based on timestamp, your wide partition
>>> may span over multiple SSTables and inside each SSTable, old data (+
>>> tombstones) may sit on the same partition as newer data.
>>>
>>>
>>> Should the data be sorted by my time column regardless of the compaction
>>> strategy? I didn't think that the column timestamp came into play with
>>> respect to sorting. I have been able to review some SSTables with
>>> sstablemetadata and I can see that old/expired data is definitely living
>>> with live data.
>>>
>>>
>>> On Sun, Jan 29, 2017 at 2:38 PM, DuyHai Doan 
>>> wrote:
>>>
>>> Ok so give it a try with TWCS. Since STCS does not sort data based on
>>> timestamp, your wide partition may span over multiple SSTables and inside
>>> each SSTable, old data (+ tombstones) may sit on the same partition as
>>> newer data.
>>>
>>> When reading by slice, even if you request fresh data, Cassandra has
>>> to scan over a lot of tombstones to fetch the correct range of data, thus your
>>> issue.
>>>
>>> On Sun, Jan 29, 2017 at 8:19 PM, John Sanda 
>>> wrote:
>>>
>>> It was with STCS. It was on a 2.x version before TWCS was available.
>>>
>>> On Sun, Jan 29, 2017 at 10:58 AM DuyHai Doan 
>>> wrote:
>>>
>>> Did you get this overwhelming tombstone behavior with STCS or with TWCS?
>>>
>>> If you're using DTCS, beware of its weird behavior and tricky
>>> configuration.
>>>
>>> On Sun, Jan 29, 2017 at 3:52 PM, John Sanda 
>>> wrote:
>>>
>>> Your partitioning key is text. If you have multiple entries per id you
>>> are likely hitting older cells that have expired. Descending only affects
>>> how the data is stored on disk, if you have to read the whole partition to
>>> find whichever time you are querying for you could potentially hit
>>> tombstones in other SSTables that contain the same "id". As mentioned
>>> previously, you need to add a time bucket to your partitioning key and
>>> definitely use DTCS/TWCS.
>>>
>>>
>>> As I mentioned previously, the UI only queries recent data, e.g., the
>>> past hour, past two hours, past day, past week. The UI does not query for
>>> anything older than the TTL which is 7 days. My understanding and
>>> expectation was that Cassandra would only scan live cells. The UI is a
>>> separate application that I do not maintain, so I am not 100% certain about
>>> the queries. I have been told that it does not query for anything older
>>> than 7 days.
>>>
>>> On Sun, Jan 29, 2017 at 4:14 AM, kurt greaves 
>>> wrote:
>>>
>>>
>>> Your partitioning key is text. If you have multiple entries per id you
>>> are likely hitting older cells that have expired. Descending only affects
>>> how the data is stored on disk, if you have to read the whole partition to
>>> find whichever time you are querying for you could potentially hit
>>> tombstones in other SSTables that contain the same "id". As mentioned
>>> previously, you need to add a time bucket to your partitioning key and
>>> definitely use DTCS/TWCS.
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> - John
>>>
>>> --
>>>
>>> - John

Re: Cluster scaling

2017-02-08 Thread Jan Kesten

Hi Branislav,

what is it you would expect?

Some thoughts:

Batches are often misunderstood: they work well only if they contain 
only one partition key - think of a batch of different sensor data for 
one key. If you group batches with many partition keys and/or do large 
batches, this puts high load on the coordinator node, which then itself 
needs to talk to the nodes holding the partitions. This could explain 
the scaling you see in your second try without batches. Keep in mind 
that the driver supports executeAsync and ResultSetFutures.


Second, put the commitlog and data directories on separate disks when using 
spindles.


Third, have you monitored iostats and cpustats while running your tests?

Cheers,

Jan

Am 08.02.2017 um 16:39 schrieb Branislav Janosik -T (bjanosik - AAP3 INC 
at Cisco):


Hi all,

I have a cluster of three nodes and would like to ask some questions 
about the performance.


I wrote a small benchmarking tool in java that mirrors (read, write) 
operations that we do in the real project.


Problem is that it is not scaling like it should. The program runs two 
tests: one using batch statement and one without using the batch.


The operation sequence is: optional select, insert, update, insert. I 
run the tool on my server with 128 threads (# of threads has no 
influence on the performance),


creating usually 100K resources for testing purposes.

The average results (operations per second) with the use of batch 
statement are:


Replication Factor = 1   with reading   without reading
  1-node cluster         37K            46K
  2-node cluster         37K            47K
  3-node cluster         39K            70K

Replication Factor = 2   with reading   without reading
  2-node cluster         21K            40K
  3-node cluster         30K            48K

The average results (operations per second) without the use of batch 
statement are:

Replication Factor = 1   with reading   without reading
  1-node cluster         31K            20K
  2-node cluster         38K            39K
  3-node cluster         45K            87K

Replication Factor = 2   with reading   without reading
  2-node cluster         19K            22K
  3-node cluster         26K            36K

The Cassandra VM specs are: 16 CPUs, 16GB and two 32GB of RAM, at 
least 30GB of disk space for each node. Non-SSD; each VM is on a 
separate physical server.


The code is available here 
https://github.com/bjanosik/CassandraBenchTool.git . It can be built 
with Maven and then you can use jar in target directory with java -jar 
target/cassandra-test-1.0-SNAPSHOT-jar-with-dependencies.jar .


Thank you for any help.



--
Jan Kesten, mailto:j.kes...@enercast.de
Tel.: +49 561/4739664-0 FAX: -9 Mobil: +49 160 / 90 98 41 68
enercast GmbH Universitätsplatz 12 D-34127 Kassel HRB15471
http://www.enercast.de Online-Prognosen für erneuerbare Energien
Geschäftsführung: Thomas Landgraf (CEO), Bernd Kratz (CTO), Philipp Rinder (CSO)


This e-mail and any attachment may contain confidential and/or privileged 
information. If you are not the named addressee or if this transmission has 
been addressed to you in error, please notify us immediately by reply e-mail 
and then delete this e-mail and any attachment from your system. Please 
understand that you must not copy this e-mail or any attachment or disclose the 
contents to any other person. Thank you for your cooperation.



Cluster scaling

2017-02-08 Thread Branislav Janosik -T (bjanosik - AAP3 INC at Cisco)
Hi all,



I have a cluster of three nodes and would like to ask some questions about the 
performance.

I wrote a small benchmarking tool in java that mirrors (read, write) operations 
that we do in the real project.

Problem is that it is not scaling like it should. The program runs two tests: 
one using batch statement and one without using the batch.

The operation sequence is: optional select, insert, update, insert. I run the 
tool on my server with 128 threads (# of threads has no influence on the 
performance),

creating usually 100K resources for testing purposes.



The average results (operations per second) with the use of batch statement are:



Replication Factor = 1   with reading   without reading
  1-node cluster         37K            46K
  2-node cluster         37K            47K
  3-node cluster         39K            70K

Replication Factor = 2   with reading   without reading
  2-node cluster         21K            40K
  3-node cluster         30K            48K


The average results (operations per second) without the use of batch statement 
are:

Replication Factor = 1   with reading   without reading
  1-node cluster         31K            20K
  2-node cluster         38K            39K
  3-node cluster         45K            87K

Replication Factor = 2   with reading   without reading
  2-node cluster         19K            22K
  3-node cluster         26K            36K



The Cassandra VM specs are: 16 CPUs, 16GB and two 32GB of RAM, at least 30GB 
of disk space for each node. Non-SSD; each VM is on a separate physical server.



The code is available here https://github.com/bjanosik/CassandraBenchTool.git . 
It can be built with Maven and then you can use jar in target directory with 
java -jar target/cassandra-test-1.0-SNAPSHOT-jar-with-dependencies.jar .

Thank you for any help.



Re: Questions about TWCS

2017-02-08 Thread Alain RODRIGUEZ
Hi John,

I will try to answer you on those questions, relying on people around to
correct me if I am wrong.

It says to align partition keys to your TWCS windows. Is it generally the
> case that calendar/date based partitions would align nicely with TWCS
> windows such that we would end up with one SSTable per partition after the
> major compaction runs?


So this is probably meant to avoid having data spread over many buckets
(possibly all of them), as that makes tombstone eviction harder, and depending
on your queries the read might take way longer as it could hit many SSTables,
for example if you use a LIMIT clause without filtering on the clustering
key (Alex, who wrote the post you mentioned, is currently writing about this
kind of read using limits). Well, keep in mind that in many cases it is way
better to have changing partitions over time.

About how nicely the partitions would align to TWCS buckets, I guess it is
just about adding a time period as a part of the partition key:

Using an hour window? What about adding yyyyMMddHH as part of the partition
key, so the key would look like ((item1, 2017020814), date), '(item1,
2017020814)' being the partition key and 'date' a clustering key. This is
just a stupid example to give you an idea of how to control partition size
and the time range each partition covers. On the flip side, to select a full day, you
would then need to query 24 partitions (1 per hour).
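
As an illustration of that bucketing idea, a minimal Java driver sketch that derives an
hourly bucket such as 2017020814 and uses it in the partition key; the keyspace, table
and columns here are placeholders:

    import java.time.Instant;
    import java.time.ZoneOffset;
    import java.time.format.DateTimeFormatter;
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.Session;

    public class HourBucketSketch {
        // Hourly bucket such as 2017020814 (yyyyMMddHH, in UTC).
        private static final DateTimeFormatter BUCKET =
                DateTimeFormatter.ofPattern("yyyyMMddHH").withZone(ZoneOffset.UTC);

        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect("ts")) {   // placeholder keyspace
                // Placeholder table with PRIMARY KEY ((item, bucket), date).
                PreparedStatement insert = session.prepare(
                        "INSERT INTO events (item, bucket, date, value) VALUES (?, ?, ?, ?)");
                Instant now = Instant.now();
                session.execute(insert.bind(
                        "item1", BUCKET.format(now), java.util.Date.from(now), 42.0));
            }
        }
    }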

http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html says to aim for <
> 50 buckets per table based on TTL.


This, like many numbers shared around about Cassandra, is probably meant to give
people a rough idea of what this number should be, the order of magnitude.
You can probably be fine with 20 or 100 SSTables. Sharing experiences in
Cassandra is not easy, as how efficient each setting will be depends on the
hardware, workload and many other things, and so will be different from one
cluster to the next. Plus, if I remember correctly, Jeff said to use ~ 30
buckets.

Are there any recommendations on a range to stay within for number of
> buckets?


Well, this is actually quite straightforward. As a premise we have a known
and fixed TTL and we consider the final state of each bucket, meaning we
consider 1 bucket = 1 sstable. Then to choose the appropriate window
(bucket) size, it is just about dividing the TTL by the desired number of
sstables (30). For a 90-day TTL, use 90 / 30 = 3 days.

What are some of the tradeoffs of smaller vs larger number of buckets? For
> example I know that a smaller number of buckets means more SSTable to
> compact during the major compaction that runs when we get past a given
> window.


Alex wrote about this, just before recommending the 50 SSTable max. But to
answer again quickly, the bigger the buckets are, the heavier the
compactions will be. On the other hand, more SSTables means that
Cassandra will have to read many SSTables to find a piece of information in some
cases, even if the relevant data is held in only one of them. Each SSTable read is
a disk read, known to be way slower than most other things in computing
as of now; even if this improves with SSDs, it is still a major thing to
keep in mind.

Are tombstone compactions disabled by default?


No, they are not. The default options, as far as I remember, are to trigger
compaction on SSTables having a droppable tombstone ratio over *0.20*
(tombstone_threshold), *if* no tombstone compaction ran in the last *1 day*
(tombstone_compaction_interval) and if Cassandra assumes there is not too much
overlapping with other SSTables. To make tombstone compaction more aggressive
(removing the overlap check I just mentioned), set "unchecked_tombstone_compaction"
to true.
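
To make that concrete, a minimal sketch of adjusting those sub-properties through the
Java driver. The keyspace, table name, compaction class and the values shown are
placeholders; the option names are the compaction sub-properties mentioned above:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class TombstoneCompactionSketch {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect("metrics")) {   // placeholder keyspace
                // Placeholder table; keeps STCS and tunes the tombstone sub-properties.
                session.execute("ALTER TABLE raw_data WITH compaction = {"
                        + "'class': 'SizeTieredCompactionStrategy', "
                        + "'tombstone_threshold': '0.2', "
                        + "'tombstone_compaction_interval': '86400', "
                        + "'unchecked_tombstone_compaction': 'true'}");
            }
        }
    }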

More information:
http://docs.datastax.com/en/cql/3.1/cql/cql_reference/compactSubprop.html
http://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html

Can you ever wind up in a situation where the major compaction that is
> supposed to run at the end of a window does not run? Not sure if this is
> realistic but consider this scenario


This is possible if compactions are not keeping up, for example because the
allocated resources are not big enough. Also, when adding an out-of-order
SSTable, the old bucket falls behind on compactions, as a compaction is
then needed.

Suppose compaction falls behind such that there are 5 windows for which the
> major compactions have not run. Will TWCS run the major compactions for
> those window serially oldest to newest?


I believe it is in reverse order: newest first, then going to older
buckets. That's what would make sense to me; I honestly did not check the code.
Maybe Jeff, Alex or someone else will be able to confirm that.

If am I using a window size of one day, it it current 02:00 AM Tuesday, and
> I receive a write for 11:45 PM Monday, should I consider that out of order?


Yes, that's what being out of order means in our case I guess. The thing to
keep in mind is that this data will be flushed with other data from 2 am,
meaning the max 

FINAL REMINDER: CFP for ApacheCon closes February 11th

2017-02-08 Thread Rich Bowen
Dear Apache Enthusiast,

This is your FINAL reminder that the Call for Papers (CFP) for ApacheCon
Miami is closing this weekend - February 11th. This is your final
opportunity to submit a talk for consideration at this event.

This year, we are running several mini conferences in conjunction with
the main event, so if you're submitting for one of those events, please
pay attention to the instructions below.

Apache: Big Data
* Event information:
http://events.linuxfoundation.org/events/apache-big-data-north-america
* CFP:
http://events.linuxfoundation.org/events/apache-big-data-north-america/program/cfp

Apache: IoT (Internet of Things)
* Event Information: http://us.apacheiot.org/
* CFP -
http://events.linuxfoundation.org/events/apachecon-north-america/program/cfp
(Indicate 'IoT' in the Target Audience field)

CloudStack Collaboration Conference
* Event information: http://us.cloudstackcollab.org/
* CFP -
http://events.linuxfoundation.org/events/apachecon-north-america/program/cfp
(Indicate 'CloudStack' in the Target Audience field)

FlexJS Summit
* Event information - http://us.apacheflexjs.org/
* CFP -
http://events.linuxfoundation.org/events/apachecon-north-america/program/cfp
(Indicate 'Flex' in the Target Audience field)

TomcatCon
* Event information - https://tomcat.apache.org/conference.html
* CFP -
http://events.linuxfoundation.org/events/apachecon-north-america/program/cfp
(Indicate 'Tomcat' in the Target Audience field)

All other topics and projects
* Event information -
http://events.linuxfoundation.org/events/apachecon-north-america/program/about
* CFP -
http://events.linuxfoundation.org/events/apachecon-north-america/program/cfp

Admission to any of these events also grants you access to all of the
others.

Thanks, and we look forward to seeing you in Miami!

-- 
Rich Bowen
VP Conferences, Apache Software Foundation
rbo...@apache.org
Twitter: @apachecon



(You are receiving this email because you are subscribed to a dev@ or
users@ list of some Apache Software Foundation project. If you do not
wish to receive email from these lists any more, you must follow that
list's unsubscription procedure. View the headers of this message for
unsubscription instructions.)


Extract big data to file

2017-02-08 Thread Cogumelos Maravilha
Hi list,

My database stores data from Kafka. Using C* 3.0.10

In my cluster I'm using:
AND compression = {'sstable_compression':
'org.apache.cassandra.io.compress.LZ4Compressor'}

The result of extracting one day of data uncompressed is around 360G.

I've found these approaches:

echo "SELECT kafka from red where datetimestamp >= '2017-02-02 00:00:00'
and datetimestamp < '2017-02-02 15:00:01';" | cqlsh 100.100.221.146 9042
> result.txt
Here by default I get 100 rows.

Using CAPTURE result.csv with paging off I always get an out-of-memory
error. With paging on I need to put something heavy on top of the
Enter key. Crazy thing, needing to enable paging to get rid of
out-of-memory errors! I've taken a look at the result file and it is empty;
perhaps it builds the result in memory and only writes it to disk at the end.

Is there another approach like this on ACID databases:
copy (select kafka from red where datetimestamp >= '2017-02-02 00:00:00'
and datetimestamp < '2017-02-02 15:00:01') to 'result.csv' WITH CSV HEADER;

Thanks in advance.



Re: Authentication with Java driver

2017-02-08 Thread Yuji Ito
Thanks Ben,

Do you mean lots of instances of the process or lots of instances of the
> cluster/session object?


Lots of instances of the process are generated.
I wanted to confirm that `other` doesn't authenticate.

If I want to avoid that, my application has to create new cluster/session
objects per instance.
But that is inefficient and uncommon.
So, we aren't sure that the application will work when a lot of cluster/session
objects are created.
Is that correct?

Thank you,
Yuji
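
For reference, a minimal sketch of the single long-lived cluster/session pattern
discussed below; the contact point, keyspace and credentials are placeholders:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class SharedSessionSketch {
        // One cluster/session pair shared by the whole application process.
        private static final Cluster CLUSTER = Cluster.builder()
                .addContactPoint("127.0.0.1")
                .withCredentials("user", "password")   // authenticated once per connection
                .build();
        private static final Session SESSION = CLUSTER.connect("database");

        public static Session session() {
            return SESSION;
        }

        public static void main(String[] args) {
            // Components reuse the same session; a Cluster built elsewhere with a
            // wrong password would fail to authenticate when it connects.
            System.out.println(session()
                    .execute("SELECT release_version FROM system.local")
                    .one().getString("release_version"));
            CLUSTER.close();
        }
    }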



On Wed, Feb 8, 2017 at 12:01 PM, Ben Bromhead  wrote:

> On Tue, 7 Feb 2017 at 17:52 Yuji Ito  wrote:
>
> Thanks Andrew, Ben,
>
> My application creates a lot of instances connecting to Cassandra with
> basically the same set of credentials.
>
> Do you mean lots of instances of the process or lots of instances of the
> cluster/session object?
>
>
> After an instance connects to Cassandra with the credentials, can any
> instance connect to Cassandra without credentials?
>
> As long as you don't share the session or cluster objects. Each new
> cluster/session will need to reauthenticate.
>
>
> == example ==
> A first = new A("database", "user", "password");  // proper credentials
> r = first.get();
> ...
> A other = new A("database", "user", "pass"); // wrong password
> r = other.get();
> == example ==
>
> I want to refuse the `other` instance with improper credentials.
>
>
> This looks like you are creating new cluster/session objects (filling in
> the blanks for your pseudocode here). So "other" will not authenticate to
> Cassandra.
>
> This brings up a wider point: why are you doing this? Generally most
> applications will create a single long-lived session object that lasts
> the life of the application process.
>
> I would not rely on Cassandra auth to authenticate downstream actors, not
> because it's bad, just that it's generally inefficient to create lots of session
> objects. The session object maintains a connection pool, pipelines
> requests, is thread safe and generally pretty solid.
>
>
>
>
> Yuji
>
>
> On Wed, Feb 8, 2017 at 4:11 AM, Ben Bromhead  wrote:
>
> What are you specifically trying to achieve? Are you trying to
> authenticate multiple Cassandra users from a single application instance?
> Or will you have lots of application instances connecting to Cassandra
> using the same set of credentials? Or a combination of both? Multiple
> application instances with different credentials?
>
> On Tue, 7 Feb 2017 at 06:19 Andrew Tolbert 
> wrote:
>
> Hello,
>
> The API seems kind of incorrect because credentials should usually
> be set per session, but they are actually set per cluster.
>
>
> With the datastax driver, Session is what manages connection pools to
> each node.  Cluster manages configuration and a separate connection
> ('control connection') to subscribe to state changes (schema changes, node
> topology changes, node up/down events).
>
>
> So, if there are 1000 clients, then with this API it has to create
> 1000 cluster instances ?
>
>
> I'm unsure how common it is for per-user authentication to be done when
> connecting to the database.  I think an application would normally
> authenticate with one set of credentials instead of multiple.  The protocol
> Cassandra uses does authentication at the connection level instead of at
> the request level, so that is currently a limitation to support something
> like reusing Sessions for authenticating multiple users.
>
> Thanks,
> Andy
>
>
> On Tue, Feb 7, 2017 at 7:19 AM Hiroyuki Yamada  wrote:
>
> Hi,
>
> The API seems kind of incorrect because credentials should usually
> be set per session, but they are actually set per cluster.
>
> So, if there are 1000 clients, then with this API it has to create
> 1000 cluster instances ?
> 1000 clients seems usual if there are many nodes (say 20) and each
> node has some concurrency (say 50),
> but 1000 cluster instances seems too many.
>
> Is this an expected way to do this ? or
> Is there any way to authenticate per session ?
>
> Thanks,
> Hiro
>
> On Tue, Feb 7, 2017 at 11:38 AM, Yuji Ito  wrote:
> > Hi all,
> >
> > I want to know how to authenticate Cassandra users for multiple instances
> > with Java driver.
> > For instance, each thread creates a instance to access Cassandra with
> > authentication.
> >
> > As the implementation example, only the first constructor builds a
> cluster
> > and a session.
> > Other constructors use them.
> > This example is implemented according to the datastax document:
> "Basically
> > you will want to share the same cluster and session instances across your
> > application".
> > http://www.datastax.com/dev/blog/4-simple-rules-when-
> using-the-datastax-drivers-for-cassandra
> >
> > However, other constructors don't authenticate the user and the password.
> > That's because they don't need to build a cluster and a session.
> >
> > So, should I