New node is stuck in JOINING state

2023-04-05 Thread Eunsu Kim
Hi, all

I recently encountered this behavior when adding new nodes to my Apache 
Cassandra 4.1.0 cluster. 
When I checked the system.log of the newly added node, I found the following 
messages logged repeatedly.

--
WARN  [OptionalTasks:1] 2023-04-05 18:50:26,722 CassandraRoleManager.java:359 - 
CassandraRoleManager skipped default role setup: some nodes were not ready
INFO  [OptionalTasks:1] 2023-04-05 18:50:26,722 CassandraRoleManager.java:395 - 
Setup task failed with error, rescheduling
WARN  [OptionalTasks:1] 2023-04-05 18:50:36,731 CassandraRoleManager.java:359 - 
CassandraRoleManager skipped default role setup: some nodes were not ready
INFO  [OptionalTasks:1] 2023-04-05 18:50:36,731 CassandraRoleManager.java:395 - 
Setup task failed with error, rescheduling
INFO  [OptionalTasks:1] 2023-04-05 18:50:46,736 NoSpamLogger.java:105 - "Cannot 
read from a bootstrapping node" while executing SELECT * FROM system_auth.roles 
WHERE role = 'cassandra' ALLOW FILTERING
WARN  [OptionalTasks:1] 2023-04-05 18:50:46,736 CassandraRoleManager.java:359 - 
CassandraRoleManager skipped default role setup: some nodes were not ready
INFO  [OptionalTasks:1] 2023-04-05 18:50:46,736 CassandraRoleManager.java:395 - 
Setup task failed with error, rescheduling
WARN  [OptionalTasks:1] 2023-04-05 18:50:56,745 CassandraRoleManager.java:359 - 
CassandraRoleManager skipped default role setup: some nodes were not ready
INFO  [OptionalTasks:1] 2023-04-05 18:50:56,745 CassandraRoleManager.java:395 - 
Setup task failed with error, rescheduling
WARN  [OptionalTasks:1] 2023-04-05 18:51:06,749 CassandraRoleManager.java:359 - 
CassandraRoleManager skipped default role setup: some nodes were not ready
INFO  [OptionalTasks:1] 2023-04-05 18:51:06,749 CassandraRoleManager.java:395 - 
Setup task failed with error, rescheduling
WARN  [OptionalTasks:1] 2023-04-05 18:51:16,750 CassandraRoleManager.java:359 - 
CassandraRoleManager skipped default role setup: some nodes were not ready
INFO  [OptionalTasks:1] 2023-04-05 18:51:16,750 CassandraRoleManager.java:395 - 
Setup task failed with error, rescheduling
WARN  [OptionalTasks:1] 2023-04-05 18:51:26,754 CassandraRoleManager.java:359 - 
CassandraRoleManager skipped default role setup: some nodes were not ready
INFO  [OptionalTasks:1] 2023-04-05 18:51:26,754 CassandraRoleManager.java:395 - 
Setup task failed with error, rescheduling
WARN  [OptionalTasks:1] 2023-04-05 18:51:36,763 CassandraRoleManager.java:359 - 
CassandraRoleManager skipped default role setup: some nodes were not ready
INFO  [OptionalTasks:1] 2023-04-05 18:51:36,763 CassandraRoleManager.java:395 - 
Setup task failed with error, rescheduling
INFO  [OptionalTasks:1] 2023-04-05 18:51:46,768 NoSpamLogger.java:105 - "Cannot 
read from a bootstrapping node" while executing SELECT * FROM system_auth.roles 
WHERE role = 'cassandra' ALLOW FILTERING
WARN  [OptionalTasks:1] 2023-04-05 18:51:46,768 CassandraRoleManager.java:359 - 
CassandraRoleManager skipped default role setup: some nodes were not ready
INFO  [OptionalTasks:1] 2023-04-05 18:51:46,768 CassandraRoleManager.java:395 - 
Setup task failed with error, rescheduling
WARN  [OptionalTasks:1] 2023-04-05 18:51:56,777 CassandraRoleManager.java:359 - 
CassandraRoleManager skipped default role setup: some nodes were not ready
--


When I restarted the process on this node, it went through the JOINING process 
once again and ended up in the UN state.

I would appreciate your advice.

Best regards.

Change the compression algorithm on a production table at runtime

2022-09-20 Thread Eunsu Kim
Hi all

According to 
https://docs.datastax.com/en/cql-oss/3.3/cql/cql_reference/cqlAlterTable.html, 
it can be very problematic to modify the compaction strategy of a table 
running in production.

Similarly, is it risky to change the compression algorithm of an existing 
table in production?

Currently the table uses DeflateCompressor, but I want to change it to 
LZ4Compressor for performance. The Cassandra version is 3.11.10.
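The change I have in mind would look like the following (my_keyspace.my_table 
is a placeholder; as I understand it, only SSTables written after the ALTER 
use the new codec, and existing ones keep Deflate until they are rewritten by 
compaction or nodetool upgradesstables -a):

```cql
-- my_keyspace.my_table is a placeholder; chunk_length_in_kb is shown at its
-- 3.11 default. Newly flushed/compacted SSTables use LZ4 from this point on.
ALTER TABLE my_keyspace.my_table
  WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': 64};
```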


Thank you in advance.

Re: Using zstd compression on Cassandra 3.x

2022-09-12 Thread Eunsu Kim
Thank you for your response.

I'll consider upgrading to 4.x.


> On 13 Sep 2022, at 2:41 PM, Dinesh Joshi wrote:
> 
> Is there something preventing you from upgrading to 4.0? It is backward 
> compatible with 3.0 so clients don’t need to change.
> 
> If you absolutely don't want to upgrade, you can extract the implementation 
> from 4.0 and use it. I would advise against this path though, as the zstd 
> implementation is nuanced.
> 
> Dinesh
> 
>> On Sep 12, 2022, at 7:09 PM, Eunsu Kim  wrote:
>> 
>> Hi all,
>> 
>> Since zstd compression is a very good compression algorithm, it is available 
>> in Cassandra 4.0. Because the overall performance and ratio are excellent
>> 
>> There is open source available for Cassandra 3.x.
>> https://github.com/MatejTymes/cassandra-zstd
>> 
>> Do you have any experience applying this to production?
>> 
>> I want to improve performance and disk usage by applying it to a running 
>> Cassandra cluster.
>> 
>> Thanks.



Using zstd compression on Cassandra 3.x

2022-09-12 Thread Eunsu Kim
Hi all,

zstd is a very good compression algorithm; its overall performance and 
compression ratio are excellent, and it is built into Cassandra 4.0.

There is open source available for Cassandra 3.x.
https://github.com/MatejTymes/cassandra-zstd

Do you have any experience applying this to production?

I want to improve performance and disk usage by applying it to a running 
Cassandra cluster.
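For comparison, in Cassandra 4.0 the built-in codec can be enabled per table 
like this (the table name is a placeholder, and compression_level is optional):

```cql
-- Cassandra 4.0's built-in zstd codec; compression_level defaults to 3
ALTER TABLE my_keyspace.my_table
  WITH compression = {'class': 'ZstdCompressor', 'compression_level': 3};
```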

Thanks.

Re: about memory problem in write heavy system..

2022-01-10 Thread Eunsu Kim
Thank you Bowen.

As can be seen from the chart, the memory usage of the existing nodes has been 
increasing since the new nodes were added. I then stopped writes to one 
specific table; write throughput decreased by about 15% and memory usage began 
to decrease.
I'm not sure whether this resolved naturally or because of the reduced writes.
What is certain is that adding the new nodes increased the native memory usage 
of some existing nodes.

After reading DataStax's 3.x to 4.x migration guide, it seems that more than 
50% free disk space is required for the upgrade. This is likely to be a major 
obstacle when upgrading a cluster in operation.
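A quick way to check that free-space precondition on each node is a sketch 
like this (the mount list is a placeholder; each data disk should be checked):

```python
import shutil

def free_fraction(path: str) -> float:
    """Fraction of the filesystem containing `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total

# Placeholder mount list; in practice, each of the node's data disks
# (and the commitlog volume) should be checked before upgrading.
for mount in ["/"]:
    frac = free_fraction(mount)
    print(f"{mount}: {frac:.0%} free:", "OK" if frac > 0.5 else "needs space")
```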


Many thanks.

> On 10 Jan 2022, at 8:53 PM, Bowen Song wrote:
> 
> Anything special about the table you stopped writing to? I'm wondering how 
> did you locate the table was the cause of the memory usage increase.
> 
> > For the latest version (3.11.11) upgrade, can the two versions coexist in 
> > the cluster for a while?
> > 
> > Can the 4.x version coexist as well?
> 
> Yes and yes. It is expected that two different versions of Cassandra will be 
> running in the same cluster at the same time while upgrading. This process is 
> often called zero downtime upgrade or rolling upgrade. You can perform such 
> upgrade from 3.11.4 to 3.11.11 or directly to 4.0.1, both are supported. 
> Surprisingly, I can't find any documentation related to this on the 
> cassandra.apache.org website (if you found it, please send me a link). Some 
> other sites have brief guides on this process, such as DataStax 
> <https://www.datastax.com/learn/whats-new-for-cassandra-4/migrating-cassandra-4x#how-the-migration-works>
>  and Instaclustr 
> <https://www.instaclustr.com/support/documentation/cassandra/cassandra-cluster-operations/cassandra-version-upgrades/>,
>  and you should always read the release notes 
> <https://github.com/apache/cassandra/blob/trunk/NEWS.txt> which includes 
> breaking changes and new features before you perform an upgrade.
> 
> 
> 
> On 10/01/2022 00:18, Eunsu Kim wrote:
>> Thank you for your response
>> 
>> Fortunately, memory usage came back down over the weekend. I removed the 
>> writing of a specific table last Friday.
>> 
>> <pasted graphic-2.png>
>> 
>> 
>> For the latest version (3.11.11) upgrade, can the two versions coexist in 
>> the cluster for a while?
>> 
>> Can the 4.x version coexist as well?
>> 
>>> On 8 Jan 2022, at 1:26 AM, Jeff Jirsa <jji...@gmail.com> wrote:
>>> 
>>> 3.11.4 is a very old release, with lots of known bugs. It's possible the 
>>> memory is related to that.
>>> 
>>> If you bounce one of the old nodes, where does the memory end up? 
>>> 
>>> 
>>> On Thu, Jan 6, 2022 at 3:44 PM Eunsu Kim <eunsu.bil...@gmail.com> wrote:
>>> 
>>> Looking at the memory usage chart, it seems that the physical memory usage 
>>> of the existing node has increased since the new node was added with 
>>> auto_bootstrap=false.
>>> 
>>> <pasted graphic-1.png>
>>> 
>>> 
>>>> 
>>>> On Fri, Jan 7, 2022 at 1:11 AM Eunsu Kim <eunsu.bil...@gmail.com> wrote:
>>>> Hi,
>>>> 
>>>> I have a Cassandra cluster(3.11.4) that does heavy writing work. (14k~16k 
>>>> write throughput per second per node)
>>>> 
>>>> Nodes are physical machine in data center. Number of nodes are 30. Each 
>>>> node has three data disks mounted.
>>>> 
>>>> 
>>>> A few days ago, a QueryTimeout problem occurred due to Full GC.
>>>> So, referring to this 
>>>> blog(https://thelastpickle.com/blog/2018/04/11/gc-tuning.html 
>>>> <https://thelastpickle.com/blog/2018/04/11/gc-tuning.html>), it seemed to 
>>>> have been solved by changing the memtable_allocation_type to 
>>>> offheap_objects.
>>>> 
>>>> But today, I got an alarm saying that some nodes are using more than 90% 
>>>> of physical memory. (115GiB /125GiB)
>>>> 
>>>> Native memory usage of some nodes is gradually increasing.
>>>> 
>>>> 
>>>> 
>>>> All tables use TWCS, and TTL is 2 weeks.
>>>> 
>>>> Below is the applied jvm option.
>>>> 
>>>> -Xms31g
>>>> -Xmx31g
>>>> -XX:+UseG1GC
>>>> -XX:G1RSetUpdatingPauseTimePercent=5
>>>> -XX:MaxGCPauseMillis=500
>>>> -XX:InitiatingHeapOccupancyPercent=70
>>>> -XX:ParallelGCThreads=24
>>>> -XX:ConcGCThreads=24
>>>> …
>>>> 
>>>> 
>>>> What additional things can I try?
>>>> 
>>>> I am looking forward to the advice of experts.
>>>> 
>>>> Regards.
>>> 
>> 



Re: about memory problem in write heavy system..

2022-01-06 Thread Eunsu Kim

Looking at the memory usage chart, it seems that the physical memory usage of 
the existing node has increased since the new node was added with 
auto_bootstrap=false.




> 
> On Fri, Jan 7, 2022 at 1:11 AM Eunsu Kim <eunsu.bil...@gmail.com> wrote:
> Hi,
> 
> I have a Cassandra cluster(3.11.4) that does heavy writing work. (14k~16k 
> write throughput per second per node)
> 
> Nodes are physical machine in data center. Number of nodes are 30. Each node 
> has three data disks mounted.
> 
> 
> A few days ago, a QueryTimeout problem occurred due to Full GC.
> So, referring to this 
> blog(https://thelastpickle.com/blog/2018/04/11/gc-tuning.html 
> <https://thelastpickle.com/blog/2018/04/11/gc-tuning.html>), it seemed to 
> have been solved by changing the memtable_allocation_type to offheap_objects.
> 
> But today, I got an alarm saying that some nodes are using more than 90% of 
> physical memory. (115GiB /125GiB)
> 
> Native memory usage of some nodes is gradually increasing.
> 
> 
> 
> All tables use TWCS, and TTL is 2 weeks.
> 
> Below is the applied jvm option.
> 
> -Xms31g
> -Xmx31g
> -XX:+UseG1GC
> -XX:G1RSetUpdatingPauseTimePercent=5
> -XX:MaxGCPauseMillis=500
> -XX:InitiatingHeapOccupancyPercent=70
> -XX:ParallelGCThreads=24
> -XX:ConcGCThreads=24
> …
> 
> 
> What additional things can I try?
> 
> I am looking forward to the advice of experts.
> 
> Regards.



about memory problem in write heavy system..

2022-01-06 Thread Eunsu Kim
Hi,

I have a Cassandra cluster (3.11.4) with a write-heavy workload: 14k-16k 
writes per second per node.

The 30 nodes are physical machines in a data center, each with three data 
disks mounted.


A few days ago, a QueryTimeout problem occurred due to Full GC.
So, referring to this blog 
(https://thelastpickle.com/blog/2018/04/11/gc-tuning.html), the problem seemed 
to be solved by changing memtable_allocation_type to offheap_objects.

But today, I got an alarm saying that some nodes are using more than 90% of 
physical memory (115 GiB / 125 GiB).

Native memory usage of some nodes is gradually increasing.



All tables use TWCS, and TTL is 2 weeks.

Below are the applied JVM options.

-Xms31g
-Xmx31g
-XX:+UseG1GC
-XX:G1RSetUpdatingPauseTimePercent=5
-XX:MaxGCPauseMillis=500
-XX:InitiatingHeapOccupancyPercent=70
-XX:ParallelGCThreads=24
-XX:ConcGCThreads=24
…
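For reference, my understanding is that the 31g heap stays just below the 
~32 GiB point at which HotSpot typically disables compressed ordinary object 
pointers (the exact threshold varies by JVM); a quick check of the arithmetic:

```python
# Heaps of ~32 GiB or more typically lose compressed oops, inflating every
# object reference from 4 to 8 bytes, so 31g is the usual practical ceiling.
GIB = 1024 ** 3
xmx = 31 * GIB                     # from -Xmx31g above
compressed_oops_limit = 32 * GIB   # approximate HotSpot threshold

assert xmx < compressed_oops_limit
print(f"-Xmx{xmx // GIB}g keeps compressed oops enabled")
```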


What additional things can I try?

I am looking forward to the advice of experts.

Regards.

remove dead node without streaming

2021-03-25 Thread Eunsu Kim
Hi all,

Is it possible to remove a dead node from the cluster without streaming?

My Cassandra cluster is quite large, and nodetool removenode takes too long 
because of streaming.

It's okay if my data is temporarily inconsistent.

Thanks in advance.

Re: various TTL datas in one table (TWCS)

2020-10-28 Thread Eunsu Kim
Thank you for your response.

What sub-properties do you mean, specifically?

Currently, I have the following settings for aggressive purging:

AND COMPACTION = { 'class' : 'TimeWindowCompactionStrategy', 
'compaction_window_unit' : 'HOURS', 'compaction_window_size' : 12, 
'unchecked_tombstone_compaction': true, 'tombstone_threshold' : 0.05, 
'tombstone_compaction_interval' : 21600 }
AND gc_grace_seconds = 600

Apache Cassandra Version 3.11.4


> On 29 Oct 2020, at 12:26, Jeff Jirsa wrote:
> 
> Works but requires you to enable tombstone compaction subproperties  if you 
> need to purge the 2w ttl data before the highest ttl time you chose
> 
>> On Oct 28, 2020, at 5:58 PM, Eunsu Kim  wrote:
>> 
>> Hello,
>> 
>> I have a table with a default TTL(2w). I'm using TWCS(window size : 12h) on 
>> the recommendation of experts. This table is quite big, high WPS.
>> 
>> I would like to insert data different TTL from the default in this table 
>> according to the type of data.
>> About four different TTLs (4w, 6w, 8w, 10w)
>> 
>> ex.)
>> INSERT INTO my_table (…..) VALUES (….) USING TTL 4w
>> 
>> 
>> Could this cause performance problems or unexpected problems in the 
>> compaction?
>> 
>> Please give me advice,
>> 
>> Thank you.
>> -
>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: user-h...@cassandra.apache.org
>> 
> 
> 



various TTL datas in one table (TWCS)

2020-10-28 Thread Eunsu Kim
Hello,

I have a table with a default TTL (2w). I'm using TWCS (window size: 12h) on 
the recommendation of experts. This table is quite big, with a high write rate.

I would like to insert data into this table with TTLs different from the 
default, according to the type of data:
about four different TTLs (4w, 6w, 8w, 10w).

ex.)
INSERT INTO my_table (…..) VALUES (….) USING TTL 2419200  -- 4 weeks, in seconds
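Note that USING TTL takes an integer number of seconds rather than a literal 
like 4w, so the statements would be generated along these lines (the table, 
columns, and type names here are placeholders):

```python
# CQL's USING TTL clause takes an integer number of seconds
WEEK_SECONDS = 7 * 24 * 3600

# Hypothetical data-type names mapped to the four TTLs from the question
ttl_by_type = {name: weeks * WEEK_SECONDS
               for name, weeks in [("a", 4), ("b", 6), ("c", 8), ("d", 10)]}

def insert_statement(data_type: str) -> str:
    # my_table and its columns are placeholders for illustration
    return (f"INSERT INTO my_table (id, type, value) VALUES (?, ?, ?) "
            f"USING TTL {ttl_by_type[data_type]}")

print(insert_statement("a"))  # ends with USING TTL 2419200 (4 weeks)
```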


Could this cause performance problems or unexpected problems in the compaction?

Please give me advice,

Thank you.



new node stops streaming..

2020-01-27 Thread Eunsu Kim
Hi experts

I had a problem adding a new node.

A joining node in datacenterA stops streaming while joining, so it stays in 
the UJ state. (datacenterB is fine.)

I ran 'nodetool netstats' on the stuck node and it looks like this:

Mode: JOINING
Not sending any streams.
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0

When I run 'nodetool rebuild' it changes to the following, but no streaming 
occurs.

Mode: JOINING
Rebuild 1df64590-4166-11ea-86a0-4b3cc5e92e4a
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0

I think this is related to the number of open file descriptors.



Incoming Streaming Bytes went to zero after the number of open file descriptors 
reached the host's maximum (65536).
Since then, the number of open file descriptors has decreased, but streaming 
has not resumed.
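For anyone wanting to watch this, a sketch of how the descriptor usage can be 
checked (Linux /proc only; for a Cassandra node you would pass the 
CassandraDaemon PID; the current process is used here just to keep the sketch 
self-contained):

```python
import os

def fd_usage(pid: int) -> tuple[int, int]:
    """Return (open descriptors, soft 'Max open files' limit) for a Linux PID."""
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as limits_file:
        for line in limits_file:
            if line.startswith("Max open files"):
                # columns: name, soft limit, hard limit, units
                return open_fds, int(line.split()[3])
    raise RuntimeError("Max open files not found in limits")

# For a Cassandra node, pass the CassandraDaemon PID instead of os.getpid();
# our hosts cap the limit at 65536.
fds, soft_limit = fd_usage(os.getpid())
print(f"{fds} of {soft_limit} file descriptors in use")
```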

And when I killed the joining process, the node was automatically removed from 
the cluster.

What should I do to add nodes to this data center in this case?

Please advice.

Thank you.

Curiosity in adding nodes

2019-10-21 Thread Eunsu Kim
Hi experts,

When a new node is added, how does the coordinator find data that has not yet 
been streamed to it?

Or is the new node not used until all data has been streamed?

Thanks in advance



Re: What happens if my Cassandra cluster's certificate expires?

2019-09-25 Thread Eunsu Kim
Laxmikant, Thank you for your quick response.


From: Laxmikant Upadhyay 
Reply-To: "user@cassandra.apache.org" 
Date: Wednesday, 25 September 2019 at 7:42 PM
To: "user@cassandra.apache.org" 
Subject: Re: What happens if my Cassandra cluster's certificate expires?

New connections and re-connections (in case of a service restart) will fail to 
establish, but requests on existing connections should keep working.

On Wed, Sep 25, 2019 at 3:19 PM Eunsu Kim <eunsu.bil...@gmail.com> wrote:
Hi all

I recently enabled client_encryption_options in cassandra.yaml:

client_encryption_options:
    enabled: true
    optional: true
    keystore: conf/my-keystore.jks
    keystore_password: password
    require_client_auth: false


What happens if the certificate expires while in operation?

Will nothing happen? Or will the channel be closed?

Please share your experience.

Thank you.


--

regards,
Laxmikant Upadhyay



What happens if my Cassandra cluster's certificate expires?

2019-09-25 Thread Eunsu Kim
Hi all

I recently enabled client_encryption_options in cassandra.yaml:

client_encryption_options:
    enabled: true
    optional: true
    keystore: conf/my-keystore.jks
    keystore_password: password
    require_client_auth: false


What happens if the certificate expires while in operation?

Will nothing happen? Or will the channel be closed?
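To get ahead of this, my plan is to monitor the days remaining before the 
certificate's notAfter date (extracted, for example, with openssl x509 
-enddate -noout on the exported certificate); a sketch of the date arithmetic, 
with a made-up timestamp:

```python
from datetime import datetime, timezone

def days_until_expiry(not_after: str) -> int:
    """Days remaining, given an openssl-style notAfter like 'Jan  1 00:00:00 2099 GMT'."""
    expiry = datetime.strptime(not_after, "%b %d %H:%M:%S %Y %Z")
    expiry = expiry.replace(tzinfo=timezone.utc)
    return (expiry - datetime.now(timezone.utc)).days

# Made-up expiry date, purely for illustration; alert well before it arrives
# so the keystore can be replaced with a rolling restart.
print(days_until_expiry("Jan  1 00:00:00 2099 GMT"), "days remaining")
```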

Please share your experience.

Thank you.


Re: about remaining data after adding a node

2019-09-05 Thread Eunsu Kim
Thank you for your response.

I’m using TimeWindowCompactionStrategy.

So if I don't run nodetool compact, will the remaining data not be deleted?

From: Federico Razzoli 
Reply-To: "user@cassandra.apache.org" 
Date: Thursday, 5 September 2019 at 6:19 PM
To: "user@cassandra.apache.org" 
Subject: Re: about remaining data after adding a node

Hi Eunsu,

Are you using DateTieredCompactionStrategy? It optimises the deletion of 
expired data from disk.
If minor compactions are not solving the problem, I suggest running nodetool 
compact.

Federico


On Thu, 5 Sep 2019 at 09:51, Eunsu Kim <eunsu.bil...@gmail.com> wrote:

Hi, all


After adding a new node, all the data was streamed by the newly allocated token.


Since nodetool cleanup has not yet been performed on existing nodes, the total 
size has increased.


All data has a short ttl. In this case, will the data remaining on the existing 
node be deleted after the end of life? Or should I run nodetool cleanup to 
delete it?


Thanks in advance.


about remaining data after adding a node

2019-09-05 Thread Eunsu Kim

Hi, all


After adding a new node, the data for its newly allocated tokens was streamed to it.


Since nodetool cleanup has not yet been run on the existing nodes, the total 
data size has increased.


All data has a short TTL. In this case, will the data remaining on the 
existing nodes be deleted once its TTL expires, or should I run nodetool 
cleanup to delete it?


Thanks in advance.


Re: Data growth is abnormal

2018-12-26 Thread Eunsu Kim
I solved this problem with compaction sub-properties 
(unchecked_tombstone_compaction, tombstone_threshold, 
tombstone_compaction_interval).
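Concretely, the table was altered along these lines (the table name and the 
exact values here are illustrative, not the production ones):

```cql
-- Allow single-SSTable tombstone compactions so expired data can be purged
-- even when TWCS windows overlap; table name and values are illustrative.
ALTER TABLE my_keyspace.my_table
  WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'HOURS',
    'compaction_window_size': 12,
    'unchecked_tombstone_compaction': 'true',
    'tombstone_threshold': '0.2',
    'tombstone_compaction_interval': '86400'
  };
```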

It took time. Eventually, two datacenters were again balanced.

Thank you.

> On 24 Dec 2018, at 3:48 PM, Eunsu Kim  wrote:
> 
> Oh I’m sorry.
> It is marked as included in 3.11.1.
> It seems to be confused with other comments in the middle.
> However, I am not sure what to do with this page..
> 
>> On 24 Dec 2018, at 3:35 PM, Eunsu Kim <eunsu.bil...@gmail.com> wrote:
>> 
>> Thank you for your response.
>> 
>> The patch for the issue page you linked to may be not included in 3.11.3.
>> 
>> If I run repair -pr on all nodes, will both datacenter use the same amount 
>> of disk?
>> 
>>> On 24 Dec 2018, at 2:25 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>>> 
>>> Seems like this is getting asked more and more, that’s unfortunate. Wish I 
>>> had time to fix this by making flush smarter or TWCS split old data. But I 
>>> don’t. 
>>> 
>>> You can search the list archives for more examples, but what’s probably 
>>> happening is that you have sstables overlapping which prevents TWCS from 
>>> dropping them when fully expired
>>> 
>>> The overlaps probably come from either probabilistic read repair or 
>>> speculative retry read-repairing data into the memtable on the dc that 
>>> coordinates your reads
>>> 
>>> Cassandra-13418 ( https://issues.apache.org/jira/browse/CASSANDRA-13418 ) makes it so you 
>>> can force sstables to be dropped at expiration regardless of overlaps, but 
>>> you have to set some properties because it’s technically unsafe (if you 
>>> write to the table with anything other than ttls).
>>> 
>>> 
>>> 
>>> -- 
>>> Jeff Jirsa
>>> 
>>> 
>>> On Dec 24, 2018, at 12:05 AM, Eunsu Kim <eunsu.bil...@gmail.com> wrote:
>>> 
>>>> I’m using TimeWindowCompactionStrategy.
>>>> 
>>>> All consistency level is ONE.
>>>> 
>>>>> On 24 Dec 2018, at 2:01 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>>>>> 
>>>>> What compaction strategy are you using ?
>>>>> 
>>>>> What consistency level do you use on writes? Reads? 
>>>>> 
>>>>> -- 
>>>>> Jeff Jirsa
>>>>> 
>>>>> 
>>>>>> On Dec 23, 2018, at 11:53 PM, Eunsu Kim <eunsu.bil...@gmail.com> wrote:
>>>>>> 
>>>>>> Merry Christmas
>>>>>> 
>>>>>> The Cassandra cluster I operate consists of two datacenters.
>>>>>> 
>>>>>> Most data has a TTL of 14 days and stores one data for each data center. 
>>>>>> (NetworkTopologyStrategy, datacenter1: 1, datacenter2: 1)
>>>>>> 
>>>>>> However, for a few days ago, only the datacenter1 disk usage is 
>>>>>> increasing rapidly.
>>>>>> 
>>>>>> There is no change in nodetool cleanup on each node of datacenter1.
>>>>>> 
>>>>>> How does this happen? What can I do?
>>>>>> 
>>>>>> I would appreciate your advice.
>>>>>> 
>>>>>> Thank you in advance.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>> 
> 



Re: Data growth is abnormal

2018-12-23 Thread Eunsu Kim
Oh, I'm sorry. It is marked as included in 3.11.1.
I was confused by other comments in the middle of the thread.
However, I am still not sure what to conclude from this page.

> On 24 Dec 2018, at 3:35 PM, Eunsu Kim  wrote:
> 
> Thank you for your response.
> 
> The patch for the issue page you linked to may be not included in 3.11.3.
> 
> If I run repair -pr on all nodes, will both datacenter use the same amount of 
> disk?
> 
>> On 24 Dec 2018, at 2:25 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>> 
>> Seems like this is getting asked more and more, that’s unfortunate. Wish I 
>> had time to fix this by making flush smarter or TWCS split old data. But I 
>> don’t. 
>> 
>> You can search the list archives for more examples, but what’s probably 
>> happening is that you have sstables overlapping which prevents TWCS from 
>> dropping them when fully expired
>> 
>> The overlaps probably come from either probabilistic read repair or 
>> speculative retry read-repairing data into the memtable on the dc that 
>> coordinates your reads
>> 
>> Cassandra-13418 ( https://issues.apache.org/jira/browse/CASSANDRA-13418 ) makes it so you 
>> can force sstables to be dropped at expiration regardless of overlaps, but 
>> you have to set some properties because it’s technically unsafe (if you 
>> write to the table with anything other than ttls).
>> 
>> 
>> 
>> -- 
>> Jeff Jirsa
>> 
>> 
>> On Dec 24, 2018, at 12:05 AM, Eunsu Kim <eunsu.bil...@gmail.com> wrote:
>> 
>>> I’m using TimeWindowCompactionStrategy.
>>> 
>>> All consistency level is ONE.
>>> 
>>>> On 24 Dec 2018, at 2:01 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>>>> 
>>>> What compaction strategy are you using ?
>>>> 
>>>> What consistency level do you use on writes? Reads? 
>>>> 
>>>> -- 
>>>> Jeff Jirsa
>>>> 
>>>> 
>>>>> On Dec 23, 2018, at 11:53 PM, Eunsu Kim <eunsu.bil...@gmail.com> wrote:
>>>>> 
>>>>> Merry Christmas
>>>>> 
>>>>> The Cassandra cluster I operate consists of two datacenters.
>>>>> 
>>>>> Most data has a TTL of 14 days and stores one data for each data center. 
>>>>> (NetworkTopologyStrategy, datacenter1: 1, datacenter2: 1)
>>>>> 
>>>>> However, for a few days ago, only the datacenter1 disk usage is 
>>>>> increasing rapidly.
>>>>> 
>>>>> There is no change in nodetool cleanup on each node of datacenter1.
>>>>> 
>>>>> How does this happen? What can I do?
>>>>> 
>>>>> I would appreciate your advice.
>>>>> 
>>>>> Thank you in advance.
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> 
> 



Re: Data growth is abnormal

2018-12-23 Thread Eunsu Kim
Thank you for your response.

The patch from the issue page you linked may not be included in 3.11.3.

If I run repair -pr on all nodes, will both datacenters use the same amount of 
disk?

> On 24 Dec 2018, at 2:25 PM, Jeff Jirsa  wrote:
> 
> Seems like this is getting asked more and more, that’s unfortunate. Wish I 
> had time to fix this by making flush smarter or TWCS split old data. But I 
> don’t. 
> 
> You can search the list archives for more examples, but what’s probably 
> happening is that you have sstables overlapping which prevents TWCS from 
> dropping them when fully expired
> 
> The overlaps probably come from either probabilistic read repair or 
> speculative retry read-repairing data into the memtable on the dc that 
> coordinates your reads
> 
> Cassandra-13418 ( https://issues.apache.org/jira/browse/CASSANDRA-13418 ) makes it so you can 
> force sstables to be dropped at expiration regardless of overlaps, but you 
> have to set some properties because it’s technically unsafe (if you write to 
> the table with anything other than ttls).
> 
> 
> 
> -- 
> Jeff Jirsa
> 
> 
> On Dec 24, 2018, at 12:05 AM, Eunsu Kim <eunsu.bil...@gmail.com> wrote:
> 
>> I’m using TimeWindowCompactionStrategy.
>> 
>> All consistency level is ONE.
>> 
>>> On 24 Dec 2018, at 2:01 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>>> 
>>> What compaction strategy are you using ?
>>> 
>>> What consistency level do you use on writes? Reads? 
>>> 
>>> -- 
>>> Jeff Jirsa
>>> 
>>> 
>>>> On Dec 23, 2018, at 11:53 PM, Eunsu Kim <eunsu.bil...@gmail.com> wrote:
>>>> 
>>>> Merry Christmas
>>>> 
>>>> The Cassandra cluster I operate consists of two datacenters.
>>>> 
>>>> Most data has a TTL of 14 days and stores one data for each data center. 
>>>> (NetworkTopologyStrategy, datacenter1: 1, datacenter2: 1)
>>>> 
>>>> However, for a few days ago, only the datacenter1 disk usage is increasing 
>>>> rapidly.
>>>> 
>>>> There is no change in nodetool cleanup on each node of datacenter1.
>>>> 
>>>> How does this happen? What can I do?
>>>> 
>>>> I would appreciate your advice.
>>>> 
>>>> Thank you in advance.
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
>> 



Re: Data growth is abnormal

2018-12-23 Thread Eunsu Kim
I’m using TimeWindowCompactionStrategy.

All consistency level is ONE.

> On 24 Dec 2018, at 2:01 PM, Jeff Jirsa  wrote:
> 
> What compaction strategy are you using ?
> 
> What consistency level do you use on writes? Reads? 
> 
> -- 
> Jeff Jirsa
> 
> 
>> On Dec 23, 2018, at 11:53 PM, Eunsu Kim  wrote:
>> 
>> Merry Christmas
>> 
>> The Cassandra cluster I operate consists of two datacenters.
>> 
>> Most data has a TTL of 14 days and stores one data for each data center. 
>> (NetworkTopologyStrategy, datacenter1: 1, datacenter2: 1)
>> 
>> However, for a few days ago, only the datacenter1 disk usage is increasing 
>> rapidly.
>> 
>> There is no change in nodetool cleanup on each node of datacenter1.
>> 
>> How does this happen? What can I do?
>> 
>> I would appreciate your advice.
>> 
>> Thank you in advance.
>> 
>> 
>> 
>> 
> 
> 





Data growth is abnormal

2018-12-23 Thread Eunsu Kim
Merry Christmas

The Cassandra cluster (3.11.3) I operate consists of two datacenters.

Most data has a TTL of 14 days, and one replica is stored in each datacenter 
(NetworkTopologyStrategy, datacenter1: 1, datacenter2: 1).

However, for the past few days, only datacenter1's disk usage has been 
increasing rapidly (currently 9.3 TB vs 7.7 TB).

Running nodetool cleanup on each node of datacenter1 makes no difference.

How does this happen? What can I do?

I would appreciate your advice.

Thank you in advance.



Re: Data storage space unbalance issue

2018-12-04 Thread Eunsu Kim
Thanks to you, I found a good tool called Reaper (http://cassandra-reaper.io/).

I will try it.


> On 4 Dec 2018, at 3:30 PM, Elliott Sims  wrote:
> 
> It depends on the type of repair, but you'll want to make sure all the data 
> is where it should be before running cleanup.  Somewhat related, if you're 
> not running regular repairs already, you should be.  You can do it via cron, 
> but I strongly suggest checking out Reaper.
> 
> On Wed, Nov 28, 2018, 8:05 PM Eunsu Kim <eunsu.bil...@gmail.com> wrote:
> Thank you for your response.
> 
> Following your advice, I will run repair on datacenter2. Do I have to run 
> repair on every node in datacenter2?
> 
> There are no snapshots when checked with nodetool listsnapshots.
> 
> Thank you.
> 
>> On 29 Nov 2018, at 4:31 AM, Elliott Sims <elli...@backblaze.com> wrote:
>> 
>> I think you answered your own question, sort of.
>> 
>> When you expand a cluster, it copies the appropriate rows to the new node(s) 
>> but doesn't automatically remove them from the old nodes.  When you ran 
>> cleanup on datacenter1, it cleared out those old extra copies.  I would 
>> suggest running a repair first for safety on datacenter2, then a "nodetool 
>> cleanup" on those hosts.  
>> 
>> Also run "nodetool listsnapshots" to make sure you don't have any old 
>> snapshots sitting around taking up space.
>> 
>> On Wed, Nov 28, 2018 at 5:29 AM Eunsu Kim > <mailto:eunsu.bil...@gmail.com>> wrote:
>> (I am sending the previous mail again because it seems that it has not been 
>> sent properly.)
>> 
>> HI experts,
>> 
>> I am running 2 datacenters each containing five nodes. (total 10 nodes, all 
>> 3.11.3)
>> 
>> My data is stored with one replica in each datacenter. (REPLICATION = { 'class' : 
>> 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'datacenter1': '1', 
>> 'datacenter2': '1’ })
>> 
>> Most of my data has a short TTL (14 days). The gc_grace_seconds value for all 
>> tables is also 600 seconds.
>> 
>> I expect the two datacenters to use the same amount of space, but datacenter2 
>> is using more. The data in datacenter2 seems to be rarely deleted: while the 
>> disk usage for datacenter1 remains constant, the disk usage for datacenter2 
>> continues to grow.
>> 
>> ——
>> Datacenter: datacenter1
>> ===
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address   Load   Tokens   Owns (effective)  Host ID  
>>  Rack
>> UN  10.61.58.228  925.48 GiB  256  21.5% 
>> 60d1bac8-b4d6-4e02-a05f-badee0bb36f5  rack1
>> UN  10.61.58.167  840 GiB256  20.0% 
>> a04fc77a-907f-490c-971c-4e1f964c7b14  rack1
>> UN  10.61.75.86   1.13 TiB   256  19.3% 
>> 618c101b-036d-42e7-bf9f-2bcbd429cbd1  rack1
>> UN  10.61.59.22   844.19 GiB  256  20.0% 
>> d8a4a165-13f0-4f4a-9278-4024730b8116  rack1
>> UN  10.61.59.82   737.88 GiB  256  19.2% 
>> 054a4eb5-6d1c-46fa-b550-34da610da4e0  rack1
>> Datacenter: datacenter2
>> ===
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address   Load   Tokens   Owns (effective)  Host ID  
>>  Rack
>> UN  10.42.6.120   1.11 TiB   256  18.6% 
>> 69f15be0-e5a1-474e-87cf-b063e6854402  rack1
>> UN  10.42.5.207   1.17 TiB   256  20.0% 
>> f78bdce5-cb01-47e0-90b9-fcc31568e49e  rack1
>> UN  10.42.6.47    1.01 TiB   256  20.1% 
>> 3ff93b47-2c15-4e1a-a4ea-2596f26b4281  rack1
>> UN  10.42.6.48    1007.67 GiB  256  20.4% 
>> 8cbbe76d-6496-403a-8b09-fe6812c9dea2  rack1
>> UN  10.42.5.208   1.29 TiB   256  20.9% 
>> 4aa96c6a-6083-417f-a58a-ec847bcbfc7e  rack1
>> --
>> 
>> A few days ago, one node in datacenter1 failed and was replaced, and I then 
>> ran rebuild, repair, and cleanup.
>> 
>> 
>> What else can I do?
>> 
>> Thank you in advance.
> 



Re: Data storage space unbalance issue

2018-11-28 Thread Eunsu Kim
Thank you for your response.

I will run repair from datacenter2 with your advice. Do I have to run repair on 
every node in datacenter2?

There is no snapshot when checked with nodetool listsnapshots.

Thank you.

> On 29 Nov 2018, at 4:31 AM, Elliott Sims  wrote:
> 
> I think you answered your own question, sort of.
> 
> When you expand a cluster, it copies the appropriate rows to the new node(s) 
> but doesn't automatically remove them from the old nodes.  When you ran 
> cleanup on datacenter1, it cleared out those old extra copies.  I would 
> suggest running a repair first for safety on datacenter2, then a "nodetool 
> cleanup" on those hosts.  
> 
> Also run "nodetool listsnapshots" to make sure you don't have any old 
> snapshots sitting around taking up space.
> 
> On Wed, Nov 28, 2018 at 5:29 AM Eunsu Kim  <mailto:eunsu.bil...@gmail.com>> wrote:
> (I am sending the previous mail again because it seems that it has not been 
> sent properly.)
> 
> HI experts,
> 
> I am running 2 datacenters each containing five nodes. (total 10 nodes, all 
> 3.11.3)
> 
> My data is stored with one replica in each datacenter. (REPLICATION = { 'class' : 
> 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'datacenter1': '1', 
> 'datacenter2': '1’ })
> 
> Most of my data has a short TTL (14 days). The gc_grace_seconds value for all 
> tables is also 600 seconds.
> 
> I expect the two datacenters to use the same amount of space, but datacenter2 
> is using more. The data in datacenter2 seems to be rarely deleted: while the 
> disk usage for datacenter1 remains constant, the disk usage for datacenter2 
> continues to grow.
> 
> ——
> Datacenter: datacenter1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address   Load   Tokens   Owns (effective)  Host ID   
> Rack
> UN  10.61.58.228  925.48 GiB  256  21.5% 
> 60d1bac8-b4d6-4e02-a05f-badee0bb36f5  rack1
> UN  10.61.58.167  840 GiB256  20.0% 
> a04fc77a-907f-490c-971c-4e1f964c7b14  rack1
> UN  10.61.75.86   1.13 TiB   256  19.3% 
> 618c101b-036d-42e7-bf9f-2bcbd429cbd1  rack1
> UN  10.61.59.22   844.19 GiB  256  20.0% 
> d8a4a165-13f0-4f4a-9278-4024730b8116  rack1
> UN  10.61.59.82   737.88 GiB  256  19.2% 
> 054a4eb5-6d1c-46fa-b550-34da610da4e0  rack1
> Datacenter: datacenter2
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address   Load   Tokens   Owns (effective)  Host ID   
> Rack
> UN  10.42.6.120   1.11 TiB   256  18.6% 
> 69f15be0-e5a1-474e-87cf-b063e6854402  rack1
> UN  10.42.5.207   1.17 TiB   256  20.0% 
> f78bdce5-cb01-47e0-90b9-fcc31568e49e  rack1
> UN  10.42.6.47    1.01 TiB   256  20.1% 
> 3ff93b47-2c15-4e1a-a4ea-2596f26b4281  rack1
> UN  10.42.6.48    1007.67 GiB  256  20.4% 
> 8cbbe76d-6496-403a-8b09-fe6812c9dea2  rack1
> UN  10.42.5.208   1.29 TiB   256  20.9% 
> 4aa96c6a-6083-417f-a58a-ec847bcbfc7e  rack1
> --
> 
> A few days ago, one node in datacenter1 failed and was replaced, and I then 
> ran rebuild, repair, and cleanup.
> 
> 
> What else can I do?
> 
> Thank you in advance.



Data storage space unbalance issue

2018-11-28 Thread Eunsu Kim
(I am sending the previous mail again because it seems that it has not been 
sent properly.)

HI experts,

I am running 2 datacenters each containing five nodes. (total 10 nodes, all 
3.11.3)

My data is stored with one replica in each datacenter. (REPLICATION = { 'class' : 
'org.apache.cassandra.locator.NetworkTopologyStrategy', 'datacenter1': '1', 
'datacenter2': '1’ })

Most of my data has a short TTL (14 days). The gc_grace_seconds value for all 
tables is also 600 seconds.
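A side note on how TTL and gc_grace_seconds interact: expired cells are not 
removed at the TTL instant. A cell becomes a tombstone at write time plus TTL, 
and only becomes purgeable by compaction gc_grace_seconds after that. A purely 
illustrative sketch of the timeline:

```python
from datetime import datetime, timedelta

TTL = timedelta(days=14)          # table TTL, as described above
GC_GRACE = timedelta(seconds=600) # gc_grace_seconds, as described above

def earliest_purge(write_time: datetime) -> datetime:
    # Data written at write_time expires at write_time + TTL; the resulting
    # tombstone becomes purgeable gc_grace_seconds later. Disk space is
    # actually reclaimed only when compaction (or TWCS whole-SSTable expiry)
    # processes the data after this point.
    return write_time + TTL + GC_GRACE

w = datetime(2018, 11, 1, 0, 0, 0)
print(earliest_purge(w))  # 2018-11-15 00:10:00
```

So if compaction rarely touches old SSTables on one set of nodes, expired data 
can linger well past its nominal TTL.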

I expect the two datacenters to use the same amount of space, but datacenter2 
is using more. The data in datacenter2 seems to be rarely deleted: while the 
disk usage for datacenter1 remains constant, the disk usage for datacenter2 
continues to grow.

——
Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load   Tokens   Owns (effective)  Host ID 
  Rack
UN  10.61.58.228  925.48 GiB  256  21.5% 
60d1bac8-b4d6-4e02-a05f-badee0bb36f5  rack1
UN  10.61.58.167  840 GiB256  20.0% 
a04fc77a-907f-490c-971c-4e1f964c7b14  rack1
UN  10.61.75.86   1.13 TiB   256  19.3% 
618c101b-036d-42e7-bf9f-2bcbd429cbd1  rack1
UN  10.61.59.22   844.19 GiB  256  20.0% 
d8a4a165-13f0-4f4a-9278-4024730b8116  rack1
UN  10.61.59.82   737.88 GiB  256  19.2% 
054a4eb5-6d1c-46fa-b550-34da610da4e0  rack1
Datacenter: datacenter2
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load   Tokens   Owns (effective)  Host ID 
  Rack
UN  10.42.6.120   1.11 TiB   256  18.6% 
69f15be0-e5a1-474e-87cf-b063e6854402  rack1
UN  10.42.5.207   1.17 TiB   256  20.0% 
f78bdce5-cb01-47e0-90b9-fcc31568e49e  rack1
UN  10.42.6.47    1.01 TiB   256  20.1% 
3ff93b47-2c15-4e1a-a4ea-2596f26b4281  rack1
UN  10.42.6.48    1007.67 GiB  256  20.4% 
8cbbe76d-6496-403a-8b09-fe6812c9dea2  rack1
UN  10.42.5.208   1.29 TiB   256  20.9% 
4aa96c6a-6083-417f-a58a-ec847bcbfc7e  rack1
--

A few days ago, one node in datacenter1 failed and was replaced, and I then 
ran rebuild, repair, and cleanup.


What else can I do?

Thank you in advance.
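A small script makes the imbalance concrete by totalling the Load column per 
datacenter from the `nodetool status` output above (a minimal parser assuming 
the two-token "value unit" format shown):

```python
# Totals per DC from the Load values in the nodetool status output above.
UNITS = {"GiB": 1, "TiB": 1024}

def to_gib(load: str) -> float:
    # Parse strings like "925.48 GiB" or "1.13 TiB" into GiB.
    value, unit = load.split()
    return float(value) * UNITS[unit]

dc1 = ["925.48 GiB", "840 GiB", "1.13 TiB", "844.19 GiB", "737.88 GiB"]
dc2 = ["1.11 TiB", "1.17 TiB", "1.01 TiB", "1007.67 GiB", "1.29 TiB"]

total1 = sum(map(to_gib, dc1)) / 1024  # TiB
total2 = sum(map(to_gib, dc2)) / 1024
print(f"datacenter1: {total1:.2f} TiB, datacenter2: {total2:.2f} TiB")
```

This puts datacenter2 roughly 1.2 TiB above datacenter1, consistent with the 
complaint above.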

Re: Adding datacenter and data verification

2018-09-17 Thread Eunsu Kim
Yes, I altered the system_auth keyspace before adding the datacenter.

However, I suspect that the new datacenter did not receive the system_auth data 
and therefore could not authenticate clients, because the ALTER KEYSPACE did not 
assign the new datacenter any replicas.

Do your clients have the 'withUsedHostsPerRemoteDc' option?
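The failure mode described here, a datacenter left out of system_auth's 
NetworkTopologyStrategy options and therefore holding zero replicas of the role 
data, can be checked mechanically. A minimal sketch; the replication maps below 
are assumed examples, not read from a real cluster:

```python
def dcs_without_auth_replicas(replication: dict, all_dcs: list) -> list:
    """Return datacenters that hold no system_auth replicas under
    NetworkTopologyStrategy (options map DC name -> replica count)."""
    return [dc for dc in all_dcs if int(replication.get(dc, 0)) == 0]

# Keyspace altered before the new DC existed -- the new DC has no replicas:
before = {"datacenter1": "3"}
print(dcs_without_auth_replicas(before, ["datacenter1", "datacenter2"]))
# -> ['datacenter2']

# After altering system_auth to cover both DCs (and running
# `nodetool rebuild` on the new nodes to actually stream the data):
after = {"datacenter1": "3", "datacenter2": "3"}
print(dcs_without_auth_replicas(after, ["datacenter1", "datacenter2"]))
# -> []
```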


> On 18 Sep 2018, at 1:17 PM, Pradeep Chhetri  wrote:
> 
> Hello Eunsu,
> 
> I am also using PasswordAuthenticator in my cassandra cluster. I didn't come 
> across this issue while doing the exercise on preprod.
> 
> Are you sure that you changed the configuration of system_auth keyspace 
> before adding the new datacenter using this:
> 
> ALTER KEYSPACE system_auth WITH REPLICATION = {'class': 
> 'NetworkTopologyStrategy', 'datacenter1': '3'};
> 
> Regards,
> Pradeep
> 
> 
> 
> On Tue, Sep 18, 2018 at 7:23 AM, Eunsu Kim  <mailto:eunsu.bil...@gmail.com>> wrote:
> 
> In my case, there were authentication issues when adding data centers.
> 
> I was using a PasswordAuthenticator.
> 
> As soon as the datacenter was added, the following authentication error log 
> was recorded on the client log file.
> 
> com.datastax.driver.core.exceptions.AuthenticationException: Authentication 
> error on host /xxx.xxx.xxx.xx:9042: Provided username apm and/or password are 
> incorrect
> 
> I was using DCAwareRoundRobinPolicy, but I guess it's probably because of the 
> withUsedHostsPerRemoteDc option.
> 
> I took several steps and the error log disappeared. It was probably running 
> 'nodetool rebuild' after altering the system_auth keyspace that fixed it.
> 
> However, the procedure was not clearly defined.
> 
> 
>> On 18 Sep 2018, at 2:40 AM, Pradeep Chhetri > <mailto:prad...@stashaway.com>> wrote:
>> 
>> Hello Alain,
>> 
>> Thank you very much for reviewing it. You answer on seed nodes cleared my 
>> doubts. I will update it as per your suggestion.
>> 
>> I have few followup questions on decommissioning of datacenter:
>> 
>> - Do i need to run nodetool repair -full on each of the nodes (old + new dc 
>> nodes) before starting the decommissioning process of old dc.
>> - We have around 15 apps using cassandra cluster. I want to make sure that 
>> all queries before starting the new datacenter are going with right 
>> consistency level i.e LOCAL_QUORUM instead of QUORUM. Is there a way i can 
>> log the consistency level of each query somehow in some log file.
>> 
>> Regards,
>> Pradeep
>> 
>> On Mon, Sep 17, 2018 at 9:26 PM, Alain RODRIGUEZ > <mailto:arodr...@gmail.com>> wrote:
>> Hello Pradeep,
>> 
>> It looks good to me and it's a cool runbook for you to follow and for others 
>> to reuse.
>> 
>> To make sure that cassandra nodes in one datacenter can see the nodes of the 
>> other datacenter, add the seed node of the new datacenter in any of the old 
>> datacenter’s nodes and restart that node.
>> 
>> Nodes seeing each other from the distinct rack is not related to seeds. It's 
>> indeed recommended to use seeds from all the datacenter (a couple or 3). I 
>> guess it's to increase availability on seeds node and/or maybe to make sure 
>> local seeds are available.
>> 
>> You can perfectly (and even have to) add your second datacenter nodes using 
>> seeds from the first data center. A bootstrapping node should never be in 
>> the list of seeds unless it's the first node of the cluster. Add nodes, then 
>> make them seeds.
>> 
>> 
>> Le lun. 17 sept. 2018 à 11:25, Pradeep Chhetri > <mailto:prad...@stashaway.com>> a écrit :
>> Hello everyone,
>> 
>> Can someone please help me in validating the steps i am following to migrate 
>> cassandra snitch.
>> 
>> Regards,
>> Pradeep
>> 
>> On Wed, Sep 12, 2018 at 1:38 PM, Pradeep Chhetri > <mailto:prad...@stashaway.com>> wrote:
>> Hello
>> 
>> I am running cassandra 3.11.3 5-node cluster on AWS with SimpleSnitch. I was 
>> testing the process to migrate to GPFS using AWS region as the datacenter 
>> name and AWS zone as the rack name in my preprod environment and was able to 
>> achieve it. 
>> 
>> But before decommissioning the older datacenter, I want to verify that the 
>> data in newer dc is in consistence with the one in older dc. Is there any 
>> easy way to do that. 
>> 
>> Do you suggest running a full repair before decommissioning the nodes of 
>> older datacenter ?
>> 
>> I am using the steps documented here: https://medium.com/p/465e9bf28d99. I 
>> will be very happy if someone can confirm that I am doing the right steps.
>> 
>> Regards,
>> Pradeep
>> 
>> 
>> 
>> 
>> 
>> 
> 
> 



Re: Adding datacenter and data verification

2018-09-17 Thread Eunsu Kim

In my case, there were authentication issues when adding data centers.

I was using a PasswordAuthenticator.

As soon as the datacenter was added, the following authentication error log was 
recorded on the client log file.

com.datastax.driver.core.exceptions.AuthenticationException: Authentication 
error on host /xxx.xxx.xxx.xx:9042: Provided username apm and/or password are 
incorrect

I was using DCAwareRoundRobinPolicy, but I guess it's probably because of the 
withUsedHostsPerRemoteDc option.

I took several steps and the error log disappeared. It was probably running 
'nodetool rebuild' after altering the system_auth keyspace that fixed it.

However, the procedure was not clearly defined.


> On 18 Sep 2018, at 2:40 AM, Pradeep Chhetri  wrote:
> 
> Hello Alain,
> 
> Thank you very much for reviewing it. You answer on seed nodes cleared my 
> doubts. I will update it as per your suggestion.
> 
> I have few followup questions on decommissioning of datacenter:
> 
> - Do i need to run nodetool repair -full on each of the nodes (old + new dc 
> nodes) before starting the decommissioning process of old dc.
> - We have around 15 apps using cassandra cluster. I want to make sure that 
> all queries before starting the new datacenter are going with right 
> consistency level i.e LOCAL_QUORUM instead of QUORUM. Is there a way i can 
> log the consistency level of each query somehow in some log file.
> 
> Regards,
> Pradeep
> 
> On Mon, Sep 17, 2018 at 9:26 PM, Alain RODRIGUEZ  > wrote:
> Hello Pradeep,
> 
> It looks good to me and it's a cool runbook for you to follow and for others 
> to reuse.
> 
> To make sure that cassandra nodes in one datacenter can see the nodes of the 
> other datacenter, add the seed node of the new datacenter in any of the old 
> datacenter’s nodes and restart that node.
> 
> Nodes seeing each other from the distinct rack is not related to seeds. It's 
> indeed recommended to use seeds from all the datacenter (a couple or 3). I 
> guess it's to increase availability on seeds node and/or maybe to make sure 
> local seeds are available.
> 
> You can perfectly (and even have to) add your second datacenter nodes using 
> seeds from the first data center. A bootstrapping node should never be in the 
> list of seeds unless it's the first node of the cluster. Add nodes, then make 
> them seeds.
> 
> 
> Le lun. 17 sept. 2018 à 11:25, Pradeep Chhetri  > a écrit :
> Hello everyone,
> 
> Can someone please help me in validating the steps i am following to migrate 
> cassandra snitch.
> 
> Regards,
> Pradeep
> 
> On Wed, Sep 12, 2018 at 1:38 PM, Pradeep Chhetri  > wrote:
> Hello
> 
> I am running cassandra 3.11.3 5-node cluster on AWS with SimpleSnitch. I was 
> testing the process to migrate to GPFS using AWS region as the datacenter 
> name and AWS zone as the rack name in my preprod environment and was able to 
> achieve it. 
> 
> But before decommissioning the older datacenter, I want to verify that the 
> data in newer dc is in consistence with the one in older dc. Is there any 
> easy way to do that. 
> 
> Do you suggest running a full repair before decommissioning the nodes of 
> older datacenter ?
> 
> I am using the steps documented here: https://medium.com/p/465e9bf28d99. I 
> will be very happy if someone can confirm that I am doing the right steps.
> 
> Regards,
> Pradeep
> 
> 
> 
> 
> 
> 



Re: Default Single DataCenter -> Multi DataCenter

2018-09-11 Thread Eunsu Kim
Replying to my own question.

Step 3 is wrong.

Even coming from SimpleSnitch, CassandraDaemon will not start after the DC 
information is changed; it fails with the following error log.

ERROR [main] 2018-09-11 18:36:30,272 CassandraDaemon.java:708 - Cannot start 
node if snitch's data center (pg1) differs from previous data center 
(datacenter1). Please fix the snitch configuration, decommission and 
rebootstrap this node or use the flag -Dcassandra.ignore_dc=true.


> On 11 Sep 2018, at 2:25 PM, Eunsu Kim  wrote:
> 
> Hello
> 
> Thank you for your responses.
> 
> I’ll share my plan for adding a datacenter. If you see problems, please respond.
> 
> My sentences may be a little awkward; my English is poor, so I am getting help 
> from a translator.
> 
> The document I referred to most is https://medium.com/p/465e9bf28d99. Thank 
> you for the clean write-up, Pradeep Chhetri.
> 
> I will also upgrade the version as Alain Rodriguez's advice.
> 
> 
> 
> Step 1. Stop all existing clusters. (My service is paused.)
> 
> Step 2. Install Cassandra 3.11.3 and copy existing conf files.
> 
> Step 3. Modify cassandra-rackdc.properties for existing nodes. dc=mydc1 
> rack=myrack1
>  Q. I think this modification will not affect the existing data because 
> it was SimpleSnitch before, right?
> 
> Step 4. In the caseandra.yaml of existing nodes, endpoint_snitch is changed 
> to GossippingPropertyFileSnitch.
> 
> Step 5. Restart the Cassandra of the existing nodes. (My service is resumed.)
> 
> Step 6. Change the settings of all existing clients to DCAwareRoundRobinPolicy 
> and refer to mydc1. Consistency level is LOCAL_ONE. And rolling restart.
>   Q. Isn't it a problem that, at this point, DCAwareRoundRobinPolicy and 
> RoundRobinPolicy coexist?
> 
> Step 7. Alter my keyspace and system keyspace(system_distributed, 
> system_traces) :  SimpleStrategy(RF=2) -> { 'class' : 
> 'NetworkTopologyStrategy', ‘mydc1’ : 2 }
> 
> Step 8. Install Cassandra in a new cluster, copying existing conf files, and 
> setting it to Cassandra-racdc.properties. dc=mydc2 rack=myrack2
> 
> Step 9. Adding a new seed node to the cassandra.yaml of the existing cluster 
> (mydc1) and restart.
>   Q1. Must I add the new seed nodes in five all existing nodes?
>   Q2. Don't I need to update the seed node settings of the new cluster 
> (mydc2)?
> 
> Step 10. Alter my keyspace and system keyspace(system_distributed, 
> system_traces) :  { 'class' : 'NetworkTopologyStrategy', ‘mydc1’ : 1, ‘mydc2’ 
> : 1 }
> 
> Step 11. Run 'nodetool rebuild -- mydc1' on each new node, in turn.
> 
> ———
> 
> 
> I'll run the procedure in the development environment and share the results.
> 
> Thank you.
> 
> 
> 
> 
>> On 10 Sep 2018, at 10:26 PM, Pradeep Chhetri > <mailto:prad...@stashaway.com>> wrote:
>> 
>> Hello Eunsu, 
>> 
>> I am going through the same exercise at my job. I was making notes as i was 
>> testing the steps in my preproduction environment. Although I haven't tested 
>> end to end but hopefully this might help you: 
>> https://medium.com/p/465e9bf28d99
>> 
>> Regards,
>> Pradeep
>> 
>> On Mon, Sep 10, 2018 at 5:59 PM, Alain RODRIGUEZ > <mailto:arodr...@gmail.com>> wrote:
>> Adding a data center for the first time is a bit tricky when you haven't 
>> been considering it from the start.
>> 
>> I operate 5 nodes cluster (3.11.0) in a single data center with 
>> SimpleSnitch, SimpleStrategy and all client policy RoundRobin.
>> 
>> You will need:
>> 
>> - To change clients, make them 'DCAware'. This depends on the client, but 
>> you should be able to find this in your Cassandra driver (client side).
>> - To change clients, make them use 'LOCAL_' consistency 
>> ('LOCAL_ONE'/'LOCAL_QUORUM' being the most common).
>> - To change 'SimpleSnitch' for 'EC2Snitch' or 'GossipingPropertyFileSnitch' 
>> for example, depending on your context/preference
>> - To change 'SimpleStrategy' for 'NetworkTopologyStrategy' for all the 
>> keyspaces, with the desired RF. I take the chance to say that switching to 1 
>> replica only is often a mistake, you can indeed have data loss (which you 
>> accept) but also service going down, anytime you restart a node or that a 
>> node goes down. If you are ok with RF=1, RDBMS might be a better choice. 
>> It's an anti-pattern of some kind to run Cassandra with RF=1. Yet up to you, 
>> this is not our topic :). In the same kind of off-topic recommendations, I 
>> would not stick with C*3.11.0, but go to C*3.11.3 (if you do not perform 
>> slice

Re: Default Single DataCenter -> Multi DataCenter

2018-09-10 Thread Eunsu Kim
 confident or have doubts, you can share more about the context 
> and post your exact plan, as I did years ago in the mail previously linked. 
> People here should be able to confirm the process is ok before you move 
> forward, giving you an extra confidence.
> 
> C*heers,
> ---
> Alain Rodriguez - @arodream - al...@thelastpickle.com
> France / Spain
> 
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
> 
> Le lun. 10 sept. 2018 à 11:05, Eunsu Kim  <mailto:eunsu.bil...@gmail.com>> a écrit :
> Hello everyone
> 
> I operate 5 nodes cluster (3.11.0) in a single data center with SimpleSnitch, 
> SimpleStrategy and all client policy RoundRobin.
> 
> At this point, I am going to create clusters of the same size in different 
> data centers.
> 
> I think these two documents are appropriate, but it is confusing because they 
> reference each other.
> 
> https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsAddDCToCluster.html
> https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsSwitchSnitch.html
> 
> Can anyone clearly lay out the correct order? Currently RF is 2 and I want to 
> have only one replica per datacenter with NetworkTopologyStrategy.
> A little data loss is okay.
> 
> Thank you in advance.
> 
> 
> 
> 
> 
> 
> 



Default Single DataCenter -> Multi DataCenter

2018-09-10 Thread Eunsu Kim
Hello everyone

I operate 5 nodes cluster (3.11.0) in a single data center with SimpleSnitch, 
SimpleStrategy and all client policy RoundRobin.

At this point, I am going to create clusters of the same size in different data 
centers.

I think these two documents are appropriate, but it is confusing because they 
reference each other.

https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsAddDCToCluster.html
https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsSwitchSnitch.html

Can anyone clearly lay out the correct order? Currently RF is 2 and I want to 
have only one replica per datacenter with NetworkTopologyStrategy.
A little data loss is okay.

Thank you in advance.








about cassandra..

2018-08-08 Thread Eunsu Kim
Hi all.

I’m worried about the amount of disk space I use, so I’m curious about 
compression. We are currently using 3.11.0 with the default LZ4Compressor 
('chunk_length_in_kb': 64).
Is there a setting that enables stronger compression?
Because most of our tables hold time-series data with a TTL, we use 
TimeWindowCompactionStrategy.

Thank you in advance.
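Two knobs matter here: Cassandra also ships DeflateCompressor, which trades CPU 
for a better ratio than LZ4, and a larger chunk_length_in_kb gives the 
compressor more context per block at the cost of read amplification. The 
chunk-size effect can be illustrated with stdlib zlib (the same Deflate 
algorithm); the data below is a synthetic assumption, not Cassandra internals:

```python
import zlib

def chunked_size(data: bytes, chunk_kb: int, level: int = 6) -> int:
    """Total compressed size when data is compressed chunk by chunk,
    as Cassandra does per chunk_length_in_kb block."""
    n = chunk_kb * 1024
    return sum(len(zlib.compress(data[i:i + n], level))
               for i in range(0, len(data), n))

# Synthetic repetitive time-series-like rows (an assumption, not real data):
row = b"1533700000,host-42,cpu.idle,97.3\n"
data = row * 8000  # ~260 KB

for kb in (4, 16, 64):
    print(f"{kb:>3} KB chunks -> {chunked_size(data, kb)} bytes")
```

Larger chunks generally compress better, and higher levels (or switching the 
table's compression class from LZ4Compressor to DeflateCompressor) trade CPU 
for ratio.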



Re: cassandra cluser sizing

2018-07-17 Thread Eunsu Kim
Can I ask you an additional question here?

How much free space should I have if most tables use 
TimeWindowCompactionStrategy?
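A rough rule of thumb (an assumption, not an official formula): TWCS compacts 
only within a time window, so the transient space needed is on the order of one 
window's worth of data rather than the ~50% that full size-tiered compactions 
can require; repairs and streaming still need their own headroom on top.

```python
# Rough TWCS compaction-headroom estimate. The safety factor and the whole
# formula are rule-of-thumb assumptions, not an official sizing method.
def twcs_headroom_gb(total_data_gb: float, ttl_days: float,
                     window_days: float, safety: float = 2.0) -> float:
    """TWCS compacts within a time window, so transient compaction space
    is roughly one window's worth of data times a safety factor."""
    windows = ttl_days / window_days
    window_gb = total_data_gb / windows
    return window_gb * safety

# e.g. 1000 GB of data, 14-day TTL, 1-day windows:
print(twcs_headroom_gb(1000, 14, 1))  # ~143 GB of compaction headroom
```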


> On 13 Jul 2018, at 10:09 PM, Vitaliy Semochkin  wrote:
> 
> Jeff, thank you very much for reply.
> Will try to use 4TB per instance.
> 
> If I understand it correctly level compaction can lead to 50%
> https://docs.datastax.com/en/dse-planning/doc/planning/planningHardware.html
> 
> Regarding the question of running multiple instances per server, am I
> correct that in case of 3.11 instances and having several disks
> dedicated for each instance, running multiple instances per server is
> ok?
> 
> 
> On Thu, Jul 12, 2018 at 5:47 PM Jeff Jirsa  wrote:
>> 
>> You can certainly go higher than a terabyte - 4 or so is common, Ive heard 
>> of people doing up to 12 tb with the awareness that time to replace scales 
>> with size on disk, so a very large host will take longer to rebuild than a 
>> small host
>> 
>> The 50% free guidance only applies to size tiered compaction, and given your 
>> throughput you may prefer leveled compaction anyway. With leveled you should 
>> target 30% free for compaction and repair
>> 
>> You don’t need more than one Cassandra instance per host for 4tb but you may 
>> want to consider it for more than that - multiple instances are especially 
>> useful if you have multiple (lots of) disks and are running Cassandra before 
>> CASSANDRA-6696 (which made jbod safer).
>> 
>> --
>> Jeff Jirsa
>> 
>> 
>>> On Jul 12, 2018, at 7:37 AM, Vitaliy Semochkin  wrote:
>>> 
>>> Hi,
>>> 
>>> Which amount of data Cassandra 3 server in a cluster can serve at max?
>>> The documentation says it is only 1TB.
>>> If the load is not high (only about 100 requests per second with 1kb
>>> of data each) is it safe to go above 1TB size (let's say 5TB per
>>> server)?
>>> What will be safe maximum disk size a server in such cluster can serve?
>>> 
>>> Documentation also says that  compaction  requires to have %50 of disk
>>> occupied space. In case I don't have update operations (only insert)
>>> do I need that much extra space for compaction?
>>> 
>>> In articles (outside Datastax docs) I read that it is a common
>>> practice to launch more than one Cassandra server on one physical
>>> server in order to be able use more than 1TB of hard driver per
>>> server, is it recommended?
>>> 



Re: What will happen after adding another data disk

2018-06-12 Thread Eunsu Kim
In my experience, after adding a new disk and restarting the Cassandra process, 
disk usage slowly evens out across the disks, so the existing disks end up with 
less usage.
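A toy model of why this evens out, assuming new SSTables tend to land on the 
data directory with the most free space (a deliberate simplification, not 
Cassandra's exact allocation logic):

```python
# Toy model: each new SSTable goes to the directory with the lowest
# used/capacity ratio, so an emptier disk absorbs new writes until the
# disks converge. Illustrative only.
def place_sstables(used: list, capacity: float,
                   sstable_gb: float, count: int) -> list:
    used = list(used)
    for _ in range(count):
        target = min(range(len(used)), key=lambda i: used[i] / capacity)
        used[target] += sstable_gb
    return used

# Old disk 90% full, new disk empty, 1 TB each, then 100 x 5 GB SSTables:
print(place_sstables([900.0, 0.0], 1000.0, 5.0, 100))  # -> [900.0, 500.0]
```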

> On 12 Jun 2018, at 11:09 AM, wxn...@zjqunshuo.com wrote:
> 
> Hi,
> I know Cassandra can make use of multiple disks. My data disk is almost full 
> and I want to add another 2TB disk. I don't know what will happen after the 
> addition.
> 1. C* will write to both disks util the old disk is full?
> 2. And what will happen after the old one is full? Will C* stop writing to 
> the old one and only writing to the new one with free space?
> 
> Thanks!



Re: GUI clients for Cassandra

2018-04-22 Thread Eunsu Kim
I am currently using DBeaver EE, but I’m waiting for TeamSQL (https://teamsql.io) 
to support Cassandra.

> On 23 Apr 2018, at 7:56 AM, Tim Moore  wrote:
> 
> I use the command-line too, but have heard some recommendations for DBeaver 
> EE as a cross-database GUI with support for Cassandra: https://dbeaver.com/ 
> 
> 
> On Sun, Apr 22, 2018 at 3:58 PM, Hannu Kröger  > wrote:
> Hello everyone!
> 
> I have been asked many times that what is a good GUI client for Cassandra. 
> DevCenter is not available anymore and DataStax has a DevStudio but that’s 
> for DSE only.
> 
> Are there some 3rd party GUI tools that you are using a lot? I always use the 
> command line client myself. I have tried to look for some Cassandra related 
> tools but I haven’t found any good one yet.
> 
> Cheers,
> Hannu
> 
> 
> 
> 
> -- 
> Tim Moore
> Lagom Tech Lead, Lightbend, Inc.
> tim.mo...@lightbend.com 
> +61 420 981 589
> Skype: timothy.m.moore
> 
>  


Can I sort it as a result of group by?

2018-04-09 Thread Eunsu Kim
Hello, everyone.

I am using 3.11.0 and I have the following table.

CREATE TABLE summary_5m (
service_key text,
hash_key int,
instance_hash int,
collected_time timestamp,
count int,
PRIMARY KEY ((service_key), hash_key, instance_hash, collected_time)
)


And I can sum count grouping by primary key.

select service_key, hash_key, instance_hash, sum(count) as count_summ 
from apm.ip_summary_5m 
where service_key='ABCED'
group by service_key, hash_key, instance_hash;


But what I want is to get only the top 100 rows with the highest summed count.

Something like the following appended to the query … (syntax error, of course):

order by count_summ limit 100;

Anybody have ever solved this problem?

Thank you in advance.
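CQL cannot ORDER BY an aggregate, so the usual workaround is to fetch the 
grouped sums and take the top N client side. A sketch in Python with made-up 
sample rows, shaped as they might come back from the driver for the GROUP BY 
query above:

```python
import heapq
from operator import itemgetter

# Rows as returned by the GROUP BY query:
# (service_key, hash_key, instance_hash, count_summ) -- sample values assumed.
rows = [
    ("ABCED", 1, 10, 420),
    ("ABCED", 1, 11, 97),
    ("ABCED", 2, 10, 1350),
    ("ABCED", 3, 12, 640),
]

# Equivalent of "order by count_summ desc limit 100", done client side:
top = heapq.nlargest(100, rows, key=itemgetter(3))
print(top[0])  # -> ('ABCED', 2, 10, 1350)
```

heapq.nlargest keeps memory proportional to N rather than sorting the full 
result set, which matters if the grouped result is large.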




Re: Self read throughput increased rapidly

2018-03-11 Thread Eunsu Kim
I should find out more about who is actually reading the data.

Thank you for your response.

> On 12 Mar 2018, at 11:43 AM, Jeff Jirsa <jji...@gmail.com> wrote:
> 
> Well it’s hard to say unless you’re more precise with what JMX emitted metric 
> you’re graphing, but yes, there are read metrics that will increase if people 
> read the data
> 
> 
> -- 
> Jeff Jirsa
> 
> 
> On Mar 11, 2018, at 7:38 PM, Eunsu Kim <eunsu.bil...@gmail.com 
> <mailto:eunsu.bil...@gmail.com>> wrote:
> 
>> No I didn’t
>> 
>> Do you mean that this read throughput value comes out because somebody 
>> actually reads the data?
>> 
>> 
>>> On 12 Mar 2018, at 11:32 AM, Jeff Jirsa <jji...@gmail.com 
>>> <mailto:jji...@gmail.com>> wrote:
>>> 
>>> I presume you’re asking because you don’t think you’re doing reads. Did you 
>>> start doing counter writes?
>>> 
>>> 
>>> -- 
>>> Jeff Jirsa
>>> 
>>> 
>>> On Mar 11, 2018, at 7:30 PM, Eunsu Kim <eunsu.bil...@gmail.com 
>>> <mailto:eunsu.bil...@gmail.com>> wrote:
>>> 
>>>> We monitored the write/read throughput through the Cassandra cluster via 
>>>> JMX. There was little read throughput before March 6. From March 6, the 
>>>> write throughput increased, but the read throughput suddenly increased 
>>>> sharply. Do anyone know why this happens?
>>>> 
>>>> 
>> 



Re: Self read throughput increased rapidly

2018-03-11 Thread Eunsu Kim
No I didn’t

Do you mean that this read throughput value comes out because somebody actually 
reads the data?


> On 12 Mar 2018, at 11:32 AM, Jeff Jirsa <jji...@gmail.com> wrote:
> 
> I presume you’re asking because you don’t think you’re doing reads. Did you 
> start doing counter writes?
> 
> 
> -- 
> Jeff Jirsa
> 
> 
> On Mar 11, 2018, at 7:30 PM, Eunsu Kim <eunsu.bil...@gmail.com 
> <mailto:eunsu.bil...@gmail.com>> wrote:
> 
>> We monitored the write/read throughput through the Cassandra cluster via 
>> JMX. There was little read throughput before March 6. From March 6, the 
>> write throughput increased, but the read throughput suddenly increased 
>> sharply. Do anyone know why this happens?
>> 
>> 



Re: Adding disk to operating C*

2018-03-08 Thread Eunsu Kim
Thanks for the answer. I never forget to flush and drain before shutting down 
Cassandra.
It is a system that favors lightweight, fast handling of data over strict 
accuracy, hence RF=2 and CL=ONE.
Thank you again.
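The RF=2 / CL=ONE trade-off can be put into numbers (a small sketch; quorum is 
floor(rf/2) + 1):

```python
def quorum(rf: int) -> int:
    # Cassandra's quorum size for a replication factor.
    return rf // 2 + 1

def tolerable_failures(rf: int, cl_nodes: int) -> int:
    """Replicas that can be down while still satisfying a consistency
    level that needs cl_nodes responses."""
    return rf - cl_nodes

rf = 2
assert quorum(rf) == 2                            # QUORUM needs both replicas...
assert tolerable_failures(rf, quorum(rf)) == 0    # ...so one node down => unavailable
assert tolerable_failures(rf, 1) == 1             # CL=ONE survives one replica down
# With RF=3, QUORUM is 2, so one replica can be down and quorum still succeeds:
assert quorum(3) == 2 and tolerable_failures(3, quorum(3)) == 1
```

This is why RF=2 with CL=ONE stays available through a rolling restart but can 
serve stale reads, exactly the eventual-consistency caveat described below.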


> On 9 Mar 2018, at 3:12 PM, Jeff Jirsa <jji...@gmail.com> wrote:
> 
> There is no shuffling as the servers go up and down. Cassandra doesn’t do 
> that. 
> 
> However, rf=2 is atypical and sometime problematic.
> 
> If you read or write with quorum / two / all, you’ll get unavailables during 
> the restart
> 
> If you read or write with cl one, you’ll potentially not see data previously 
> written (with or without the restart).
> 
> This is all just normal eventual consistency stuff, but be sure you 
> understand it - rf3 may be a better choice
> 
> On restart, be sure you shut down cleanly - nodetool flush and then 
> immediately nodetool drain.  Beyond that I’d expect you to be fine.
> 
> -- 
> Jeff Jirsa
> 
> 
>> On Mar 8, 2018, at 9:52 PM, Eunsu Kim <eunsu.bil...@gmail.com> wrote:
>> 
>> There are currently 5 writes per second. I was worried that the server 
>> downtime would be quite long during disk mount operations.
>> If the data shuffling that occurs when a server goes down or comes up works 
>> as expected, my concern seems unnecessary.
>> 
>> 
>>> On 9 Mar 2018, at 2:19 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>>> 
>>> I see no reason to believe you’d lose data doing this - why do you suspect 
>>> you may? 
>>> 
>>> -- 
>>> Jeff Jirsa
>>> 
>>> 
>>>> On Mar 8, 2018, at 8:36 PM, Eunsu Kim <eunsu.bil...@gmail.com> wrote:
>>>> 
>>>> The auto_snapshot setting is disabled. And the directory architecture on 
>>>> the five nodes will match exactly.
>>>> 
>>>> (Cassandra/Server shutdown -> Mount disk -> Add directory to 
>>>> data_file_directories -> Start Cassandra) * 5 rolling
>>>> 
>>>> Is it possible to add disks without losing data by doing the above 
>>>> procedure?
>>>> 
>>>> 
>>>> 
>>>>> On 7 Mar 2018, at 7:59 PM, Rahul Singh <rahul.xavier.si...@gmail.com> 
>>>>> wrote:
>>>>> 
>>>>> Are you putting both the commitlogs and the SSTables on the new drives? 
>>>>> Consider moving your snapshots off often if that’s also taking up space; 
>>>>> you may be able to save some space before you add drives.
>>>>> 
>>>>> You should be able to add these new drives and mount them without an 
>>>>> issue. Try to avoid different number of data dirs across nodes. It makes 
>>>>> automation of operational processes a little harder.
>>>>> 
>>>>> As an aside, depending on your use case you may not want to have a data 
>>>>> density over 1.5 TB per node.
>>>>> 
>>>>> --
>>>>> Rahul Singh
>>>>> rahul.si...@anant.us
>>>>> 
>>>>> Anant Corporation
>>>>> 
>>>>>> On Mar 7, 2018, 1:26 AM -0500, Eunsu Kim <eunsu.bil...@gmail.com>, wrote:
>>>>>> Hello,
>>>>>> 
>>>>>> I use 5 nodes to create a Cassandra cluster. (SSD 1TB)
>>>>>> 
>>>>>> I'm trying to mount an additional disk (SSD 1TB) on each node because 
>>>>>> each disk's usage is growing faster than I expected. Then I will add the 
>>>>>> directory to data_file_directories in cassandra.yaml.
>>>>>> 
>>>>>> Can I get advice from anyone who has experienced this situation?
>>>>>> If we go through the above steps one by one, will we be able to complete 
>>>>>> the upgrade without losing data?
>>>>>> The replication strategy is SimpleStrategy, RF 2.
>>>>>> 
>>>>>> Thank you in advance
>>>>>> -
>>>>>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>>>>>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
>> 
> 
> 


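The clean-shutdown sequence recommended in this thread (nodetool flush, then nodetool drain, then stop the service) can be sketched as a dry run; the systemd service name is an assumption, and the echo wrapper keeps this from touching a live node:

```shell
# Dry-run sketch of a clean Cassandra shutdown, per the advice in this thread.
# "sudo systemctl stop cassandra" assumes a systemd-managed install; adjust as needed.
for cmd in "nodetool flush" "nodetool drain" "sudo systemctl stop cassandra"; do
  echo "would run: $cmd"   # drop the echo wrapper to execute for real
done
```

Running drain after flush is deliberate: flush persists the memtables while the node is still serving, and drain then stops accepting writes and flushes once more before shutdown.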



Re: Adding disk to operating C*

2018-03-08 Thread Eunsu Kim
There are currently 5 writes per second. I was worried that the server 
downtime would be quite long during disk mount operations.
If the data shuffling that occurs when a server goes down or comes up works as 
expected, my concern seems unnecessary.


> On 9 Mar 2018, at 2:19 PM, Jeff Jirsa <jji...@gmail.com> wrote:
> 
> I see no reason to believe you’d lose data doing this - why do you suspect 
> you may? 
> 
> -- 
> Jeff Jirsa
> 
> 
>> On Mar 8, 2018, at 8:36 PM, Eunsu Kim <eunsu.bil...@gmail.com> wrote:
>> 
>> The auto_snapshot setting is disabled. And the directory architecture on the 
>> five nodes will match exactly.
>> 
>> (Cassandra/Server shutdown -> Mount disk -> Add directory to 
>> data_file_directories -> Start Cassandra) * 5 rolling
>> 
>> Is it possible to add disks without losing data by doing the above procedure?
>> 
>> 
>> 
>>> On 7 Mar 2018, at 7:59 PM, Rahul Singh <rahul.xavier.si...@gmail.com> wrote:
>>> 
>>> Are you putting both the commitlogs and the SSTables on the new drives? 
>>> Consider moving your snapshots off often if that’s also taking up space; 
>>> you may be able to save some space before you add drives.
>>> 
>>> You should be able to add these new drives and mount them without an issue. 
>>> Try to avoid different number of data dirs across nodes. It makes 
>>> automation of operational processes a little harder.
>>> 
>>> As an aside, depending on your use case you may not want to have a data 
>>> density over 1.5 TB per node.
>>> 
>>> --
>>> Rahul Singh
>>> rahul.si...@anant.us
>>> 
>>> Anant Corporation
>>> 
>>>> On Mar 7, 2018, 1:26 AM -0500, Eunsu Kim <eunsu.bil...@gmail.com>, wrote:
>>>> Hello,
>>>> 
>>>> I use 5 nodes to create a Cassandra cluster. (SSD 1TB)
>>>> 
>>>> I'm trying to mount an additional disk (SSD 1TB) on each node because each 
>>>> disk's usage is growing faster than I expected. Then I will add the 
>>>> directory to data_file_directories in cassandra.yaml.
>>>> 
>>>> Can I get advice from anyone who has experienced this situation?
>>>> If we go through the above steps one by one, will we be able to complete 
>>>> the upgrade without losing data?
>>>> The replication strategy is SimpleStrategy, RF 2.
>>>> 
>>>> Thank you in advance
>>>> 
>> 
>> 
>> 
> 
> 





Re: Adding disk to operating C*

2018-03-08 Thread Eunsu Kim
The auto_snapshot setting is disabled. And the directory architecture on the 
five nodes will match exactly.

 (Cassandra/Server shutdown -> Mount disk -> Add directory to 
data_file_directories -> Start Cassandra) * 5 rolling

Is it possible to add disks without losing data by doing the above procedure?



> On 7 Mar 2018, at 7:59 PM, Rahul Singh <rahul.xavier.si...@gmail.com> wrote:
> 
> Are you putting both the commitlogs and the SSTables on the new drives? 
> Consider moving your snapshots off often if that’s also taking up space; 
> you may be able to save some space before you add drives.
> 
> You should be able to add these new drives and mount them without an issue. 
> Try to avoid different number of data dirs across nodes. It makes automation 
> of operational processes a little harder.
> 
> As an aside, depending on your use case you may not want to have a data 
> density over 1.5 TB per node.
> 
> --
> Rahul Singh
> rahul.si...@anant.us
> 
> Anant Corporation
> 
> On Mar 7, 2018, 1:26 AM -0500, Eunsu Kim <eunsu.bil...@gmail.com>, wrote:
>> Hello,
>> 
>> I use 5 nodes to create a Cassandra cluster. (SSD 1TB)
>> 
>> I'm trying to mount an additional disk (SSD 1TB) on each node because each 
>> disk's usage is growing faster than I expected. Then I will add the 
>> directory to data_file_directories in cassandra.yaml.
>> 
>> Can I get advice from anyone who has experienced this situation?
>> If we go through the above steps one by one, will we be able to complete the 
>> upgrade without losing data?
>> The replication strategy is SimpleStrategy, RF 2.
>> 
>> Thank you in advance
>> 





Adding disk to operating C*

2018-03-06 Thread Eunsu Kim
Hello,

I use 5 nodes to create a Cassandra cluster. (SSD 1TB)

I'm trying to mount an additional disk (SSD 1TB) on each node because each 
disk's usage is growing faster than I expected. Then I will add the directory 
to data_file_directories in cassandra.yaml.

Can I get advice from anyone who has experienced this situation?
If we go through the above steps one by one, will we be able to complete the 
upgrade without losing data?
The replication strategy is SimpleStrategy, RF 2.

Thank you in advance
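The rolling procedure described above can be sketched as a dry run, one node at a time; the hostnames and the /data2 mount point are placeholder assumptions:

```shell
# Dry-run sketch of adding a second data disk to each node, rolling one node at a time.
# Hostnames and the /data2 mount point are hypothetical; replace with your own.
NEW_DISK_HOSTS="node1 node2 node3 node4 node5"
for h in $NEW_DISK_HOSTS; do
  echo "[$h] nodetool flush && nodetool drain           # clean shutdown first"
  echo "[$h] stop cassandra; mount the new SSD at /data2"
  echo "[$h] add /data2 to data_file_directories in cassandra.yaml"
  echo "[$h] start cassandra; wait for UN in 'nodetool status' before the next node"
done
```

Waiting for the node to show UN (Up/Normal) before moving on keeps only one replica offline at a time, which matters with RF = 2.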



if the heap size exceeds 32GB..

2018-02-12 Thread Eunsu Kim
https://www.elastic.co/guide/en/elasticsearch/guide/current/heap-sizing.html#compressed_oops
 


According to the article above, if the JVM heap size is about 32 GB, memory is 
wasted because the JVM can no longer use compressed object pointers. (Of 
course, the article is talking about ES.)

But if this is a general property of the JVM, does it apply to Cassandra as 
well?

I am using a server with 64 GB of physical memory and am unsure what heap size 
to allocate.

Thank you.
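A common rule of thumb that follows from the compressed-oops threshold is to cap the heap just below 32 GB. A minimal sketch; the 31 GB cap and the half-of-RAM heuristic are conventions, not Cassandra requirements:

```shell
# Pick a heap size: half of physical RAM, capped at 31 GB so the JVM can
# still use compressed object pointers (which stop working around 32 GB).
pick_heap_gb() {
  half=$(( $1 / 2 ))
  if [ "$half" -gt 31 ]; then echo 31; else echo "$half"; fi
}
pick_heap_gb 64   # prints 31 -- the cap applies on a 64 GB box
pick_heap_gb 32   # prints 16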

Re: Even after the drop table, the data actually was not erased.

2018-01-14 Thread Eunsu Kim
Thank you for your response. As you said, the auto_snapshot setting was turned 
on.
The actual data was deleted with the 'nodetool clearsnapshot' command.
This command seems to apply only to one node. Can it be applied cluster-wide, 
or should I run it on each node?
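Since nodetool clearsnapshot only affects the node it runs on, a cluster-wide cleanup is usually just a loop over every host. A dry-run sketch with placeholder hostnames:

```shell
# clearsnapshot is per-node: run it on every host to free space cluster-wide.
# Hostnames are placeholders; drop the echo to actually execute over ssh.
SNAP_NODES="cass-1 cass-2 cass-3 cass-4 cass-5"
for h in $SNAP_NODES; do
  echo "ssh $h nodetool clearsnapshot"
done
```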



> On 12 Jan 2018, at 8:10 PM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:
> 
> Hello,
> 
> However, the actual size of the data directory did not decrease at all, even 
> though the Disk Load monitored via JMX did decrease.
> 
> This sounds like 'auto_snapshot' is enabled. This option triggers a snapshot 
> before any table drop / truncate, mostly to prevent user mistakes. The data is 
> then removed, but as it is still referenced by the snapshot (hard link), the 
> space cannot be freed.
> 
> Running 'nodetool clearsnapshot' should help reducing the dataset size in 
> this situation.
> 
> 
> The client fails to establish a connection and I see the following exceptions 
> in the Cassandra logs. 
> org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table for 
> cfId…
> 
> This does not look like a failed connection to me, but rather an attempt to 
> query some nonexistent data. If that's the data you just deleted (keyspace / 
> table), this is expected. If not, there is another issue, which I hope is not 
> related to the delete in this case...
> 
> C*heers,
> ---
> Alain Rodriguez - @arodream - al...@thelastpickle.com 
> <mailto:al...@thelastpickle.com>
> France / Spain
> 
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com <http://www.thelastpickle.com/>
> 
> 
> 
> 2018-01-12 7:14 GMT+00:00 Eunsu Kim <eunsu.bil...@gmail.com 
> <mailto:eunsu.bil...@gmail.com>>:
> hi everyone
> 
> On the development server, I dropped all the tables, and even the keyspace, 
> in order to change the table schema.
> Then I created the keyspace and the table again.
> 
> However, the actual size of the data directory did not decrease at all, even 
> though the Disk Load monitored via JMX did decrease.
> 
> 
> 
> 
> After that, Cassandra does not work normally.
> 
> The client fails to establish a connection and I see the following exceptions 
> in the Cassandra logs.
> 
> org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table for 
> cfId…
> org.apache.cassandra.io.FSReadError: java.io.IOException: Digest mismatch 
> exception……
> 
> 
> After forcibly deleting the data and restarting, Cassandra came up in a clean 
> state and works well.
> 
> Can anyone guess why this is happening?
> 
> Thank you in advance.
> 



Even after the drop table, the data actually was not erased.

2018-01-11 Thread Eunsu Kim
hi everyone

On the development server, I dropped all the tables, and even the keyspace, in 
order to change the table schema.
Then I created the keyspace and the table again.

However, the actual size of the data directory did not decrease at all, even 
though the Disk Load monitored via JMX did decrease.




After that, Cassandra does not work normally.

The client fails to establish a connection and I see the following exceptions 
in the Cassandra logs.

org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table for 
cfId…
org.apache.cassandra.io.FSReadError: java.io.IOException: Digest mismatch 
exception……


After forcibly deleting the data and restarting, Cassandra came up in a clean 
state and works well.

Can anyone guess why this is happening?

Thank you in advance.

Re: default_time_to_live setting in time series data

2018-01-11 Thread Eunsu Kim
Thanks for the quick response. TWCS is used.



> On 12 Jan 2018, at 11:38 AM, Jeff Jirsa <jji...@gmail.com> wrote:
> 
> Probably not in any measurable way. 
> 
> -- 
> Jeff Jirsa
> 
> 
>> On Jan 11, 2018, at 6:16 PM, Eunsu Kim <eunsu.bil...@gmail.com> wrote:
>> 
>> Hi everyone
>> 
>> We are collecting monitoring data in excess of 100K TPS in Cassandra. 
>> 
>> All data is time series data and must have a TTL. 
>> 
>> Currently we have set default_time_to_live on the table.
>> 
>> Does this have a negative impact on Cassandra throughput performance?
>> 
>> Thank you in advance.
>> 
>> 
> 
> 





default_time_to_live setting in time series data

2018-01-11 Thread Eunsu Kim
Hi everyone

We are collecting monitoring data in excess of 100K TPS in Cassandra. 

All data is time series data and must have a TTL. 

Currently we have set default_time_to_live on the table.

Does this have a negative impact on Cassandra throughput performance?

Thank you in advance.
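Setting the table-level TTL described above is a one-line schema change; the keyspace and table names below are hypothetical, and in practice the statement would be fed to cqlsh:

```shell
# Hypothetical keyspace/table names; 604800 seconds = 7 days.
CQL='ALTER TABLE metrics.samples WITH default_time_to_live = 604800;'
echo "$CQL"   # in practice: cqlsh -e "$CQL"
```

Combined with TWCS (mentioned in the reply above), expired time-series data tends to be dropped as whole SSTables, which is one reason the per-row cost is usually negligible.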




Re: How to get page id without transmitting data to client

2018-01-01 Thread Eunsu Kim
Thank you for your response. Happy new year!


> On 30 Dec 2017, at 5:33 AM, Andy Tolbert <andrew.tolb...@datastax.com> wrote:
> 
> Hi Eunsu,
> 
> Unfortunately there is not really a way to do this that I'm aware of.  The 
> page id contains data indicating where to start reading the next set of rows 
> (such as partition and clustering information), and in order to get to that 
> position you have to actually read the data.
> 
> The driver does have an API for manually specifying the page id to use and 
> we've documented some strategies 
> <https://docs.datastax.com/en/developer/java-driver/3.3/manual/paging/#saving-and-reusing-the-paging-state>
>  for storing and reusing the page id later, but not sure if that helps for 
> your particular use case.
> 
> Thanks,
> Andy
> 
> On Thu, Dec 28, 2017 at 9:11 PM, Eunsu Kim <eunsu.bil...@gmail.com 
> <mailto:eunsu.bil...@gmail.com>> wrote:
> Hello everybody,
> 
> I am using the datastax Java driver (3.3.0).
> 
> When querying large amounts of data, we set the fetch size (1) and transmit 
> the data to the browser on a page-by-page basis.
> 
> I am wondering if I can get the page id without my server receiving the 
> actual rows from Cassandra.
> 
> I only need the first 100 of the 100,000, but I want the next page to be the 
> 11th.
> 
> If you have a good idea, please share it.
> 
> Thank you.
> 
> 



How to get page id without transmitting data to client

2017-12-28 Thread Eunsu Kim
Hello everybody,

I am using the datastax Java driver (3.3.0).

When querying large amounts of data, we set the fetch size (1) and transmit 
the data to the browser on a page-by-page basis.

I am wondering if I can get the page id without my server receiving the actual 
rows from Cassandra.

I only need the first 100 of the 100,000, but I want the next page to be the 
11th.

If you have a good idea, please share it.

Thank you.



about write performance

2017-12-07 Thread Eunsu Kim
There is a table with a timestamp as a clustering key, sorted ASC on that 
column.

For insert performance, is it better to insert data into this table in time 
order, or does it not matter?

Thank you.