Cluster Management

2013-08-25 Thread Anthony Grasso
Hi Cassandra Users, Before I go ahead and create my own solution... are there any tools that exist to help with the management of a Cassandra cluster? For example, if I want to make some changes to the configuration file that resides on each node, is there a tool that will propagate the change

Re: Cluster Management

2013-08-29 Thread Anthony Grasso
Hi Patricia, Thank you for the feedback. It has been helpful. On Tue, Aug 27, 2013 at 12:02 AM, Patricia Gorla gorla.patri...@gmail.com wrote: Anthony, We use a number of tools to manage our Cassandra cluster. * DataStax OpsCenter [0] for at-a-glance information and trending statistics.

Re: Cluster Management

2013-08-29 Thread Anthony Grasso
Thanks Nate! We will look into this one to see if we can use it. Regards, Anthony On Tue, Aug 27, 2013 at 12:22 AM, Nate McCall n...@thelastpickle.com wrote: For example, if I want to make some changes to the configuration file that resides on each node, is there a tool that will propagate

Re: Recommended amount of free disk space for compaction

2013-11-29 Thread Anthony Grasso
Hi Robert, We found having about 50% free disk space is a good rule of thumb. Cassandra will typically use less than that when running compactions, however it is good to have free space available just in case it compacts some of the larger SSTables in the keyspace. More information can be found
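The rule of thumb above can be turned into a simple check. The following is a minimal sketch, assuming the data volume is mounted at `/var/lib/cassandra` (an assumed default; adjust to your own layout) and that the 50% figure is applied as a plain ratio of free to total bytes. The function names are hypothetical and not part of any Cassandra tooling.

```python
import shutil


def has_compaction_headroom(total_bytes: int, free_bytes: int,
                            min_free_ratio: float = 0.5) -> bool:
    """Return True when at least min_free_ratio of the volume is free."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes >= min_free_ratio


def check_data_volume(path: str = "/var/lib/cassandra",
                      min_free_ratio: float = 0.5) -> bool:
    """Apply the headroom check to the filesystem holding `path`.

    The default path is an assumption; pass the real data directory.
    """
    usage = shutil.disk_usage(path)
    return has_compaction_headroom(usage.total, usage.free, min_free_ratio)
```

Calling `check_data_volume()` on each node would flag volumes that have dropped below the suggested free-space margin.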

Re: Data loss when swapping out cluster

2013-11-29 Thread Anthony Grasso
Hi Robert, In this case would it be possible to do the following to replace a seed node? nodetool disablethrift nodetool disablegossip nodetool drain stop Cassandra deep copy /var/lib/cassandra/* on old seed node to new seed node start Cassandra on new seed node Regards, Anthony On Wed,

Re: Gotchas when creating a lot of tombstones

2014-01-10 Thread Anthony Grasso
Hi Robert, It sounds like you have done a fair bit of investigating and testing already. Have you considered using a time-based data model to avoid doing deletions in the database? Regards, Anthony On Thu, Jan 9, 2014 at 1:26 PM, sankalp kohli kohlisank...@gmail.com wrote: With Level compaction,

Re: How can I scale my read rate?

2017-03-26 Thread Anthony Grasso
Keep in mind there are side effects to increasing to RF = 4 - Storage requirements for each node will increase. Depending on the number of nodes in the cluster and the size of the data this could be significant. - Whilst the number of available coordinators increases, the number of

Re: cassandra 3.10

2017-05-11 Thread Anthony Grasso
Hi Dhruva, There are definitely some performance improvements to the Storage Engine in Cassandra 3.10 which make it worth the upgrade. Note that Cassandra 3.11 has further bug fixes and it may be worth considering a migration to that version. Regarding the issue of building a Cassandra 3.10 RPM, it

Re: Cassandra Snapshots and directories

2017-05-14 Thread Anthony Grasso
Hi Daniel, Yes, you are right it does require some additional work to rsync just the snapshots. What about doing something like this to make rsync syntax for the backup easier? # in the Cassandra data directory, iterate through the keyspaces for ks in $(find . -type d -iname backup) do #

Re: Reg:- Data Modelling Concepts

2017-05-17 Thread Anthony Grasso
Hi Nandan, If there is a requirement to answer a query "What are the changes to a book made by a particular user?", then yes the schema you have proposed can work. To obtain the list of updates for a book by a user from the *book_title_by_user* table will require the partition key (*book_title*),

Re: Reg:- Data Modelling based on Update History details

2017-05-15 Thread Anthony Grasso
Hi Nandan, Interesting project! One thing that helps define the schema is knowing what queries will be made to the database up front. It sounds like you have an idea already of what those queries will be. I want to confirm that these are the queries that the database needs to answer. - *What

Re: How do you do automatic restacking of AWS instance for cassandra?

2017-05-28 Thread Anthony Grasso
Hi Surbhi, Please see my comment inline below. On 28 May 2017 at 12:11, Jeff Jirsa wrote: > > > On 2017-05-27 18:04 (-0700), Surbhi Gupta > wrote: > > Thanks a lot for all of your reply. > > Our requirement is : > > Our company releases AMI almost

Re: Restarting nodes and reported load

2017-05-31 Thread Anthony Grasso
Hi Daniel, When you say that the nodes have to be restarted, are you just restarting the Cassandra service or are you restarting the machine? How are you reclaiming disk space at the moment? Does disk space free up after the restart? Regarding storage on nodes, keep in mind the more data stored

Re: Very slow cluster

2017-04-30 Thread Anthony Grasso
Hi Eduardo, Please see my comment inline below regarding your third question. Regards, Anthony On 28 April 2017 at 21:26, Eduardo Alonso wrote: > Hi to all: > > I am having some problems with two client's cassandra:3.0.8 clusters i > want to share with you. These

Re: Smart Table creation for 2D range query

2017-05-08 Thread Anthony Grasso
Hi Lydia, Yes. This will define the *x*, *y* columns as the components of the partition key. Note that by doing this both *x* and *y* values will be required at a minimum to perform a valid query. Alternatively, the *x* and *y* values could be combined into a single text field as Jon has

Re: Rebalance a cassandra cluster

2017-09-15 Thread Anthony Grasso
As Kurt mentioned, you definitely need to pick a partition key that ensures data is uniformly distributed. If you want to redistribute the data in the cluster and move tokens around, you could decommission the node with the tokens you want to redistribute and then bootstrap a new node into the

Re: Restore cassandra snapshots

2017-10-17 Thread Anthony Grasso
Hi Pradeep, If you are going to copy N snapshots to N nodes you will need to make sure you have the System keyspace as part of that snapshot. The System keyspace, which is local to each node, contains the token allocations for that particular node. This allows the node to work out what data it is

Re: Node Failure Scenario

2017-11-13 Thread Anthony Grasso
Hi Anshu, To add to Erick's comment, remember to remove the *replace_address* method from the *cassandra-env.sh* file once the node has rejoined successfully. The node will fail the next restart otherwise. Alternatively, use the *replace_address_first_boot* method which works exactly the same

Re: Slender Cassandra Cluster Project

2018-01-21 Thread Anthony Grasso
Hi Kenneth, Fantastic idea! One thing that came to mind from my reading of the proposed setup was rack awareness of each node. Given that the proposed setup contains three DCs, I assume that each node will be made rack aware? If not, consider defining three racks for each DC and placing two

Re: [EXTERNAL] Cassandra cluster add new node slowly

2018-01-03 Thread Anthony Grasso
The speed at which compactions operate is also physically restricted by the speed of the disk. If the disks used on the new node are HDDs, then increasing the compaction throughput will be of little help. However, if the disks on the new node are SSDs then increasing the compaction throughput to

Re: Bind keyspace to specific data directory

2018-07-16 Thread Anthony Grasso
Hi Abdul, There is no mechanism offered in Cassandra to bind a keyspace (when created) to a specific filesystem or directory. If multiple filesystems or directories are specified in the data_file_directories property in the *cassandra.yaml* file then Cassandra will attempt to evenly distribute data from

Re: Check Cluster Health

2018-07-04 Thread Anthony Grasso
Hi, Yes, you can use nodetool status to inspect the health/status of the cluster. Using *nodetool status <keyspace>* will show the cluster health/status as well as the amount of data that each node has for the specified *<keyspace>*. Using *nodetool status* without the argument will only show the cluster

Re: command to view yaml file setting in use on console

2018-03-12 Thread Anthony Grasso
Hi Kenneth, In addition to CASSANDRA-7622, it may help to inspect the Cassandra *system.log* and look for the following entry: INFO [main] ... - Node configuration:[...] The content of "Node configuration" will have the settings the node is using. Regards, Anthony On Tue, 13 Mar 2018 at
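Searching the log for that entry can be scripted. The sketch below is only an illustration: it assumes the entry sits on a single line and that the settings are wrapped in square brackets after the text "Node configuration:", as in the message above. The function name and the sample log lines are hypothetical.

```python
from typing import Iterable, Optional

NODE_CONFIG_MARKER = "Node configuration:["


def find_node_configuration(lines: Iterable[str]) -> Optional[str]:
    """Return the bracketed settings from the most recent matching log line.

    Returns None when no "Node configuration" entry is present.
    """
    latest = None
    for line in lines:
        idx = line.find(NODE_CONFIG_MARKER)
        if idx == -1:
            continue
        # Keep everything after the marker, minus the trailing "]" and newline.
        latest = line[idx + len(NODE_CONFIG_MARKER):].rstrip()
        if latest.endswith("]"):
            latest = latest[:-1]
    return latest


# Hypothetical sample lines standing in for system.log content.
sample = [
    "INFO [main] ... - Node configuration:[setting_a=1; setting_b=2]\n",
    "INFO [main] ... - Some other entry\n",
]
print(find_node_configuration(sample))
```

To use it against a real node, open the node's *system.log* (its location depends on the installation) and pass the file object in place of `sample`.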

Re: Cassandra vs MySQL

2018-03-14 Thread Anthony Grasso
Hi Oliver, I was in a similar situation to you and Matija a few years back as well and can vouch for what Matija has said. Some data sets are more suitable for Cassandra than others; so the answer to your question depends on the type of data and how it is modelled in Cassandra. The data model

Re: replace dead node vs remove node

2018-03-22 Thread Anthony Grasso
Hi Peng, Depending on the hardware failure you can do one of two things: 1. If the disks are intact and uncorrupted you could just use the disks with the current data on them in the new node. Even if the IP address changes for the new node that is fine. In that case all you need to do is run

Re: replace dead node vs remove node

2018-03-22 Thread Anthony Grasso
nt_window_in_ms, we must run > repair to make the replaced node consistent again, since it missed ongoing > writes during bootstrapping. But for a great cluster, repair is a painful > process. > > Thanks, > Peng Xiao > > > > ------ Original Message -- > *From

Re: Assassinate fails

2019-04-03 Thread Anthony Grasso
Hi Alex, We wrote a blog post on this topic late last year: http://thelastpickle.com/blog/2018/09/18/assassinate.html. In short, you will need to run the assassinate command on each node simultaneously a number of times in quick succession. This will generate a number of messages requesting all

Re: All time blocked in nodetool tpstats

2019-04-10 Thread Anthony Grasso
Hi Abdul, Usually we get no noticeable improvement from tuning concurrent_reads and concurrent_writes above 128. I generally try to keep concurrent_reads to no higher than 64 and concurrent_writes to no higher than 128. If you increase the values beyond that, you might start running into issues where the

Re: Topology settings before/after decommission node

2019-04-10 Thread Anthony Grasso
Hi Robert, Your action plan looks good. You can think of the *cassandra-topology.properties* file as a map for the cluster. The map between the nodes must be consistent because each node uses it to determine where it is meant to be located logically. It is good hygiene to maintain the

Re: Cassandra 2.1.18 - NPE during startup

2019-04-12 Thread Anthony Grasso
Hi Thomas, The process you suggested to get around the issue should work with the system.keyspaces table. Make sure to back up the original *system.keyspaces* table files on the node that fails to start. Then, copy only the *system.keyspaces* table files from a working node into the

Re: datacorruption with cassandra 2.1.11

2019-05-16 Thread Anthony Grasso
Did you roll back to OpenJDK 1.7u181 or did you upgrade to a more recent version? On Thu, 16 May 2019 at 13:43, keshava wrote: > The java version that we were using and which turns out to be causing this > issue was OpenJdk 1.7 u191 > > On 16-May-2019 06:02, "sankalp kohli" wrote: > >> which

Re: Re: Re: how to configure the Token Allocation Algorithm

2019-05-05 Thread Anthony Grasso
; > "The best way to predict the future is to invent it" Alan Kay > > > On Mon, Apr 29, 2019 at 2:45 AM Anthony Grasso > wrote: > >> Hi Jean, >> >> It sounds like there are no nodes in one of the racks for the eu-west-3 >> datacenter. What does th

Re: How to set up a cluster with allocate_tokens_for_keyspace?

2019-05-05 Thread Anthony Grasso
Good idea Jeff. I can add that in if you like? Do we have a ticket for it or should I just raise one? On Mon, 6 May 2019 at 03:49, Jeff Jirsa wrote: > Picking an ideal allocation for N seed nodes and M vnodes per seed is > probably something we should add as a little python script or similar in

Re: How to set up a cluster with allocate_tokens_for_keyspace?

2019-05-05 Thread Anthony Grasso
Hi If you are planning on setting up a new cluster with allocate_tokens_for_keyspace, then yes, you will need one seed node per rack. As Jon mentioned in a previous email, you must manually specify the token range for *each* seed node. This can be done using the initial_token setting. The
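One way to produce values for the initial_token setting is to space tokens evenly around the ring. The sketch below is an assumption, not necessarily the procedure described in this thread: it assumes the Murmur3Partitioner token range of -2^63 to 2^63 - 1 and interleaves tokens between seed nodes. The function names are hypothetical.

```python
from typing import List

MURMUR3_MIN = -(2 ** 63)   # lowest token, assuming Murmur3Partitioner
MURMUR3_RANGE = 2 ** 64    # number of tokens in the ring


def seed_initial_tokens(num_seeds: int, tokens_per_seed: int) -> List[List[int]]:
    """Return one evenly spaced, interleaved token list per seed node."""
    if num_seeds < 1 or tokens_per_seed < 1:
        raise ValueError("num_seeds and tokens_per_seed must be >= 1")
    total = num_seeds * tokens_per_seed
    step = MURMUR3_RANGE // total
    ring = [MURMUR3_MIN + i * step for i in range(total)]
    # Seed k takes every num_seeds-th token starting at position k.
    return [ring[seed::num_seeds] for seed in range(num_seeds)]


def as_initial_token_setting(tokens: List[int]) -> str:
    """Format a token list as a comma-separated initial_token line."""
    return "initial_token: " + ",".join(str(t) for t in tokens)


for seed_tokens in seed_initial_tokens(num_seeds=3, tokens_per_seed=4):
    print(as_initial_token_setting(seed_tokens))
```

Each printed line would go into the *cassandra.yaml* of a different seed node; the seed count and tokens per seed shown here are placeholder values.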

Re: Re: Re: how to configure the Token Allocation Algorithm

2019-04-28 Thread Anthony Grasso
Hi Jean, It sounds like there are no nodes in one of the racks for the eu-west-3 datacenter. What does the output of nodetool status look like currently? Note, you will need to start a node in each rack before creating the keyspace. I wrote a blog post with the procedure to set up a new cluster

Re: Uneven token distribution with allocate_tokens_for_keyspace

2019-12-04 Thread Anthony Grasso
I thought it was needed only for new > "clusters", not also for new "DCs"; but RF is per DC so it makes sense. > > You TLP guys are doing a great job for Cassandra community. > > Thank you, > Enrico > > > On Fri, 29 Nov 2019 at 05:09, Anthony Grasso > wro

Re: Uneven token distribution with allocate_tokens_for_keyspace

2019-11-28 Thread Anthony Grasso
Hi Enrico, This is a classic chicken and egg problem with the allocate_tokens_for_keyspace setting. The allocate_tokens_for_keyspace setting uses the replication factor of a DC keyspace to calculate the token allocation when a node is added to the cluster for the first time. Nodes need to be

Re: Should we use Materialised Views or ditch them ?

2020-03-01 Thread Anthony Grasso
Hi Tobias, I have had similar experiences to Jon where I have seen Materialized Views cause major issues in clusters. I too recommend avoiding them. Regards, Anthony On Sat, 29 Feb 2020 at 07:37, Jon Haddad wrote: > I also recommend avoiding them. I've seen too many clusters fall over as >

Re: [EXTERNAL] Cassandra 3.11.X upgrades

2020-03-03 Thread Anthony Grasso
Manish is correct. Upgrade the Cassandra version of a single node only. If that node is behaving as expected (i.e. is in an Up/Normal state and no errors in the logs), then upgrade the Cassandra version for each node one at a time. Be sure to check that each node is running as expected. Once the

Re: Uneven token distribution with allocate_tokens_for_keyspace

2020-01-27 Thread Anthony Grasso
stand this part of the process. Why do tokens conflict if the > nodes owning them are in a different datacenter ? > > Regards, > > Leo > > On Thu, Dec 5, 2019 at 1:00 AM Anthony Grasso > wrote: > >> Hi Enrico, >> >> Glad to hear the problem has been resolved and t

Re: [EXTERNAL] How to reduce vnodes without downtime

2020-01-30 Thread Anthony Grasso
Hi Maxim, Basically what Sean suggested is the way to do this without downtime. To clarify, the *three* steps following the "Decommission each node in the DC you are working on" step should be applied to *only* the decommissioned nodes. So where it says "*all nodes*" or "*every node*" it

Re: [EXTERNAL] How to reduce vnodes without downtime

2020-02-02 Thread Anthony Grasso
>>> Is the new recommendation 4 now even in version 3.x (asking for 3.11)? >>> Thanks >>> >>> On Fri, Jan 31, 2020 at 9:49 AM Durity, Sean R < >>> sean_r_dur...@homedepot.com> wrote: >>> >>>> These are good clarificat

Re: Generating evenly distributed tokens for vnodes

2020-05-28 Thread Anthony Grasso
Hi Kornel, Great use of the script for generating initial tokens! I agree that you can achieve an optimal token distribution in a cluster using such a method. One thing to think about is the process for expanding the size of the cluster in this case. For example, consider the scenario where you

Re: Migrating Cassandra from 3.11.11 to 4.0.0 vs num_tokens

2021-09-05 Thread Anthony Grasso
Hi Jean, This is a really good question. As Erick mentioned, if you want to change your cluster's *num_tokens* to 16 to match the 4.0 default, you will need to perform a datacenter migration. Feel free to read over this blog post

Re: Log4j vulnerability

2022-01-11 Thread Anthony Grasso
Hi Arvinder, You are correct; tlp-stress includes Log4j as one of its libraries and users will need to update the JAR file. On 16th December 2021, tlp-stress was updated [1] to include Log4j 2.16.0 which fixed CVE-2021-45046. Version 5.0.0 was released which included this change. Unfortunately,