Re: Row cache tuning
Hi Matija,

"Leveraging page cache yields good results and if accounted for can provide you with a performance increase on the read side." I would like to leverage the page cache to improve read performance. How can this be done?

Best Regards, Julian.

On Mon, 13 Mar 2017 03:42:32 +0530 preetika tyagi preetikaty...@gmail.com wrote:

I see. Thanks, Arvydas! In terms of the eviction policy of the row cache, does a write operation invalidate only the row(s) that are going to be modified, or the whole partition? In older versions of Cassandra, I believe the whole partition gets invalidated even if only one row is modified. Is that still true for the latest release (3.10)? I browsed through many online articles and tutorials but cannot find information on this.

On Sun, Mar 12, 2017 at 2:25 PM, Arvydas Jonusonis arvydas.jonuso...@gmail.com wrote:

You can experiment quite easily without even needing to restart the Cassandra service. The caches (row and key) can be enabled on a table-by-table basis via a schema directive. But the cache capacity (which is the one you referred to in your original post, set to 0 in cassandra.yaml) is a global setting and can be manipulated via JMX or nodetool (nodetool setcachecapacity). Arvydas

On Sun, Mar 12, 2017 at 9:46 AM, preetika tyagi preetikaty...@gmail.com wrote:

Thanks, Matija! That was insightful. I don't really have a use case in particular; what I'm trying to do is figure out how Cassandra's performance can be improved by using the different caching mechanisms, such as the row cache, key cache, partition summary, etc. Of course, it will also heavily depend on the type of workload, but I'm trying to gain more understanding of what's available in the Cassandra framework. Also, I read somewhere that either the row cache or the key cache can be turned on for a specific table, but not both. Based on your comment, I guess the combination of page cache and key cache is what's widely used for tuning performance.

Thanks, Preetika

On Sat, Mar 11, 2017 at 2:01 PM, Matija Gobec matija0...@gmail.com wrote:

Hi, In 99% of use cases Cassandra's row cache is not something you should look into. Leveraging page cache yields good results and, if accounted for, can provide a performance increase on the read side. I'm not a fan of the default row cache implementation and its invalidation mechanism on updates, so you really need to be careful about when and how you use it. There isn't as much to the configuration as there is to your use case. Maybe explain what you are trying to solve with the row cache and people can get into a discussion with more context. Regards, Matija

On Sat, Mar 11, 2017 at 9:15 PM, preetika tyagi preetikaty...@gmail.com wrote:

Hi, I'm new to Cassandra and trying to get a better understanding of how the row cache can be tuned to optimize performance. I came across this article: https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsConfiguringCaches.html

It suggests not touching the row cache unless the workload is at least 95% reads, and mostly relying on the default page cache mechanism that comes with the OS. The default row cache size is 0 in the cassandra.yaml file, so the row cache won't be utilized at all. Therefore, I'm wondering how exactly I can decide whether to tweak the row cache if needed. Are there any good pointers on this?

Thanks, Preetika
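The two steps Arvydas describes can be sketched as follows. This is illustrative only: the keyspace/table name and the capacity figures are hypothetical, and the per-table `caching` map syntax is the Cassandra 2.1+ form.

```shell
# 1. Enable the row cache for one table via a schema directive
#    (ks.users is a hypothetical table):
cqlsh -e "ALTER TABLE ks.users
          WITH caching = {'keys': 'ALL', 'rows_per_partition': '100'};"

# 2. Give the global row cache some capacity without a restart.
#    Recent versions take three sizes (MiB): key cache, row cache, counter cache.
nodetool setcachecapacity 100 64 0

# Then watch hit rates to judge whether the row cache actually helps:
nodetool info | grep -i cache
```

Note that `nodetool setcachecapacity` is not persisted across restarts; to make the change permanent, mirror it in `row_cache_size_in_mb` in cassandra.yaml.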
How to calculate CPU Utilisation on each node?
Hello,

We are using Cassandra 2.1.13. We are calculating node CPU utilization using the formula below:

CPUUsage = CPURate / (AvailableProcessors * 100)
CPURate = (x2 - x1) / (t2 - t1), where x2 and x1 are the values of the attribute ProcessCpuTime at times t2 and t1 respectively.

We retrieve the values of the attributes ProcessCpuTime and AvailableProcessors from the ObjectName java.lang:type=OperatingSystem via JMX.

Is this the correct way to calculate CPU utilization for a node, or are there any alternatives?

We are using a 32-core physical processor on each node, and node CPU utilization reaches 100% every now and then. We suspect that should not be the case.

Best Regards, Julian.
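For reference, here is the formula above worked through with hypothetical sample values. One unit pitfall worth checking: ProcessCpuTime from java.lang:type=OperatingSystem is reported in nanoseconds, so the wall-clock timestamps t1/t2 must be converted to the same unit or the rate is off by a large constant factor. (This sketch prints a percentage directly, so it folds the `*100` of the original formula into the output step.)

```shell
#!/bin/sh
# Hypothetical sampled values; ProcessCpuTime is in nanoseconds.
X1=3600000000000    # ProcessCpuTime at t1
X2=3680000000000    # ProcessCpuTime at t2 (80 CPU-seconds consumed)
T1=0                # wall clock at t1, in ns
T2=10000000000      # wall clock at t2, 10 s later, in ns
N=32                # AvailableProcessors

awk -v x1="$X1" -v x2="$X2" -v t1="$T1" -v t2="$T2" -v n="$N" 'BEGIN {
    rate = (x2 - x1) / (t2 - t1)        # CPUs-worth of time per wall second
    printf "%.1f%%\n", rate / n * 100   # utilisation across all cores
}'
```

With these numbers the process consumed 8 CPU-seconds per wall second on a 32-core box, i.e. 25.0% utilization. A sustained 100% reading usually means the timestamps and ProcessCpuTime were sampled in different units rather than the node actually saturating all cores.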
How to throttle up/down compactions without a restart
Hello,

I was going through this presentation and Slide 55 caught my attention, i.e. "Throttled down compactions during high load period, throttled up during low load period."

Can we throttle down compactions without a restart? If this can be done, what are the parameters (JMX?) to work with? How can this be implemented for the compaction strategies below?

* Size Tiered Compaction Strategy
* Leveled Compaction Strategy

Any help is much appreciated.

Best Regards, Julian.
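Compaction throughput can be changed at runtime with nodetool, and because it throttles the shared compaction executor it applies regardless of the strategy (STCS or LCS). A sketch, with illustrative MB/s values:

```shell
nodetool getcompactionthroughput        # show the current cap (MB/s)
nodetool setcompactionthroughput 8      # throttle down during high-load periods
nodetool setcompactionthroughput 64     # throttle back up during low-load periods
# A value of 0 removes the throttle entirely.
```

The change is not persisted, so a cron job (or your scheduler of choice) flipping the value at the start and end of the peak window is a common pattern; the restart-time default stays in cassandra.yaml as compaction_throughput_mb_per_sec.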
Optimising the data model for reads
Hello,

I have created a column family for user file management:

CREATE TABLE "UserFile" (
    "USERID" bigint,
    "FILEID" text,
    "FILETYPE" int,
    "FOLDER_UID" text,
    "FILEPATHINFO" text,
    "JSONCOLUMN" text,
    PRIMARY KEY ("USERID", "FILEID")
);

Sample entry:

(4*003, 3f9**6a1, null, 2, [{"FOLDER_TYPE":"-1","UID":"1","FOLDER":"\"HOME\""}], {"filename":"untitled","size":1,"kind":-1,"where":""})

Queries:

Select "USERID","FILEID","FILETYPE","FOLDER_UID","JSONCOLUMN" from "UserFile" where "USERID"=value and "FILEID" in (value,value,...)
Select "USERID","FILEID","FILEPATHINFO" from "UserFile" where "USERID"=value and "FILEID" in (value,value,...)

This column family was working perfectly in our lab; I was able to fetch the results for the queries above in less than 10 ms. I deployed this in production (Cassandra 2.1.13) and it worked perfectly for a month or two. But now, at times, the queries take 5 s to 10 s. On analysing further, I found that a few users are deleting files too frequently, which generates too many tombstones. I have left gc_grace_seconds at the default of 10 days, and I have chosen SizeTieredCompactionStrategy.

I want to optimise this data model for read efficiency. Any help is much appreciated.

Best Regards, Julian.
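For tombstone-heavy delete workloads like this, the schema-level knobs people usually reach for are the table's gc_grace_seconds and its compaction subproperties. A hedged sketch against the "UserFile" table from the post; the values shown are illustrative starting points, not recommendations:

```shell
# Shorten how long tombstones must be retained (86400 s = 1 day) and make
# compaction more aggressive about purging them. tombstone_threshold and
# unchecked_tombstone_compaction are standard compaction subproperties.
cqlsh -e "ALTER TABLE \"UserFile\" WITH
    gc_grace_seconds = 86400 AND
    compaction = {'class': 'LeveledCompactionStrategy',
                  'unchecked_tombstone_compaction': 'true',
                  'tombstone_threshold': '0.2'};"
```

Caveat: only shrink gc_grace_seconds if repairs reliably complete within the new window on every node, otherwise deleted data can be resurrected. Whether LCS beats STCS here depends on the write/delete rate, so benchmark before switching in production.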
How to alter the default value for concurrent_compactors
Hello, We have commented out "concurrent_compactors" in our Cassandra 2.1.13 installation. We would like to review this setting, as some issues indicate that the default configuration may affect read/write performance. https://issues.apache.org/jira/browse/CASSANDRA-8787 https://issues.apache.org/jira/browse/CASSANDRA-7139 Where can we see the value set for concurrent_compactors in our setup? Is it possible to update this configuration without a restart? Best Regards, Julian.
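When concurrent_compactors is commented out, Cassandra derives a default at startup (based on core and disk counts), so the live value is only visible at runtime. One way to inspect, and possibly adjust, it without a restart is the CompactionManager MBean over JMX; the sketch below uses the jmxterm CLI, and both the connection details and the assumption that these attributes are writable in your 2.1 build are things to verify with an MBean browser first:

```shell
# Hypothetical jmxterm session against a local node (port 7199).
java -jar jmxterm.jar -l localhost:7199 <<'EOF'
get -b org.apache.cassandra.db:type=CompactionManager CoreCompactorThreads
set -b org.apache.cassandra.db:type=CompactionManager CoreCompactorThreads 2
set -b org.apache.cassandra.db:type=CompactionManager MaximumCompactorThreads 2
EOF
```

Any change made this way is lost on restart, so mirror the final value in cassandra.yaml once you've settled on one.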
Optimal value for concurrent_reads for a single NVMe Disk
Hello,

We are using Cassandra 2.1.13, with each node having an NVMe disk (total capacity 1.2 TB, allotted capacity 880 GB). We would like to increase the default value of 32 for the param concurrent_reads, but the documentation says:

"(Default: 32) For workloads with more data than can fit in memory, the bottleneck is reads fetching data from disk. Setting to (16 × number_of_drives) allows operations to queue low enough in the stack so that the OS and drives can reorder them. The default setting applies to both logical volume managed (LVM) and RAID drives."

https://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html#reference_ds_qfg_n1r_1k__concurrent_reads

Given this hardware specification, what would be the optimal value for concurrent_reads?

Best Regards, Julian.
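Taken literally, the documented rule of thumb actually points below the default for this setup, which is a hint that the formula was written with spinning, LVM, and RAID disks in mind:

```shell
#!/bin/sh
# Rule of thumb from the docs, applied to a single-drive node.
DRIVES=1
echo "rule_of_thumb=$((16 * DRIVES))"
```

That prints 16 for one drive, i.e. less than the default of 32. NVMe devices sustain much deeper I/O queues than the drives the rule assumes, so in practice values above 32 for concurrent_reads on NVMe are chosen by benchmarking (watching read latency and pending reads in nodetool tpstats as the value is raised) rather than by formula.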
Re: Guidelines for configuring Thresholds for Cassandra metrics
Not all metrics are KPIs; most are only useful when researching a specific issue or after a use-case-specific threshold has been set. The main "canaries" I monitor are:

* Pending compactions (dependent on the compaction strategy chosen, but 1000 is a sign of severe issues in all cases)
* Dropped mutations (more than one I treat as an event to investigate; I believe in allowing operational overhead, and any evidence of load shedding suggests I may not have as much as I thought)
* Blocked anything (flush writers, etc.; more than one I investigate)
* System hints (more than 1k I investigate)
* Heap usage and GC time vary a lot by use case and collector chosen. I aim for below 65% usage on average with G1, but this again varies by use case a great deal. Sometimes I just look at the chart and the query patterns, and if they don't line up I have to do other, deeper investigations.
* Read and write latencies exceeding the SLA are also use-case dependent. For those that have no SLA, I tend to push towards a p99 of 100 ms for a mid-range SSD-based system and 600 ms for a spindle-based system, with CL ONE and assuming a "typical" query pattern (again, query patterns and CL vary a lot here).
* Cell count and partition size vary greatly by hardware and GC tuning, but in the absence of other relevant information I like to keep the cell count for a partition below 100k and the size below 100 MB. I do, however, have many successful use cases running more, and I've had some fail well before that; hardware and tuning tradeoffs shift this around a lot.

There is, unfortunately, as you'll note, a lot of nuance, and the load-out really changes what looks right (down to the model of SSD: if it's a model I haven't used before, I'll do some comparative testing before settling on p99 expectations). The reason so much of this is general and vague is my selection bias: I'm brought in when people are complaining about performance, or after some grand systemic crash because they were monitoring nothing.
I have little ability to change hardware initially, so I have to be willing to let the hardware do the best it can and establish the levels where it can no longer keep up with the customer's goals. This may mean that for one use case 10 pending compactions is an actionable event, while for another customer 100 is. The better approach is to establish a baseline for when these metrics start to indicate a serious issue in that particular app. Basically: when people notice a problem, what did these numbers look like in the minutes, hours, and days prior? That's the way to establish the levels consistently.

Regards, Ryan Svihla

On Fri, Aug 26, 2016 at 4:48 AM -0500, "Thomas Julian" thomasjul...@zoho.com wrote:

Hello,

I am working on setting up a monitoring tool for Cassandra instances. Are there any wikis which specify the optimum value for each Cassandra KPI? For instance, I am not sure what value of "Memtable Columns Count" can be considered "normal" and what value has to be considered "critical". I know the threshold numbers for a few params; for instance, anything more than zero for timeouts or pending tasks should be considered unusual. Also, I am aware that most of the statistics' threshold numbers vary with the hardware specification and the Cassandra environment setup. But what I am requesting here is a general guideline for configuring thresholds for all the metrics. If this has already been covered, please point me to that resource. If anyone has collected these things out of their own interest, please share. Any help is appreciated.

Best Regards, Julian.
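The rules of thumb above can be encoded as simple per-metric threshold checks once baselines are chosen. A toy sketch: the metric values below are hypothetical stand-ins for numbers you would pull from nodetool or JMX, and the thresholds mirror the post's guidance, not universal constants.

```shell
#!/bin/sh
# check <metric-name> <current-value> <threshold>: alert when exceeded.
check() {
    if [ "$2" -gt "$3" ]; then
        echo "ALERT: $1=$2 exceeds $3"
    fi
}

check pending_compactions   1200 1000   # severe in all cases per the post
check dropped_mutations     0    0      # any dropped mutation is investigated
check blocked_flush_writers 2    0      # blocked anything is investigated
check system_hints          500  1000   # investigate above ~1k
```

Running this with the sample values flags pending_compactions and blocked_flush_writers only; in a real deployment the inputs would come from a metrics pipeline and the thresholds from the baseline-before-incident exercise Ryan describes.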