Re: Row cache tuning

2017-03-13 Thread Thomas Julian
Hi Matija,



Leveraging page cache yields good results and, if accounted for, can provide you with a performance increase on the read side.

I would like to leverage the page cache to improve read performance. How can this be done?




Best Regards,

Julian.


 On Mon, 13 Mar 2017 03:42:32 +0530 preetika tyagi 
preetikaty...@gmail.com wrote 




I see. Thanks, Arvydas!



In terms of the eviction policy in the row cache, does a write operation invalidate only the row(s) that are going to be modified, or the whole partition? In older versions of Cassandra, I believe the whole partition gets invalidated even if only one row is modified. Is that still true for the latest release (3.10)? I browsed through many online articles and tutorials but could not find information on this.




On Sun, Mar 12, 2017 at 2:25 PM, Arvydas Jonusonis 
arvydas.jonuso...@gmail.com wrote:






You can experiment quite easily without even needing to restart the Cassandra 
service.



The caches (row and key) can be enabled on a table-by-table basis via a schema 
directive. But the cache capacity (which is the one that you referred to in 
your original post, set to 0 in cassandra.yaml) is a global setting and can be 
manipulated via JMX or nodetool (nodetool setcachecapacity).
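

For example, I believe the two knobs look roughly like this (the keyspace/table name and the capacity values below are just placeholders, not taken from this thread):

    ALTER TABLE my_keyspace.my_table
        WITH caching = {'keys': 'ALL', 'rows_per_partition': '100'};

    nodetool setcachecapacity 100 256 50

The ALTER TABLE is the per-table schema directive (here caching up to 100 rows per partition), and setcachecapacity changes the global key, row and counter cache capacities (in MB) at runtime, with no restart needed.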



Arvydas



On Sun, Mar 12, 2017 at 9:46 AM, preetika tyagi preetikaty...@gmail.com 
wrote:

Thanks, Matija! That was insightful.



I don't really have a particular use case; what I'm trying to do is figure out how Cassandra's performance can be improved by using the different caching mechanisms, such as the row cache, key cache, partition summary, etc. Of course, it will also depend heavily on the type of workload, but I'm trying to gain a better understanding of what's available in the Cassandra framework.



Also, I read somewhere that either the row cache or the key cache can be turned on for a specific table, but not both. Based on your comment, I guess the combination of page cache and key cache is the one widely used for performance tuning.



Thanks,

Preetika




On Sat, Mar 11, 2017 at 2:01 PM, Matija Gobec matija0...@gmail.com 
wrote:

Hi,



In 99% of use cases Cassandra's row cache is not something you should look into. Leveraging page cache yields good results and, if accounted for, can provide you with a performance increase on the read side.

I'm not a fan of the default row cache implementation and its invalidation mechanism on updates, so you really need to be careful about when and how you use it. There isn't as much to the configuration as there is to your use case. Maybe explain what you are trying to solve with the row cache, and people can get into the discussion with more context.



Regards,

Matija




On Sat, Mar 11, 2017 at 9:15 PM, preetika tyagi preetikaty...@gmail.com 
wrote:

Hi,



I'm new to Cassandra and trying to get a better understanding on how the row 
cache can be tuned to optimize the performance.



I came across this article: 
https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsConfiguringCaches.html



It suggests not to even touch the row cache unless the read workload is above 95%, and to mostly rely on the machine's default caching mechanism that comes with the OS.



The default row cache size is 0 in the cassandra.yaml file, so the row cache won't be utilized at all.
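

For reference, these are the cassandra.yaml settings I am looking at (the values shown are just the defaults as I understand them):

    row_cache_size_in_mb: 0        # 0 disables the row cache entirely
    row_cache_save_period: 0       # seconds between saving the cache to disk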



Therefore, I'm wondering how exactly I can decide whether to tweak the row cache if needed. Are there any good pointers one can provide on this?



Thanks,

Preetika



How to calculate CPU Utilisation on each node?

2017-01-10 Thread Thomas Julian
Hello,



We are using Cassandra 2.1.13. We are calculating node CPU utilization using 
the below formula,



CPUUsage = CPURate / (AvailableProcessors * 100)

CPURate = (x2 - x1) / (t2 - t1)

where x2 and x1 are the values of the attribute ProcessCpuTime at times t2 and t1, respectively.

We retrieve the values of the attributes ProcessCpuTime and AvailableProcessors from the ObjectName java.lang:type=OperatingSystem using JMX.
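

A quick worked example with the units we assume (ProcessCpuTime in nanoseconds, and t1/t2 also taken in nanoseconds, e.g. from System.nanoTime()), over a 10-second window on a 32-core node:

    t2 - t1 = 10,000,000,000 ns
    x2 - x1 = 80,000,000,000 ns of CPU time
    CPURate = 80e9 / 10e9 = 8          (eight cores' worth of work)
    8 / 32  = 0.25, i.e. 25% utilization

Note that with these units CPURate is a core count, so dividing it by (AvailableProcessors * 100) would give 0.0025 rather than 25, and mixing units (e.g. nanoseconds for ProcessCpuTime but milliseconds for t) would skew the result further.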



Is it the correct way to calculate the CPU Utilization for a node? 



or 



Are there any other alternatives to calculate the CPU Utilization per node?



We are using a 32-core physical processor on each node, and the node CPU utilization reaches 100% every now and then. We suspect that should not be the case.



Best Regards,

Julian.



How to throttle up/down compactions without a restart

2016-10-20 Thread Thomas Julian
Hello,





I was going through this presentation, and Slide 55 caught my attention. 



i.e) "Throttled down compactions during high load period, throttled up during 
low load period"



Can we throttle down compactions without a restart? 



If this can be done, what are the parameters (JMX?) to work with? How can this be implemented for the compaction strategies below (example commands after the list)? 

Size Tiered Compaction Strategy.


Leveled Compaction Strategy
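

For what it is worth, the only runtime knob I am aware of is the global compaction throughput, which I believe applies regardless of the compaction strategy (the value below is only an illustration):

    nodetool getcompactionthroughput
    nodetool setcompactionthroughput 16     (MB/s; 0 disables throttling)

Presumably setting it back to the cassandra.yaml value during low-load periods is what "throttling up" means here, but please correct me if there is a better, strategy-specific lever.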


Any help is much appreciated.



Best Regards,

Julian.



Optimising the data model for reads

2016-09-29 Thread Thomas Julian
Hello,



I have created a column family for User File Management.


CREATE TABLE "UserFile" (
    "USERID" bigint,
    "FILEID" text,
    "FILETYPE" int,
    "FOLDER_UID" text,
    "FILEPATHINFO" text,
    "JSONCOLUMN" text,
    PRIMARY KEY ("USERID", "FILEID")
);



Sample Entry



(4*003, 3f9**6a1, null, 2 , 
[{"FOLDER_TYPE":"-1","UID":"1","FOLDER":"\"HOME\""}] 
,{"filename":"untitled","size":1,"kind":-1,"where":""})




Queries:



Select "USERID","FILEID","FILETYPE","FOLDER_UID","JSONCOLUMN" from "UserFile" 
where "USERID"=value and "FILEID" in (value,value,...)



Select "USERID","FILEID","FILEPATHINFO" from "UserFile" where 
"USERID"=value and "FILEID" in (value,value,...) 



This column family worked perfectly in our lab; I was able to fetch the results for the queries stated above in less than 10ms. I deployed it in production (Cassandra 2.1.13) and it worked perfectly for a month or two. But now, at times, the queries take 5s to 10s. On analysing further, I found that a few users are deleting files very frequently, which generates too many tombstones. I have set gc_grace_seconds to the default of 10 days, and I have chosen SizeTieredCompactionStrategy. I want to optimise this data model for read efficiency. 
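

In case it is useful, this is roughly how the tombstone reads showed up in cqlsh tracing (the query values are placeholders, and the exact trace wording may differ between versions):

    cqlsh> TRACING ON;
    cqlsh> SELECT "USERID","FILEID","FILEPATHINFO" FROM "UserFile"
       ... WHERE "USERID" = 12345 AND "FILEID" IN ('f1','f2');

The trace output then contains lines such as "Read N live and M tombstone cells", which is where the tombstone counts came from.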



Any help is much appreciated.



Best Regards,

Julian.








How to alter the default value for concurrent_compactors

2016-09-20 Thread Thomas Julian
Hello,



We have commented out "concurrent_compactors" in our Cassandra 2.1.13 
installation. 

We would like to review this setting, as some issues indicate that the default 
configuration may affect read/write performance. 



https://issues.apache.org/jira/browse/CASSANDRA-8787

https://issues.apache.org/jira/browse/CASSANDRA-7139



Where can we see the value set for concurrent_compactors in our setup? Is it 
possible to update this configuration without a restart?
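

For reference, our current assumption (not yet verified against 2.1.13, so please correct us) is that the effective value is exposed on the compaction manager MBean and could be read or changed over JMX, e.g. via jconsole, without a restart:

    MBean:      org.apache.cassandra.db:type=CompactionManager
    Attributes: CoreCompactorThreads, MaximumCompactorThreads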



Best Regards,

Julian.








Optimal value for concurrent_reads for a single NVMe Disk

2016-09-20 Thread Thomas Julian
Hello,



We are using Cassandra 2.1.13, with each node having an NVMe disk (total capacity 1.2TB, allotted capacity 880GB). We would like to increase the default value of 32 for the param concurrent_reads, but the document says:



"(Default: 32)note For workloads with more data than can fit in memory, the 
bottleneck is reads fetching data from disk. Setting to (16 × number_of_drives) 
allows operations to queue low enough in the stack so that the OS and drives 
can reorder them. The default setting applies to both logical volume managed 
(LVM) and RAID drives."



https://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html#reference_ds_qfg_n1r_1k__concurrent_reads
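

Taking that rule of thumb at face value for our hardware (a single NVMe drive, so number_of_drives = 1):

    16 x 1 = 16

which is lower than the default of 32 and seems counter-intuitive for NVMe, given that these drives handle far deeper I/O queues than spinning disks.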



Given this hardware specification, what would be the optimal value to set for concurrent_reads?



Best Regards,

Julian.



Re: Guidelines for configuring Thresholds for Cassandra metrics

2016-09-01 Thread Thomas Julian



Not all metrics are KPIs; many are only useful when researching a specific issue or after a use-case-specific threshold has been set.



The main "canaries" I monitor are:

* Pending compactions (dependent on the compaction strategy chosen, but 1000 is a sign of severe issues in all cases)

* Dropped mutations (more than one I treat as an event to investigate; I believe in allowing operational overhead, and any evidence of load shedding suggests I may not have as much of it as I thought)

* Blocked anything (flush writers, etc.; more than one and I investigate)

* System hints (more than 1k I investigate)

* Heap usage and GC time vary a lot by use case and collector chosen; I aim for below 65% usage as an average with G1, but this again varies a great deal by use case. Sometimes I just look at the chart and the query patterns, and if they don't line up I have to do other, deeper investigations

* Read and write latencies exceeding the SLA are also use case dependent. For those that have no SLA, I tend to push towards a p99 of 100ms for a mid-range SSD-based system and 600ms for a spindle-based system, with CL ONE and assuming a "typical" query pattern (again, query patterns and CL vary a lot here)

* Cell count and partition size vary greatly by hardware and GC tuning, but in the absence of all other relevant information I like to keep the cell count for a partition below 100k and its size below 100MB. I do, however, have many successful use cases running with more, and I've had some fail well before that. Hardware and tuning tradeoffs shift this around a lot.
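
For anyone who wants to pull these canaries quickly, the usual nodetool views of them (command names as of the 2.1 era, worth double-checking on your version) are:

    nodetool compactionstats                 (pending compactions)
    nodetool tpstats                         (dropped mutations, blocked/pending thread pools)
    nodetool cfstats <keyspace>.<table>      (compacted partition size estimates)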

There is unfortunately as you'll note a lot of nuance and the load out really 
changes what looks right (down to the model of SSDs I have different 
expectations for p99s if it's a model I haven't used before I'll do some 
comparative testing).



The reason so much of this is general and vague is my selection bias. I'm 
brought in when people are complaining about performance or some grand systemic 
crash because they were monitoring nothing. I have little ability to change 
hardware initially so I have to be willing to allow the hardware to do the best 
it can an establish levels where it can no longer keep up with the customers 
goals. This may mean for some use cases 10 pending compactions is an actionable 
event for them, for another customer 100 is. The better approach is to 
establish a baseline for when these metrics start to indicate a serious issue 
is occurring in that particular app. Basically when people notice a problem, 
what did these numbers look like in the minutes, hours and days prior? That's 
the way to establish the levels consistently.



Regards,



Ryan Svihla


On Fri, Aug 26, 2016 at 4:48 AM -0500, "Thomas Julian" 
thomasjul...@zoho.com wrote:



Hello,



I am working on setting up a monitoring tool to monitor Cassandra instances. Are there any wikis that specify optimum values for each Cassandra KPI?

For instance, I am not sure:

What value of "Memtable Columns Count" can be considered "Normal"?


What value of the same has to be considered "Critical"?


I know threshold numbers for a few params; for instance, anything more than zero for timeouts or pending tasks should be considered unusual. Also, I am aware that most of the statistics' threshold numbers vary with the hardware specification and the Cassandra environment setup. But what I am requesting here is a general guideline for configuring thresholds for all the metrics.



If this has already been covered, please point me to that resource. If anyone has collected these things out of their own interest, please share.



Any help is appreciated.



Best Regards,

Julian.