question on uda/udf

2017-02-18 Thread Kant Kodali
Hi All, Goal: I want to create a check_duplicate UDA on a blob column. Context: I have a partition of 10 million rows with a size of 10 GB (I know this is bad). I want to check whether there are duplicates in a blob column in this partition. The blob column can be at most 256 bytes. Question: can I create
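
Independent of whether a UDA can do this, the duplicate check itself can be sketched on the client side. The following is only an illustration under assumptions: the blob values are streamed to the client as Python `bytes`, and the function name `find_duplicate_digests` is hypothetical. It keeps a 16-byte digest per value rather than the value itself, so 10 million values need roughly 160 MB of digests plus set overhead.

```python
import hashlib
from typing import Iterable, Set


def find_duplicate_digests(blobs: Iterable[bytes]) -> Set[bytes]:
    """Return the digests of blob values that occur more than once.

    Only a 16-byte BLAKE2b digest is stored per value, which bounds
    memory use. A digest collision between different blobs is extremely
    unlikely but not impossible, so treat a hit as a candidate duplicate.
    """
    seen: Set[bytes] = set()
    duplicates: Set[bytes] = set()
    for blob in blobs:
        digest = hashlib.blake2b(blob, digest_size=16).digest()
        if digest in seen:
            duplicates.add(digest)
        else:
            seen.add(digest)
    return duplicates


# Hypothetical sample values standing in for the blob column.
sample = [b"\x01\x02", b"\x03", b"\x01\x02"]
print(len(find_duplicate_digests(sample)), "duplicated value(s)")
```

Feeding the function from a paged driver query over the partition avoids loading all rows at once; that part is left out because it depends on the driver in use.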

Re: High disk io read load

2017-02-18 Thread Bhuvan Rawal
This looks fine, with 8k read ahead as you mentioned. It doesn't look like a data model issue either, since the reads in https://cl.ly/2c3Z1u2k0u2I appear balanced. Most likely this is an issue with the new node's configuration. The fact that you have very little data going out of

Re: High disk io read load

2017-02-18 Thread Benjamin Roth
Just for the record, that's what dstat looks like while CS is starting:
root@cas10:~# dstat -lrnv 10
---load-avg--- --io/total- -net/total- ---procs--- --memory-usage- ---paging-- -dsk/total- ---system-- total-cpu-usage
1m 5m 15m | read writ| recv send|run blk new| used

Re: is there a query to find out the largest partition in a table?

2017-02-18 Thread Kant Kodali
*I did the following. Now I wonder if this is one node or multiple nodes? Does this value really tell me I have a large partition?*
nodetool cfhistograms test hello // This reports the max partition size is 10GB
nodetool tablestats test.hello // This also reports Compacted partition maximum

is there a query to find out the largest partition in a table?

2017-02-18 Thread Kant Kodali
Is there a query to find out the largest partition in a table? Does the query below give me the largest partition? select max(mean_partition_size) from size_estimates; Thanks, Kant
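
One thing to keep in mind with the query above: mean_partition_size is an average, so its maximum is the largest average, not the largest partition. A minimal sketch with made-up numbers, assuming each row of size_estimates holds the mean over one token range (the range names and sizes below are hypothetical):

```python
from statistics import mean

# Hypothetical partition sizes in bytes, grouped by token range.
partition_sizes_by_range = {
    "range_a": [1_000, 2_000, 3_000],
    "range_b": [500, 10_000_000, 1_500],
}

# What max(mean_partition_size) would report: the largest per-range mean.
max_of_means = max(mean(sizes) for sizes in partition_sizes_by_range.values())

# The actual largest partition across all ranges.
true_max = max(
    size for sizes in partition_sizes_by_range.values() for size in sizes
)

print(f"max of means: {max_of_means:.0f} bytes")
print(f"true maximum: {true_max} bytes")
```

With these numbers the largest mean is about a third of the real maximum, because the one large partition is averaged with two small ones.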

Re: Logging queries

2017-02-18 Thread Igor Leão
Hi Bhuvan, Thanks a lot! Any idea if something can be done for C* 2.X? Best, Igor 2017-02-18 16:41 GMT-03:00 Bhuvan Rawal : > Hi Igor, > > If you are using java driver, you can log slow queries on client side > using QueryLogger. >

Re: Logging queries

2017-02-18 Thread Matija Gobec
Hi Igor, Your best bet is to wait for our next release of diagnostics for 2.x branch. We are planning it for next week. Best, Matija On Sat, Feb 18, 2017 at 8:58 PM, Igor Leão wrote: > Hi Bhuvan, > Thanks a lot! > >

Re: Logging queries

2017-02-18 Thread Bhuvan Rawal
I'm not sure if you can create an index on the system_traces keyspace for this use case. If the performance issue that you are trying to troubleshoot is consistent, then you can switch on tracing for a while and dump the system_traces.events table, say using COPY into CSV. You can do analysis on that
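
As a rough illustration of such an analysis, the sketch below ranks trace sessions by their largest elapsed time. It rests on assumptions: the dump was written with a header row, it contains the columns session_id and source_elapsed (microseconds), and the helper name `slowest_sessions` and the sample rows are hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def slowest_sessions(csv_file: TextIO, top_n: int = 5) -> List[Tuple[str, int]]:
    """Return (session_id, max source_elapsed) pairs, slowest first."""
    max_elapsed: Dict[str, int] = {}
    for row in csv.DictReader(csv_file):
        value = (row.get("source_elapsed") or "").strip()
        if not value:
            continue  # skip events without an elapsed time
        micros = int(value)
        session = row["session_id"]
        if micros > max_elapsed.get(session, -1):
            max_elapsed[session] = micros
    ranked = sorted(max_elapsed.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# Hypothetical rows standing in for a dump of system_traces.events.
sample = io.StringIO(
    "session_id,event_id,activity,source,source_elapsed,thread\n"
    "s1,e1,Parsing query,10.0.0.1,120,t1\n"
    "s1,e2,Read 1 live rows,10.0.0.1,950,t1\n"
    "s2,e3,Parsing query,10.0.0.2,80,t2\n"
)
print(slowest_sessions(sample))
```

For a real dump, pass an opened file instead of the in-memory sample. The session ids can then be looked up in system_traces.sessions to find the query text.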

Re: High disk io read load

2017-02-18 Thread Benjamin Roth
This is the status of the largest KS on both of these nodes:
UN 10.23.71.10 437.91 GiB 512 49.1% 2679c3fa-347e-4845-bfc1-c4d0bc906576 RAC1
UN 10.23.71.9 246.99 GiB 256 28.3% 2804ef8a-26c8-4d21-9e12-01e8b6644c2f RAC1
So roughly as expected. 2017-02-17 23:07 GMT+01:00 kurt

Re: High disk io read load

2017-02-18 Thread Bhuvan Rawal
Hi Benjamin, What is the disk read ahead on both nodes? Regards, Bhuvan On Sun, Feb 19, 2017 at 1:58 AM, Benjamin Roth wrote: > This is status of the largest KS of these both nodes: > UN 10.23.71.10 437.91 GiB 512 49.1% >

Re: High disk io read load

2017-02-18 Thread Benjamin Roth
We are talking about a read IO increase of over 2000% with 512 tokens compared to 256 tokens. A 100% increase would be linear, which would be perfect. 200% would even be okay, taking the RAM/Load ratio for caching into account. But > 20x the read IO is really incredible. The nodes are configured with

Re: High disk io read load

2017-02-18 Thread Benjamin Roth
cat /sys/block/sda/queue/read_ahead_kb => 8 On all CS nodes. Is that what you mean? 2017-02-18 21:32 GMT+01:00 Bhuvan Rawal : > Hi Benjamin, > > What is the disk read ahead on both nodes? > > Regards, > Bhuvan > > On Sun, Feb 19, 2017 at 1:58 AM, Benjamin Roth
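
To compare this setting across all block devices of a node rather than only sda, the same sysfs files can be read in a loop. This is a minimal sketch, assuming a Linux host that exposes /sys/block/<device>/queue/read_ahead_kb; the function name `read_ahead_settings` is hypothetical.

```python
from pathlib import Path
from typing import Dict


def read_ahead_settings(sys_block: str = "/sys/block") -> Dict[str, int]:
    """Map each block device name to its read_ahead_kb value."""
    settings: Dict[str, int] = {}
    for path in sorted(Path(sys_block).glob("*/queue/read_ahead_kb")):
        try:
            # path is <sys_block>/<device>/queue/read_ahead_kb
            device = path.parent.parent.name
            settings[device] = int(path.read_text().strip())
        except (OSError, ValueError):
            continue  # unreadable or malformed entry, skip it
    return settings
```

Calling `read_ahead_settings()` on each node and comparing the resulting dictionaries shows quickly whether any device differs from the others.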

Re: High disk io read load

2017-02-18 Thread Bhuvan Rawal
Hi Ben, If it's the same on both machines then something else could be the issue. We faced high disk IO due to misconfigured read ahead, which resulted in a high amount of disk IO for comparatively insignificant network transfer. Can you post the output of blockdev --report for a normal node and 512 token

Re: High disk io read load

2017-02-18 Thread Benjamin Roth
256 tokens:
root@cas9:/sys/block/dm-0# blockdev --report
RO  RA   SSZ  BSZ   StartSec  Size      Device
rw  256  512  4096  0         67108864  /dev/ram0
rw  256  512  4096  0         67108864  /dev/ram1
rw  256  512  4096  0         67108864  /dev/ram2
rw

Re: Logging queries

2017-02-18 Thread Igor Leão
Thanks Bhuvan! Matija, I'm looking forward to this new release. Cassandra-diagnostics is just great and this feature will make it awesome. Hope to hear from you soon. 2017-02-18 17:20 GMT-03:00 Bhuvan Rawal : > Im not sure if you can create an index on system_traces

Logging queries

2017-02-18 Thread Igor Leão
Hi there, I'm wondering how to log queries from Cassandra. These queries can be either slow queries or all queries. The only constraint is that I should do this on server side. I tried using `nodetool settraceprobability`, which writes all queries to the keyspace `system_traces`. When I try to

Re: Logging queries

2017-02-18 Thread Bhuvan Rawal
Hi Igor, If you are using the Java driver, you can log slow queries on the client side using QueryLogger. https://docs.datastax.com/en/developer/java-driver/2.1/manual/logging/ A slow query logger for the server was introduced in C* 3.10. Details: https://issues.apache.org/jira/browse/CASSANDRA-12403