Cassandra trace

2018-10-22 Thread Mun Dega
Hello,

Does anyone know how I can see incoming queries when they arrive as prepared
statements and tracing is turned on in Cassandra 3.x?

If tracing doesn't show them, any ideas on how I can see these types of queries?
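
One approach that may help (a sketch only; the probability value is just an
example, and the contents of the parameters column vary by version): enable
probabilistic tracing on a node and then inspect system_traces.

nodetool settraceprobability 0.001   # trace roughly 0.1% of requests on this node

cqlsh -e "SELECT session_id, started_at, request, parameters
          FROM system_traces.sessions LIMIT 20;"
# For prepared statements the request shows the EXECUTE, and the parameters
# map may contain the query text, depending on the Cassandra 3.x version.

nodetool settraceprobability 0       # turn tracing back off when done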


Re: Nodetool info for heap usage

2018-10-22 Thread Anup Shirolkar
Hi,

The nodetool output should be accurate and reliable.

But using the nodetool command for monitoring is not a very good idea:
nodetool has its own resource overhead each time it is invoked.

You should ideally use a standard monitoring tool/method.
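
For example, a minimal sketch of pulling heap usage over JMX instead of
nodetool. This assumes a Jolokia agent is attached to the Cassandra JVM and
listening on its default port 8778 (a hypothetical setup; the host name is a
placeholder):

# Read the JVM heap MBean over Jolokia's HTTP bridge; the JSON response
# contains "used", "committed" and "max" in bytes for your monitoring
# system to scrape on a schedule.
curl -s http://cassandra-node1:8778/jolokia/read/java.lang:type=Memory/HeapMemoryUsage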

Regards,

Anup Shirolkar




On Tue, 23 Oct 2018 at 07:16, Abdul Patel  wrote:

> Hi All,
>
> Is the nodetool info output accurate for monitoring memory usage? Initially,
> with 3.1.0, we were monitoring nodetool info for heap usage and it never
> reported heap usage as high. After upgrading to 3.11.2 we started seeing high
> usage via nodetool info, and after a later upgrade to 3.11.3 we saw the same
> behaviour.
> Just wanted to make sure whether monitoring heap memory usage via nodetool
> info is correct, or whether it is actually a memory leak issue in 3.11.2 and
> 3.11.3?
>


Nodetool info for heap usage

2018-10-22 Thread Abdul Patel
Hi All,

Is the nodetool info output accurate for monitoring memory usage? Initially,
with 3.1.0, we were monitoring nodetool info for heap usage and it never
reported heap usage as high. After upgrading to 3.11.2 we started seeing high
usage via nodetool info, and after a later upgrade to 3.11.3 we saw the same
behaviour.
Just wanted to make sure whether monitoring heap memory usage via nodetool
info is correct, or whether it is actually a memory leak issue in 3.11.2 and
3.11.3?


Re: Cleanup cluster after expansion?

2018-10-22 Thread Jeff Jirsa
Nodetool will eventually return when it’s done

You can also watch nodetool compactionstats 
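
A minimal sketch of that rolling approach (host names and keyspace are
placeholders; one node at a time):

for host in node1 node2 node3; do
  # cleanup blocks until the node has finished rewriting its sstables
  nodetool -h "$host" cleanup my_keyspace
  # optional sanity check before moving on to the next node
  nodetool -h "$host" compactionstats
done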

-- 
Jeff Jirsa


> On Oct 22, 2018, at 10:53 AM, Ian Spence  wrote:
> 
> Environment: Cassandra 2.2.9, GNU/Linux CentOS 6 + 7. Two DCs, 3 RACs in DC1 
> and 6 in DC2.
> 
> We recently added 16 new nodes to our 38-node cluster (now 54 nodes). What 
> would be the safest and most
> efficient way of running a cleanup operation? I’ve experimented with running 
> cleanup on a single node and
> nodetool just hangs, but that seems to be a known issue.
> 
> Would something like running it on a couple of nodes per day, working through 
> the cluster, work?
> 
> 




Cleanup cluster after expansion?

2018-10-22 Thread Ian Spence
Environment: Cassandra 2.2.9, GNU/Linux CentOS 6 + 7. Two DCs, 3 RACs in DC1 
and 6 in DC2.

We recently added 16 new nodes to our 38-node cluster (now 54 nodes). What 
would be the safest and most
efficient way of running a cleanup operation? I’ve experimented with running 
cleanup on a single node and
nodetool just hangs, but that seems to be a known issue.

Would something like running it on a couple of nodes per day, working through 
the cluster, work?




Re: Cassandra: Inconsistent data on reads (LOCAL_QUORUM)

2018-10-22 Thread Naik, Ninad
Thanks Mick.


We're on DataStax Java driver version 2.1.10.2 and we aren't using client-side
timestamps. In any case, we went ahead and verified that all client machines
and Cassandra machines are in sync with regard to time.


We've also verified that no reads and writes are going to the remote data 
center.


Here's a bit more information:

- A few rows in this column family can grow quite wide (> 100K columns).

- But we keep seeing this behavior most frequently with rows that have just one
or two columns. The typical sequence is: Machine A adds a new row and a column.
30-60 seconds later, Machine B tries to read this row and doesn't find it. The
application retries within 500 ms, and this time it finds the row.


Thanks.


From: Mick Semb Wever 
Sent: Saturday, October 20, 2018 10:24:53 PM
To: user@cassandra.apache.org
Subject: Re: Cassandra: Inconsistent data on reads (LOCAL_QUORUM)


> Thanks James. Yeah, we're using the datastax java driver. But we're on 
> version 2.1.10.2. And we are not using the client side timestamps.


Just to check, Ninad: if you are using Cassandra 2.1 (native protocol
v3) and the Java driver version 3.0 or above, then you would be using
client-side timestamps by default.
https://github.com/datastax/java-driver/tree/3.x/manual/query_timestamps

With client-side timestamps all client servers and all C* nodes must
be kept tightly in-sync, as Elliot said. Monitoring and alerting on
any clock skew on any of these machines is important.
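
For example, a quick-and-dirty skew check across the client and Cassandra
hosts could look like this (the host list is hypothetical, and a proper
NTP-based monitor is preferable):

for h in client1 client2 cass1 cass2 cass3; do
  # print each host's clock in milliseconds next to the local clock
  echo "$h $(ssh "$h" date +%s%3N) local $(date +%s%3N)"
done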

Also worth checking that any local_quorum requests are not
accidentally going to the wrong datacenter.

regards,
Mick





RE: TWCS: Repair create new buckets with old data

2018-10-22 Thread Caesar, Maik
Ok, thanks.
My conclusion:

1.  I will set unchecked_tombstone_compaction to true to get old data with
tombstones removed (see the sketch below)

2.  I will exclude TWCS tables from repair

Regarding excluding tables from repair, is there an easy way to do this?
Nodetool repair does not support excludes.
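
A sketch of both steps, using the stat.spa table from further down in this
thread (the extra table names in the repair command are hypothetical):

# 1. enable unchecked_tombstone_compaction; keep the rest of your existing
#    compaction options in the map
cqlsh -e "ALTER TABLE stat.spa WITH compaction = {
  'class': 'TimeWindowCompactionStrategy',
  'compaction_window_size': '1',
  'compaction_window_unit': 'DAYS',
  'unchecked_tombstone_compaction': 'true'};"

# 2. nodetool repair has no exclude option, but it accepts an explicit table
#    list, so name every table in the keyspace except the TWCS one
nodetool repair stat table_a table_b table_c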

Regards
Maik

From: wxn...@zjqunshuo.com [mailto:wxn...@zjqunshuo.com]
Sent: Friday, 19 October 2018 03:58
To: user 
Subject: RE: TWCS: Repair create new buckets with old data

> Is repair not necessary to get data files removed from the filesystem?
The answer is no. IMO, Cassandra will remove sstable files automatically once it
can be sure the sstable files are 100% tombstones and safe to delete. If you use
TWCS and you have only insertions and no updates, you don't need to run repair
manually.
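
If fully expired sstables are still not being dropped, the sstableexpiredblockers
tool shipped with Cassandra can show which newer sstables overlap them and block
the deletion (a sketch, using the table from this thread; run it on a node):

sstableexpiredblockers stat spa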

-Simon

From: Caesar, Maik
Date: 2018-10-18 20:30
To: user@cassandra.apache.org
Subject: RE: TWCS: Repair create new buckets with old data
Hello Simon,
Is repair not necessary to get data files removed from the filesystem? My
assumption was that only repaired data will be removed after the TTL is reached.

Regards
Maik

From: wxn...@zjqunshuo.com 
[mailto:wxn...@zjqunshuo.com]
Sent: Wednesday, 17 October 2018 09:02
To: user <user@cassandra.apache.org>
Subject: Re: TWCS: Repair create new buckets with old data

Hi Maik,
IMO, when using TWCS, you had better not run repair. During repair, TWCS behaves
the same as STCS when merging sstables, and the result is sstables spanning
multiple time buckets, but maybe I'm wrong. In my use case, I don't run repair
on tables using TWCS.

-Simon

From: Caesar, Maik
Date: 2018-10-16 17:46
To: user@cassandra.apache.org
Subject: TWCS: Repair create new buckets with old data
Hello,
we work with Cassandra version 3.0.9 and have a problem with a table using TWCS.
The command “nodetool repair” always creates new files containing old data,
which prevents the old data from being deleted.
The layout of the Table is following:
cqlsh> desc stat.spa

CREATE TABLE stat.spa (
region int,
id int,
date text,
hour int,
zippedjsonstring blob,
PRIMARY KEY ((region, id), date, hour)
) WITH CLUSTERING ORDER BY (date ASC, hour ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 
'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 
'compaction_window_size': '1', 'compaction_window_unit': 'DAYS', 
'max_threshold': '100', 'min_threshold': '4', 'tombstone_compaction_interval': 
'86460'}
AND compression = {'chunk_length_in_kb': '64', 'class': 
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.0
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';

Currently the oldest data are from 2017/04/15 and will not be removed:

$ for f in *Data.db; do
    meta=$(sudo sstablemetadata $f)
    echo -e "Max:" $(date --date=@$(echo "$meta" | grep Maximum\ time | cut -d" " -f3 | cut -c 1-10) '+%Y/%m/%d %H:%M') \
            "Min:" $(date --date=@$(echo "$meta" | grep Minimum\ time | cut -d" " -f3 | cut -c 1-10) '+%Y/%m/%d %H:%M') \
            $(echo "$meta" | grep droppable) \
            $(echo "$meta" | grep "Repaired at") ' \t ' \
            $(ls -lh $f | awk '{print $5" "$6" "$7" "$8" "$9}')
  done | sort
Max: 2017/04/15 12:08 Min: 2017/03/31 13:09 Estimated droppable tombstones: 1.7731048805815162 Repaired at: 1525685601400  42K May 7 19:56 mc-22922-big-Data.db
Max: 2017/04/17 13:49 Min: 2017/03/31 13:09 Estimated droppable tombstones: 1.9600207684319835 Repaired at: 1525685601400  116M May 7 13:31 mc-15096-big-Data.db
Max: 2017/04/21 13:43 Min: 2017/04/15 13:34 Estimated droppable tombstones: 1.9090909090909092 Repaired at: 1525685601400  11K May 7 19:56 mc-22921-big-Data.db
Max: 2017/05/23 21:45 Min: 2017/04/21 14:00 Estimated droppable tombstones: 1.8360655737704918 Repaired at: 1525685601400  21M May 7 19:56 mc-22919-big-Data.db
Max: 2017/06/12 15:19 Min: 2017/04/25 14:45 Estimated droppable tombstones: 1.8091397849462365 Repaired at: 1525685601400  19M May 7 14:36 mc-17095-big-Data.db
Max: 2017/06/15 15:26 Min: 2017/05/10 14:37 Estimated droppable tombstones: 1.76536312849162 Repaired at: 1529612605539  9.3M Jun 21 22:31 mc-25372-big-Data.db
…

After a “nodetool repair” run, a new big data file is created that includes old
data from 2017/07/31.

Max: 2018/07/27 18:10 Min: 2017/03/31 13:13 Estimated droppable tombstones: 0.08392555471691247 Repaired at: 0  11G Sep 11 22:02 mc-39281-big-Data.db
…
Max: 2018/08/16 18:18 Min: 2018/08/06 12:19 Estimated 

Wondering how cql3 DISTINCT query is implemented

2018-10-22 Thread Jing Meng
Hi, we built a simple system to migrate live Cassandra data to other
databases, mainly by using these two queries (a concrete sketch follows the
list):

1. SELECT DISTINCT TOKEN(partition_key) FROM table WHERE
TOKEN(partition_key) > current_offset AND TOKEN(partition_key) <=
upper_bound LIMIT token_fetch_size
2. Any cql query that retrieves all rows, given a set of tokens
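
For reference, a concrete (hypothetical) instance of the two queries above,
against a table ks.t with partition key pk; the bounds, fetch size and token
value are only examples:

cqlsh -e "SELECT DISTINCT TOKEN(pk) FROM ks.t
          WHERE TOKEN(pk) > -9223372036854775808 AND TOKEN(pk) <= 0
          LIMIT 1000;"

# then, for each token returned, pull the full partition(s) at that token:
cqlsh -e "SELECT * FROM ks.t WHERE TOKEN(pk) = -9196503632094491123;"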

And we observed that the "SELECT DISTINCT TOKEN" query takes way longer
when the table has wide partitions (about 200+ rows per partition on average);
it looks like the underlying operation is not linear.

Is it that the query scans every row of every partition found until
token_fetch_size is met? Or is it due to some low-level operations that are
naturally more time-consuming when dealing with wide-partitioned data?

Any advice on this question, or pointers to the relevant code, would be
appreciated.