Re: Unexpected error while refreshing token map, keeping previous version (IllegalArgumentException: Multiple entries with same key ?

2019-06-20 Thread Andy Tolbert
I just configured a 3 node cluster in this way and was able to reproduce
the warning message:

cqlsh> select peer, rpc_address from system.peers;

 peer      | rpc_address
-----------+-------------
 127.0.0.3 |   127.0.0.1
 127.0.0.2 |   127.0.0.1

(2 rows)

cqlsh> select rpc_address from system.local;

 rpc_address
-------------
   127.0.0.1

10:22:40.399 [s0-admin-0] WARN  c.d.o.d.i.c.metadata.DefaultMetadata - [s0]
Unexpected error while refreshing token map, keeping previous version
java.lang.IllegalArgumentException: Multiple entries with same key:
Murmur3Token(-100881582699237014)=/127.0.0.1:9042 and
Murmur3Token(-100881582699237014)=/127.0.0.1:9042
at
com.datastax.oss.driver.shaded.guava.common.collect.ImmutableMap.conflictException(ImmutableMap.java:215)
at
com.datastax.oss.driver.shaded.guava.common.collect.ImmutableMap.checkNoConflict(ImmutableMap.java:209)
at
com.datastax.oss.driver.shaded.guava.common.collect.RegularImmutableMap.checkNoConflictInKeyBucket(RegularImmutableMap.java:147)
at
com.datastax.oss.driver.shaded.guava.common.collect.RegularImmutableMap.fromEntryArray(RegularImmutableMap.java:110)
at
com.datastax.oss.driver.shaded.guava.common.collect.ImmutableMap$Builder.build(ImmutableMap.java:393)
at
com.datastax.oss.driver.internal.core.metadata.token.DefaultTokenMap.buildTokenToPrimaryAndRing(DefaultTokenMap.java:261)
at
com.datastax.oss.driver.internal.core.metadata.token.DefaultTokenMap.build(DefaultTokenMap.java:57)
at
com.datastax.oss.driver.internal.core.metadata.DefaultMetadata.rebuildTokenMap(DefaultMetadata.java:146)
at
com.datastax.oss.driver.internal.core.metadata.DefaultMetadata.withNodes(DefaultMetadata.java:104)
at
com.datastax.oss.driver.internal.core.metadata.InitialNodeListRefresh.compute(InitialNodeListRefresh.java:96)
at
com.datastax.oss.driver.internal.core.metadata.MetadataManager.apply(MetadataManager.java:475)
at
com.datastax.oss.driver.internal.core.metadata.MetadataManager$SingleThreaded.refreshNodes(MetadataManager.java:299)
at
com.datastax.oss.driver.internal.core.metadata.MetadataManager$SingleThreaded.access$1700(MetadataManager.java:265)
at
com.datastax.oss.driver.internal.core.metadata.MetadataManager.lambda$refreshNodes$0(MetadataManager.java:155)
at
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
at
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
at
java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
at io.netty.channel.DefaultEventLoop.run(DefaultEventLoop.java:54)
at
io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:905)
at
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)

Interestingly enough, version 3 of the driver only recognizes 1 node,
whereas version 4 is able to detect the 3 nodes separately.  It's probably not
a scenario that was given a lot of thought, since this is a misconfiguration.
I'll think about how it should be handled and log tickets in any case, as it
would be nice to surface to the user that something isn't right in a clearer
way.

Can you please confirm, when you have a chance, that this is indeed a
configuration issue with rpc_address?  Just to make sure I'm not ignoring a
possible bug ;)
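
For reference, here is roughly what I'd expect each node of a loopback test
cluster like the one above to use (the addresses below are only an
illustration, not your actual config), plus the same queries to confirm what
each node advertises:

```
# node1's cassandra.yaml (node2 and node3 would use 127.0.0.2 / 127.0.0.3):
#   listen_address: 127.0.0.1
#   rpc_address: 127.0.0.1
# Verify what each node advertises to drivers:
cqlsh 127.0.0.1 -e "SELECT peer, rpc_address FROM system.peers;"
cqlsh 127.0.0.1 -e "SELECT rpc_address FROM system.local;"
```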

Thanks,
Andy


On Thu, Jun 20, 2019 at 10:20 AM Andy Tolbert 
wrote:

> One thing that strikes me is that the endpoint reported is '127.0.0.1'.
> Is it possible that you have rpc_address set to 127.0.0.1 on each of your
> three nodes in cassandra.yaml?  The driver uses the system.peers table to
> identify nodes in the cluster and associates them by rpc_address.  Can you
> verify this by executing 'select peer, rpc_address from system.peers' to
> see what is being reported as the rpc_address and let me know?
>
> In any case, the driver should probably handle this better, I'll create a
> driver ticket.
>
> Thanks,
> Andy
>
> On Thu, Jun 20, 2019 at 10:03 AM Jeff Jirsa  wrote:
>
>> There’s a reasonable chance this is a bug in the Datastax driver - may
>> want to start there when debugging .
>>
>> It’s also just a warn, and the two entries with the same token are the
>> same endpoint which doesn’t seem concerning to me, but I don’t know the
>> Datastax driver that well
>>
>> On Jun 20, 2019, at 7:40 AM, Котельников Александр 
>> wrote:
>>
>> It appears that no such warning is issued if I connected to Cassandra
>> from a remote server, not locally.
>>
>>
>>
>> *From: *Котельников Александр 
>> *Reply-To: *"user@cassandra.apache.org" 
>> *Date: *Thursday, 20 June 2019 at 10:46
>> *To: *"user@cassandra.apache.org" 
>> *Subject: *Unexpected error while refreshing token map, keeping previous
>> version (IllegalArgumentException: Multiple entries with same key ?
>>
>>
>>
>> Hey!
>>
>>
>>
>> I’ve  just configured a test 3-node Cassandra cluster and run very
>> trivial java test against it.
>>
>>
>>
>> I see the following warning from java-driver on each 

Re: Tombstones not getting purged

2019-06-20 Thread Léo FERLIN SUTTON
Thank you for the information!

On Thu, Jun 20, 2019 at 9:50 AM Alexander Dejanovski 
wrote:

> Léo,
>
> if a major compaction isn't a viable option, you can give a go at
> Instaclustr SSTables tools to target the partitions with the most
> tombstones :
> https://github.com/instaclustr/cassandra-sstable-tools/tree/cassandra-2.2#ic-purge
>
> It generates a report like this:
>
> Summary:
>
> +-+-+
>
> | | Size|
>
> +-+-+
>
> | Disk|  1.9 GB |
>
> | Reclaim | 11.7 MB |
>
> +-+-+
>
>
> Largest reclaimable partitions:
>
> +--++-+-+
>
> | Key  | Size   | Reclaim | Generations |
>
> +--++-+-+
>
> | 001.2.340862 | 3.2 kB |  3.2 kB | [534, 438, 498] |
>
> | 001.2.946243 | 2.9 kB |  2.8 kB | [534, 434, 384] |
>
> | 001.1.527557 | 2.8 kB |  2.7 kB | [534, 519, 394] |
>
> | 001.2.181797 | 2.6 kB |  2.6 kB | [534, 424, 343] |
>
> | 001.3.475853 | 2.7 kB |28 B |  [524, 462] |
>
> | 001.0.159704 | 2.7 kB |28 B |  [440, 247] |
>
> | 001.1.311372 | 2.6 kB |28 B |  [424, 458] |
>
> | 001.0.756293 | 2.6 kB |28 B |  [428, 358] |
>
> | 001.2.681009 | 2.5 kB |28 B |  [440, 241] |
>
> | 001.2.474773 | 2.5 kB |28 B |  [524, 484] |
>
> | 001.2.974571 | 2.5 kB |28 B |  [386, 517] |
>
> | 001.0.143176 | 2.5 kB |28 B |  [518, 368] |
>
> | 001.1.185198 | 2.5 kB |28 B |  [517, 386] |
>
> | 001.3.503517 | 2.5 kB |28 B |  [426, 346] |
>
> | 001.1.847384 | 2.5 kB |28 B |  [436, 396] |
>
> | 001.0.949269 | 2.5 kB |28 B |  [516, 356] |
>
> | 001.0.756763 | 2.5 kB |28 B |  [440, 249] |
>
> | 001.3.973808 | 2.5 kB |28 B |  [517, 386] |
>
> | 001.0.312718 | 2.4 kB |28 B |  [524, 467] |
>
> | 001.3.632066 | 2.4 kB |28 B |  [432, 377] |
>
> | 001.1.946590 | 2.4 kB |28 B |  [519, 389] |
>
> | 001.1.798591 | 2.4 kB |28 B |  [434, 388] |
>
> | 001.3.953922 | 2.4 kB |28 B |  [432, 375] |
>
> | 001.2.585518 | 2.4 kB |28 B |  [432, 375] |
>
> | 001.3.284942 | 2.4 kB |28 B |  [376, 432] |
>
> +--++-+-+
>
> Once you've identified these partitions you can run a compaction on the
> SSTables that contain them (identified using "nodetool getsstables").
> Note that user defined compactions are only available for STCS.
> Also ic-purge will perform a compaction but without writing to disk
> (should look like a validation compaction), so it is rightfully reported by
> the docs as an "intensive process" (not more than a repair though).
>
> -
> Alexander Dejanovski
> France
> @alexanderdeja
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
>
> On Thu, Jun 20, 2019 at 9:17 AM Alexander Dejanovski <
> a...@thelastpickle.com> wrote:
>
>> My bad on date formatting, it should have been : %Y/%m/%d
>> Otherwise the SSTables aren't ordered properly.
>>
>> You have 2 SSTables that claim to cover timestamps from 1940 to 2262,
>> which is weird.
>> Aside from that, you have big overlaps all over the SSTables, so that's
>> probably why your tombstones are sticking around.
>>
>> Your best shot here will be a major compaction of that table, since it
>> doesn't seem so big. Remember to use the --split-output flag on the
>> compaction command to avoid ending up with a single SSTable after that.
>>
>> Cheers,
>>
>> -
>> Alexander Dejanovski
>> France
>> @alexanderdeja
>>
>> Consultant
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>>
>> On Thu, Jun 20, 2019 at 8:13 AM Léo FERLIN SUTTON
>>  wrote:
>>
>>> On Thu, Jun 20, 2019 at 7:37 AM Alexander Dejanovski <
>>> a...@thelastpickle.com> wrote:
>>>
 Hi Leo,

 The overlapping SSTables are indeed the most probable cause as
 suggested by Jeff.
 Do you know if the tombstone compactions actually triggered? (did the
 SSTables name change?)

>>>
>>> Hello !
>>>
>>> I believe they have changed. I do not remember the sstable name but the
>>> "last modified" has changed recently for these tables.
>>>
>>>
 Could you run the following command to list SSTables and provide us the
 output? It will display both their timestamp ranges along with the
 estimated droppable tombstones ratio.


 for f in *Data.db; do meta=$(sstablemetadata -gc_grace_seconds 259200
 $f); echo $(date --date=@$(echo "$meta" | grep Maximum\ time | cut -d" "
 -f3| cut -c 1-10) '+%m/%d/%Y %H:%M:%S') $(date --date=@$(echo "$meta" |
 grep Minimum\ time | cut -d" "  -f3| cut -c 1-10) '+%m/%d/%Y
 %H:%M:%S') $(echo "$meta" | grep droppable) $(ls -lh $f); done | sort

>>>
>>> Here is the results :
>>>
>>> ```
>>> 04/01/2019 22:53:12 03/06/2018 16:46:13 Estimated droppable tombstones:
>>> 0.0 -rw-r--r-- 1 cassandra cassandra 16G Apr 13 14:35 md-147916-big-Data.db

Re: Unexpected error while refreshing token map, keeping previous version (IllegalArgumentException: Multiple entries with same key ?

2019-06-20 Thread Andy Tolbert
One thing that strikes me is that the endpoint reported is '127.0.0.1'.  Is
it possible that you have rpc_address set to 127.0.0.1 on each of your
three nodes in cassandra.yaml?  The driver uses the system.peers table to
identify nodes in the cluster and associates them by rpc_address.  Can you
verify this by executing 'select peer, rpc_address from system.peers' to
see what is being reported as the rpc_address and let me know?

In any case, the driver should probably handle this better, I'll create a
driver ticket.

Thanks,
Andy

On Thu, Jun 20, 2019 at 10:03 AM Jeff Jirsa  wrote:

> There’s a reasonable chance this is a bug in the Datastax driver - may
> want to start there when debugging .
>
> It’s also just a warn, and the two entries with the same token are the
> same endpoint which doesn’t seem concerning to me, but I don’t know the
> Datastax driver that well
>
> On Jun 20, 2019, at 7:40 AM, Котельников Александр 
> wrote:
>
> It appears that no such warning is issued if I connected to Cassandra from
> a remote server, not locally.
>
>
>
> *From: *Котельников Александр 
> *Reply-To: *"user@cassandra.apache.org" 
> *Date: *Thursday, 20 June 2019 at 10:46
> *To: *"user@cassandra.apache.org" 
> *Subject: *Unexpected error while refreshing token map, keeping previous
> version (IllegalArgumentException: Multiple entries with same key ?
>
>
>
> Hey!
>
>
>
> I’ve  just configured a test 3-node Cassandra cluster and run very trivial
> java test against it.
>
>
>
> I see the following warning from java-driver on each CqlSession
> initialization:
>
>
>
> 13:54:13.913 [loader-admin-0] WARN  c.d.o.d.i.c.metadata.DefaultMetadata -
> [loader] Unexpected error while refreshing token map, keeping previous
> version (IllegalArgumentException: Multiple entries with same key:
> Murmur3Token(-1060405237057176857)=/127.0.0.1:9042 and
> Murmur3Token(-1060405237057176857)=/127.0.0.1:9042)
>
>
>
> What does It mean? Why?
>
>
>
> Cassandra 3.11.4, driver 4.0.1.
>
>
>
> nodetool status
>
> Datacenter: datacenter1
>
> ===
>
> Status=Up/Down
>
> |/ State=Normal/Leaving/Joining/Moving
>
> --  Address   Load   Tokens   Owns (effective)  Host
> ID   Rack
>
> UN  10.73.66.36   419.36 MiB  256  100.0%
> fafa2737-9024-437b-9a59-c1c037bce244  rack1
>
> UN  10.73.66.100  336.47 MiB  256  100.0%
> d5323ad0-f8cd-42d4-b34d-9afcd002ea47  rack1
>
> UN  10.73.67.196  336.4 MiB  256  100.0%
> 74dffe0c-32a4-4071-8b36-5ada5afa4a7d  rack1
>
>
>
> The issue persists if I reset the cluster, just the token changes its
> value.
>
> Alexander
>
>


Re: Unexpected error while refreshing token map, keeping previous version (IllegalArgumentException: Multiple entries with same key ?

2019-06-20 Thread Jeff Jirsa
There’s a reasonable chance this is a bug in the Datastax driver - may want to 
start there when debugging.

It’s also just a warning, and the two entries with the same token are the same 
endpoint, which doesn’t seem concerning to me, but I don’t know the Datastax 
driver that well.

> On Jun 20, 2019, at 7:40 AM, Котельников Александр  
> wrote:
> 
> It appears that no such warning is issued if I connected to Cassandra from a 
> remote server, not locally.
>  
> From: Котельников Александр 
> Reply-To: "user@cassandra.apache.org" 
> Date: Thursday, 20 June 2019 at 10:46
> To: "user@cassandra.apache.org" 
> Subject: Unexpected error while refreshing token map, keeping previous 
> version (IllegalArgumentException: Multiple entries with same key ?
>  
> Hey!
>  
> I’ve  just configured a test 3-node Cassandra cluster and run very trivial 
> java test against it.
>  
> I see the following warning from java-driver on each CqlSession 
> initialization:
>  
> 13:54:13.913 [loader-admin-0] WARN  c.d.o.d.i.c.metadata.DefaultMetadata - 
> [loader] Unexpected error while refreshing token map, keeping previous 
> version (IllegalArgumentException: Multiple entries with same key: 
> Murmur3Token(-1060405237057176857)=/127.0.0.1:9042 and 
> Murmur3Token(-1060405237057176857)=/127.0.0.1:9042)
>  
> What does It mean? Why?
>  
> Cassandra 3.11.4, driver 4.0.1.
>  
> nodetool status
> Datacenter: datacenter1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address   Load   Tokens   Owns (effective)  Host ID   
> Rack
> UN  10.73.66.36   419.36 MiB  256  100.0%
> fafa2737-9024-437b-9a59-c1c037bce244  rack1
> UN  10.73.66.100  336.47 MiB  256  100.0%
> d5323ad0-f8cd-42d4-b34d-9afcd002ea47  rack1
> UN  10.73.67.196  336.4 MiB  256  100.0%
> 74dffe0c-32a4-4071-8b36-5ada5afa4a7d  rack1
>  
> The issue persists if I reset the cluster, just the token changes its value.
> Alexander


Re: Unexpected error while refreshing token map, keeping previous version (IllegalArgumentException: Multiple entries with same key ?

2019-06-20 Thread Котельников Александр
It appears that no such warning is issued if I connect to Cassandra from a 
remote server rather than locally.

From: Котельников Александр 
Reply-To: "user@cassandra.apache.org" 
Date: Thursday, 20 June 2019 at 10:46
To: "user@cassandra.apache.org" 
Subject: Unexpected error while refreshing token map, keeping previous version 
(IllegalArgumentException: Multiple entries with same key ?

Hey!

I’ve just configured a test 3-node Cassandra cluster and run a very trivial Java 
test against it.

I see the following warning from java-driver on each CqlSession initialization:

13:54:13.913 [loader-admin-0] WARN  c.d.o.d.i.c.metadata.DefaultMetadata - 
[loader] Unexpected error while refreshing token map, keeping previous version 
(IllegalArgumentException: Multiple entries with same key: 
Murmur3Token(-1060405237057176857)=/127.0.0.1:9042 and 
Murmur3Token(-1060405237057176857)=/127.0.0.1:9042)

What does it mean? Why?

Cassandra 3.11.4, driver 4.0.1.

nodetool status
Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load   Tokens   Owns (effective)  Host ID 
  Rack
UN  10.73.66.36   419.36 MiB  256  100.0%
fafa2737-9024-437b-9a59-c1c037bce244  rack1
UN  10.73.66.100  336.47 MiB  256  100.0%
d5323ad0-f8cd-42d4-b34d-9afcd002ea47  rack1
UN  10.73.67.196  336.4 MiB  256  100.0%
74dffe0c-32a4-4071-8b36-5ada5afa4a7d  rack1

The issue persists if I reset the cluster, just the token changes its value.
Alexander


Re: Decommissioned nodes are in UNREACHABLE state

2019-06-20 Thread Alain RODRIGUEZ
Hello,

Assuming your nodes have been out for a while and you don't need the data after
60 days (or cannot get it anyway), the way to fix this is to force the nodes
out. I would try, in this order:

- nodetool removenode HOSTID
- nodetool removenode force

These 2 might really not work at this stage, but if they do, this is a
clean way to do so.
Now, to really push the ghost nodes to the exit door, it often takes:

- nodetool assassinate

I think Cassandra 2.1 doesn't have it, you might have to use JMX; more
details here: https://thelastpickle.com/blog/2018/09/18/assassinate.html:

echo "run -b org.apache.cassandra.net:type=Gossiper
> unsafeAssassinateEndpoint $IP_TO_ASSASSINATE"  | java -jar
> jmxterm-1.0.0-uber.jar -l $IP_OF_LIVE_NODE:7199


This should really remove the traces of the node, without any safety: no
streaming, no checks, it just gets rid of it. So use it with a lot of care and
understanding. In your situation I guess this is what will work.

As a last attempt, you could try removing traces of the dead node(s) from
all the live nodes' system.peers table. This table is local to each node,
so the DELETE command has to be sent to all the nodes (that still have a trace
of an old node).

- cqlsh -e "DELETE FROM system.peers WHERE peer = '$IP_TO_REMOVE';"

but I see the node IPs in UNREACHABLE state in "nodetool describecluster"
> output. I believe  they appear only for 72 hours, but in my case I see
> those nodes in UNREACHABLE for ever (more than 60 days)


To be more accurate, you should never see a leaving node as unreachable I
believe (not even for 72 hours). The 72 hours is the time Gossip should
continue referencing the old nodes. Typically, when you remove the ghost
nodes, they should no longer appear in 'nodetool describecluster' at all,
I would say immediately, but they will still appear in 'nodetool gossipinfo'
with a 'left' or 'removed' status.
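
A quick way to check that state (the IP below is just a placeholder):

```
# What gossip still remembers about the old node; a properly removed node
# should show a STATUS of LEFT or removed:
nodetool gossipinfo | grep -A 5 '10.0.0.42'
# The old node should not appear here at all, not even as UNREACHABLE:
nodetool describecluster
```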

I hope that helps and that one of the above will do the trick (I'd bet on
the assassinate :)). Also, sorry it took us a while to answer this
relatively common question :)

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

On Thu, Jun 13, 2019 at 00:55, Jai Bheemsen Rao Dhanwada <
jaibheem...@gmail.com> wrote:

> Hello,
>
> I have a Cassandra cluster running with 2.1.16 version of Cassandra, where
> I have decommissioned few nodes from the cluster using "nodetool
> decommission", but I see the node IPs in UNREACHABLE state in "nodetool
> describecluster" output. I believe  they appear only for 72 hours, but in
> my case I see those nodes in UNREACHABLE for ever (more than 60 days).
> Rolling restart of the nodes didn't remove them. any idea what could be
> causing here?
>
> Note: I don't see them in the nodetool status output.
>


Re: JanusGraph and Cassandra

2019-06-20 Thread Alain RODRIGUEZ
Hello,

This looks more like a JanusGraph question, so I would rather try the support
channels for that tool instead.
I have never seen anyone here or elsewhere using JanusGraph; after searching, I
only found 4 threads about it here. Thus I think even people who know
Cassandra monitoring/metrics very well will not be able to help you
here.

If you have questions around the metrics as such, we might be able to help
you though :).

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

On Wed, Jun 12, 2019 at 08:31, Vinayak Bali 
wrote:

> Hi,
>
> I am using JanusGraph along with Cassandra as the backend. I have created
> a graph using JanusGraphFactory.open() method. I want to create one more
> graph and traverse it. Can you please help me.
>
> Regards,
> Vinayak
>
>


Re: Cassandra Tombstone

2019-06-20 Thread Alain RODRIGUEZ
Hello Aneesh,

Reading your message and the answers given, I really think this post I wrote
about 3 years ago now (how quickly time goes by...) about tombstones
might be of interest to you:
https://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html.
Your problem is not related to tombstones I'd say, but the first part of the
post explains how consistency works in Cassandra, and only then takes on the
case of deletes/tombstones. This first part might help you solve your current
issue:
https://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html#cassandra-some-availability-and-consistency-considerations.
I really tried to reason from scratch, even for people who are new to
Cassandra, with a very basic knowledge of internals.

For example, you'll read things like:

CL.READ  = Consistency Level (CL) used for reads. Basically the number of
> nodes that will have to acknowledge the read for Cassandra to consider it
> successful.
> CL.WRITE = CL used for writes.
> RF = Replication Factor



CL.READ + CL.WRITE > RF


If what you have is an availability issue, you should just make sure the CL
is lower than or equal to the RF and that all nodes are up and responsive.
FWIW, quorum = RF/2 + 1, thus if RF is 2 and the consistency level for
deletes is QUORUM (i.e. "2 / 2 + 1 = 2"), one node down could start breaking
availability (and most probably will), as CL > number of replicas available
for certain partitions.
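
As a quick illustration of that arithmetic (the keyspace and table below are
made up for the example): with RF=2, QUORUM needs 2/2 + 1 = 2 live replicas,
so a delete at QUORUM fails as soon as one of the two replicas of that
partition is down, while ONE still succeeds:

```
cqlsh <<'CQL'
CONSISTENCY QUORUM;
-- fails with one of the two replicas down (needs 2 acknowledgements)
DELETE FROM my_keyspace.my_table WHERE id = 1;
CONSISTENCY ONE;
-- succeeds as long as a single replica is alive
DELETE FROM my_keyspace.my_table WHERE id = 1;
CQL
```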

Reading the rest of the post might be useful while working out the design
of the schema and queries, in particular if you plan to use deletes/TTLs.

I hope that helps,
C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

On Tue, Jun 18, 2019 at 09:56, Oleksandr Shulgin <
oleksandr.shul...@zalando.de> wrote:

> On Tue, Jun 18, 2019 at 8:06 AM ANEESH KUMAR K.M 
> wrote:
>
>>
>> I am using Cassandra cluster with 3 nodes which is hosted on AWS. Also we
>> have NodeJS web Application which is on AWS ELB. Now the issue is that,
>> when I add 2 or more servers (nodeJS) in AWS ELB then the delete queries
>> are not working on Cassandra.
>>
>
> Please provide a more concrete description than "not working".  Do you get
> an error?  Which one?  Does it "not working" silently, i.e. w/o an error,
> but you don't observe the expected effect?  How does the delete query look
> like, what is the effect you expect and what do you observe instead?
>
> --
> Alex
>
>


Re: node re-start delays , busy Deleting mc-txn-compaction/ Adding log file replica

2019-06-20 Thread Alain RODRIGUEZ
Also about your traces, and according to Jeff in another thread:

the incomplete sstable will be deleted during startup (in 3.0 and newer
> there’s a transaction log of each compaction in progress - that gets
> cleaned during the startup process)
>

maybe that's what you are seeing? Again, I'm not really familiar with those
traces. I find traces and debug pretty useless (or even counter-productive)
in 99% of the cases, so I don't use them much.

On Thu, Jun 20, 2019 at 12:25, Alain RODRIGUEZ  wrote:

> Hello Asad,
>
>
>> I’m on environment with  apache Cassandra 3.11.1 with  java 1.8.0_144.
>
> One Node went OOM and crashed.
>
>
> If I remember well, firsts minor versions of C* 3.11 have memory leaks. It
> seems it was fixed in your version though.
>
> 3.11.1
>
> [...]
>
>  * BTree.Builder memory leak (CASSANDRA-13754)
>
>
> Yet other improvements were made later on:
>
>
>> 3.11.3
>
> [...]
>
>  * Remove BTree.Builder Recycler to reduce memory usage (CASSANDRA-13929)
>
>  * Reduce nodetool GC thread count (CASSANDRA-14475)
>
>
> See: https://github.com/apache/cassandra/blob/cassandra-3.11/CHANGES.txt.
> Before digging more I would upgrade to 3.11.latest (latest = 4 or 5 I
> guess), because early versions of a major Cassandra versions are famous for
> being quite broken, even though this major is a 'bug fix only' branch.
> Also minor versions upgrades are not too risky to go through. I would
> maybe start there if you're not too sure how to dig this.
>
> If it happens again or you don't want to upgrade, it would be interesting
> to know:
> -  if the OOM happens inside the JVM or on native memory (then the OS
> would be the one sending the kill signal). These 2 issues have different
> (and sometime opposite) fixes.
> - What's the host size (especially memory) and how the heap (and maybe
> some off heap structures) are configured (at least what is not default).
> - If you saw errors in the logs and what the 'nodetool tpstats' was
> looking like when the node went down (it might have been dumped in the logs)
>
> I don't know much about those traces nor why Cassandra would take a long
> time. Though they are traces and harder to interpret for me. What does the
> INFO / WARN / ERR look like?
> Maybe opening a lot of SSTables and/or replaying a lot of commit logs,
> given the nature of the restart (post outage)?
> To speed up things, when nodes are not crashing, under normal
> circumstances, use 'nodetool drain' as part of stopping the node, before
> stopping/killing the service/process.
>
> C*heers,
> ---
> Alain Rodriguez - al...@thelastpickle.com
> France / Spain
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On Tue, Jun 18, 2019 at 23:43, ZAIDI, ASAD A  wrote:
>
>>
>>
>> I’m on environment with  apache Cassandra 3.11.1 with  java 1.8.0_144.
>>
>>
>>
>> One Node went OOM and crashed. Re-starting this crashed node is taking
>> long time. Trace level debug log is showing messages like:
>>
>>
>>
>>
>>
>> Debug.log trace excerpt:
>>
>> 
>>
>>
>>
>> TRACE [main] 2019-06-18 21:30:43,449 LogTransaction.java:217 - Deleting
>> /cassandra/data/enterprise/device_connection_ws-f65649e0aea011e7baeb8166fa28890a/mc-9337720-big-CompressionInfo.db
>>
>> TRACE [main] 2019-06-18 21:30:43,449 LogTransaction.java:217 - Deleting
>> /cassandra/data/enterprise/device_connection_ws-f65649e0aea011e7baeb8166fa28890a/mc-9337720-big-Filter.db
>>
>> TRACE [main] 2019-06-18 21:30:43,449 LogTransaction.java:217 - Deleting
>> /cassandra/data/enterprise/device_connection_ws-f65649e0aea011e7baeb8166fa28890a/mc-9337720-big-TOC.txt
>>
>> TRACE [main] 2019-06-18 21:30:43,455 LogTransaction.java:217 - Deleting
>> /cassandra/data/enterprise/device_connection_ws-f65649e0aea011e7baeb8166fa28890a/mc_txn_compaction_642976c0-91c3-11e9-97bb-6b1dee397c3f.log
>>
>> TRACE [main] 2019-06-18 21:30:43,458 LogReplicaSet.java:67 - Added log
>> file replica
>> /cassandra/data/enterprise/device_connection_ws-f65649e0aea011e7baeb8166fa28890a/mc_txn_compaction_5a6c8c90-91cc-11e9-97bb-6b1dee397c3f.log
>>
>>
>>
>>
>>
>> Above messages are repeated for unique [mc--* ] files. Such messages
>> are repeating constantly.
>>
>>
>>
>> I’m seeking help here to find out what may be going on here , any hint to
>> root cause and how I can quickly start the node. Thanks in advance.
>>
>>
>>
>> Regards/asad
>>
>>
>>
>>
>>
>>
>>
>


Re: node re-start delays , busy Deleting mc-txn-compaction/ Adding log file replica

2019-06-20 Thread Alain RODRIGUEZ
Hello Asad,


> I’m on environment with  apache Cassandra 3.11.1 with  java 1.8.0_144.

One Node went OOM and crashed.


If I remember correctly, the first minor versions of C* 3.11 had memory leaks.
It seems this was fixed in your version though.

3.11.1

[...]

 * BTree.Builder memory leak (CASSANDRA-13754)


Yet other improvements were made later on:


> 3.11.3

[...]

 * Remove BTree.Builder Recycler to reduce memory usage (CASSANDRA-13929)

 * Reduce nodetool GC thread count (CASSANDRA-14475)


See: https://github.com/apache/cassandra/blob/cassandra-3.11/CHANGES.txt.
Before digging more I would upgrade to 3.11.latest (latest = 4 or 5 I
guess), because early versions of a major Cassandra version are famous for
being quite broken, even though this major is a 'bug fix only' branch.
Also, minor version upgrades are not too risky to go through. I would maybe
start there if you're not too sure how to dig into this.

If it happens again or you don't want to upgrade, it would be interesting
to know:
- if the OOM happens inside the JVM or in native memory (in which case the OS
would be the one sending the kill signal). These 2 issues have different (and
sometimes opposite) fixes.
- what the host size is (especially memory) and how the heap (and maybe some
off-heap structures) is configured (at least what is not default).
- if you saw errors in the logs and what 'nodetool tpstats' looked like
when the node went down (it might have been dumped in the logs)

I don't know much about those traces, nor why Cassandra would take so long;
they are traces and harder for me to interpret. What do the INFO / WARN /
ERROR messages look like?
Maybe it is opening a lot of SSTables and/or replaying a lot of commit logs,
given the nature of the restart (post outage)?
To speed things up when nodes are not crashing, under normal
circumstances, use 'nodetool drain' as part of stopping the node, before
stopping/killing the service/process.
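
Something along these lines (assuming a systemd-managed service called
'cassandra'; adapt to your init system):

```
# Flush memtables and stop accepting requests before shutting down, so the
# node has (almost) no commit log to replay at the next start:
nodetool drain && sudo systemctl stop cassandra
```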

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

On Tue, Jun 18, 2019 at 23:43, ZAIDI, ASAD A  wrote:

>
>
> I’m on environment with  apache Cassandra 3.11.1 with  java 1.8.0_144.
>
>
>
> One Node went OOM and crashed. Re-starting this crashed node is taking
> long time. Trace level debug log is showing messages like:
>
>
>
>
>
> Debug.log trace excerpt:
>
> 
>
>
>
> TRACE [main] 2019-06-18 21:30:43,449 LogTransaction.java:217 - Deleting
> /cassandra/data/enterprise/device_connection_ws-f65649e0aea011e7baeb8166fa28890a/mc-9337720-big-CompressionInfo.db
>
> TRACE [main] 2019-06-18 21:30:43,449 LogTransaction.java:217 - Deleting
> /cassandra/data/enterprise/device_connection_ws-f65649e0aea011e7baeb8166fa28890a/mc-9337720-big-Filter.db
>
> TRACE [main] 2019-06-18 21:30:43,449 LogTransaction.java:217 - Deleting
> /cassandra/data/enterprise/device_connection_ws-f65649e0aea011e7baeb8166fa28890a/mc-9337720-big-TOC.txt
>
> TRACE [main] 2019-06-18 21:30:43,455 LogTransaction.java:217 - Deleting
> /cassandra/data/enterprise/device_connection_ws-f65649e0aea011e7baeb8166fa28890a/mc_txn_compaction_642976c0-91c3-11e9-97bb-6b1dee397c3f.log
>
> TRACE [main] 2019-06-18 21:30:43,458 LogReplicaSet.java:67 - Added log
> file replica
> /cassandra/data/enterprise/device_connection_ws-f65649e0aea011e7baeb8166fa28890a/mc_txn_compaction_5a6c8c90-91cc-11e9-97bb-6b1dee397c3f.log
>
>
>
>
>
> Above messages are repeated for unique [mc--* ] files. Such messages
> are repeating constantly.
>
>
>
> I’m seeking help here to find out what may be going on here , any hint to
> root cause and how I can quickly start the node. Thanks in advance.
>
>
>
> Regards/asad
>
>
>
>
>
>
>


Re: How to query TTL on collections ?

2019-06-20 Thread Alain RODRIGUEZ
Hello Maxim.

I think you won't be able to do what you want this way. Collections are
supposed to be (ideally small) sets of data that you'll always read
entirely, at once. At least it seems to be working this way. Not sure about
the latest versions, but I did not hear about new design for collections.

You can set values individually in a collection as you did above (and
probably should do so to avoid massive tombstones creation), but you have
to read the whole thing at once:

```
$ ccm node1 cqlsh -e "SELECT items[10] FROM tlp_labs.products WHERE
product_id=1;"
:1:SyntaxException: line 1:12 no viable alternative at input '['
(SELECT [items][...)

$ ccm node1 cqlsh -e "SELECT items FROM tlp_labs.products WHERE
product_id=1;"
 items

 {10: {csn: 100, name: 'item100'}, 20: {csn: 200, name: 'item200'}}
```

Furthermore, you cannot query the TTL for a single item in a collection,
and as distinct columns can have distinct TTLs, you cannot query the TTL
for the whole map (collection). As you cannot get the TTL for the whole
thing, nor query a single item of the collection, I guess there is no way
to get the currently set TTL for all or part of a collection.

If you need it, you would need to redesign this table, maybe split it. You
could, for example, make the collection a separate table that you would then
reference from your current table.
Another hack I'm just thinking about could be to add a 'ttl' field that gets
updated as well: any time a client updates the TTL for an entry, you could
update that 'ttl' field too. But again, you would still not be able to query
this information for only one item or a few; it would mean querying the whole
map again.
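
To illustrate the first option, here is a rough sketch of what that split
could look like (table and column names are just an example, adapt to your
model); with one row per item, TTL() becomes queryable per item:

```
cqlsh <<'CQL'
CREATE TABLE tlp_labs.product_items (
  product_id bigint,
  item_id int,
  csn bigint,
  name text,
  PRIMARY KEY (product_id, item_id)
);
UPDATE tlp_labs.product_items USING TTL 10 SET csn = 100, name = 'item100'
  WHERE product_id = 1 AND item_id = 10;
SELECT item_id, TTL(name) FROM tlp_labs.product_items WHERE product_id = 1;
CQL
```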

I had to test it because I could not remember this, and I think my
observations make sense. Sadly, there is no 'good' syntax for this
query; it's just not permitted at all, I would say. Sorry I have no better
news for you :).

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

On Wed, Jun 19, 2019 at 09:21, Maxim Parkachov 
wrote:

> Hi everyone,
>
> I'm struggling to understand how can I query TTL on the row in collection
> ( Cassandra 3.11.4 ).
> Here is my schema:
>
> CREATE TYPE item (
>   csn bigint,
>   name text
> );
>
> CREATE TABLE products (
>   product_id bigint PRIMARY KEY,
>   items map>
> );
>
> And I'm creating records with TTL like this:
>
> UPDATE products USING TTL 10 SET items = items + {10: {csn: 100, name:
> 'item100'}} WHERE product_id = 1;
> UPDATE products USING TTL 20 SET items = items + {20: {csn: 200, name:
> 'item200'}} WHERE product_id = 1;
>
> As expected first records disappears after 10 seconds and the second after
> 20. But if I already have data in the table I could not figure out how to
> query TTL on the item value:
>
> SELECT TTL(items) FROM products WHERE product_id=1;
> InvalidRequest: Error from server: code=2200 [Invalid query]
> message="Cannot use selection function ttl on collections"
>
> SELECT TTL(items[10]) FROM products WHERE product_id=1;
> SyntaxException: line 1:16 mismatched input '[' expecting ')' (SELECT
> TTL(items[[]...)
>
> Any tips, hints, tricks are highly appreciated,
> Maxim.
>


Re: Tombstones not getting purged

2019-06-20 Thread Alexander Dejanovski
Léo,

if a major compaction isn't a viable option, you can give the Instaclustr
SSTable tools a go to target the partitions with the most tombstones:
https://github.com/instaclustr/cassandra-sstable-tools/tree/cassandra-2.2#ic-purge

It generates a report like this:

Summary:

+-+-+

| | Size|

+-+-+

| Disk|  1.9 GB |

| Reclaim | 11.7 MB |

+-+-+


Largest reclaimable partitions:

+--++-+-+

| Key  | Size   | Reclaim | Generations |

+--++-+-+

| 001.2.340862 | 3.2 kB |  3.2 kB | [534, 438, 498] |

| 001.2.946243 | 2.9 kB |  2.8 kB | [534, 434, 384] |

| 001.1.527557 | 2.8 kB |  2.7 kB | [534, 519, 394] |

| 001.2.181797 | 2.6 kB |  2.6 kB | [534, 424, 343] |

| 001.3.475853 | 2.7 kB |28 B |  [524, 462] |

| 001.0.159704 | 2.7 kB |28 B |  [440, 247] |

| 001.1.311372 | 2.6 kB |28 B |  [424, 458] |

| 001.0.756293 | 2.6 kB |28 B |  [428, 358] |

| 001.2.681009 | 2.5 kB |28 B |  [440, 241] |

| 001.2.474773 | 2.5 kB |28 B |  [524, 484] |

| 001.2.974571 | 2.5 kB |28 B |  [386, 517] |

| 001.0.143176 | 2.5 kB |28 B |  [518, 368] |

| 001.1.185198 | 2.5 kB |28 B |  [517, 386] |

| 001.3.503517 | 2.5 kB |28 B |  [426, 346] |

| 001.1.847384 | 2.5 kB |28 B |  [436, 396] |

| 001.0.949269 | 2.5 kB |28 B |  [516, 356] |

| 001.0.756763 | 2.5 kB |28 B |  [440, 249] |

| 001.3.973808 | 2.5 kB |28 B |  [517, 386] |

| 001.0.312718 | 2.4 kB |28 B |  [524, 467] |

| 001.3.632066 | 2.4 kB |28 B |  [432, 377] |

| 001.1.946590 | 2.4 kB |28 B |  [519, 389] |

| 001.1.798591 | 2.4 kB |28 B |  [434, 388] |

| 001.3.953922 | 2.4 kB |28 B |  [432, 375] |

| 001.2.585518 | 2.4 kB |28 B |  [432, 375] |

| 001.3.284942 | 2.4 kB |28 B |  [376, 432] |

+--++-+-+

Once you've identified these partitions you can run a compaction on the
SSTables that contain them (identified using "nodetool getsstables").
Note that user defined compactions are only available for STCS.
Also ic-purge will perform a compaction but without writing to disk (should
look like a validation compaction), so it is rightfully reported by the
docs as an "intensive process" (not more than a repair though).
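
As a sketch of that workflow (keyspace, table, partition key and file path
below are placeholders), you would list the SSTables for a given partition and
then compact exactly those files through the CompactionManager MBean, here
with jmxterm:

```
# Which SSTables hold this partition key?
nodetool getsstables my_keyspace my_table 001.2.340862
# forceUserDefinedCompaction takes a comma-separated list of Data.db files:
echo "run -b org.apache.cassandra.db:type=CompactionManager forceUserDefinedCompaction /var/lib/cassandra/data/my_keyspace/my_table-1/md-147916-big-Data.db" | java -jar jmxterm-1.0.0-uber.jar -l localhost:7199
```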

-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


On Thu, Jun 20, 2019 at 9:17 AM Alexander Dejanovski 
wrote:

> My bad on date formatting, it should have been : %Y/%m/%d
> Otherwise the SSTables aren't ordered properly.
>
> You have 2 SSTables that claim to cover timestamps from 1940 to 2262,
> which is weird.
> Aside from that, you have big overlaps all over the SSTables, so that's
> probably why your tombstones are sticking around.
>
> Your best shot here will be a major compaction of that table, since it
> doesn't seem so big. Remember to use the --split-output flag on the
> compaction command to avoid ending up with a single SSTable after that.
>
> Cheers,
>
> -
> Alexander Dejanovski
> France
> @alexanderdeja
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
>
> On Thu, Jun 20, 2019 at 8:13 AM Léo FERLIN SUTTON
>  wrote:
>
>> On Thu, Jun 20, 2019 at 7:37 AM Alexander Dejanovski <
>> a...@thelastpickle.com> wrote:
>>
>>> Hi Leo,
>>>
>>> The overlapping SSTables are indeed the most probable cause as suggested
>>> by Jeff.
>>> Do you know if the tombstone compactions actually triggered? (did the
>>> SSTables name change?)
>>>
>>
>> Hello !
>>
>> I believe they have changed. I do not remember the sstable name but the
>> "last modified" has changed recently for these tables.
>>
>>
>>> Could you run the following command to list SSTables and provide us the
>>> output? It will display both their timestamp ranges along with the
>>> estimated droppable tombstones ratio.
>>>
>>>
>>> for f in *Data.db; do meta=$(sstablemetadata -gc_grace_seconds 259200
>>> $f); echo $(date --date=@$(echo "$meta" | grep Maximum\ time | cut -d" "
>>> -f3| cut -c 1-10) '+%m/%d/%Y %H:%M:%S') $(date --date=@$(echo "$meta" |
>>> grep Minimum\ time | cut -d" "  -f3| cut -c 1-10) '+%m/%d/%Y %H:%M:%S')
>>> $(echo "$meta" | grep droppable) $(ls -lh $f); done | sort
>>>
>>
>> Here is the results :
>>
>> ```
>> 04/01/2019 22:53:12 03/06/2018 16:46:13 Estimated droppable tombstones:
>> 0.0 -rw-r--r-- 1 cassandra cassandra 16G Apr 13 14:35 md-147916-big-Data.db
>> 04/11/2262 23:47:16 10/09/1940 19:13:17 Estimated droppable tombstones:
>> 0.0 -rw-r--r-- 1 cassandra cassandra 218M Jun 20 05:57 md-167948-big-Data.db
>> 04/11/2262 23:47:16 10/09/1940 19:13:17 Estimated droppable tombstones:
>> 0.0 -rw-r--r-- 1 cassandra cassandra 2.2G Jun 20 05:57 md-167942-big-Data.db
>> 05/01/2019 08:03:24 03/06/2018 

Unexpected error while refreshing token map, keeping previous version (IllegalArgumentException: Multiple entries with same key ?

2019-06-20 Thread Котельников Александр
Hey!

I’ve just configured a test 3-node Cassandra cluster and run a very trivial Java 
test against it.

I see the following warning from java-driver on each CqlSession initialization:

13:54:13.913 [loader-admin-0] WARN  c.d.o.d.i.c.metadata.DefaultMetadata - 
[loader] Unexpected error while refreshing token map, keeping previous version 
(IllegalArgumentException: Multiple entries with same key: 
Murmur3Token(-1060405237057176857)=/127.0.0.1:9042 and 
Murmur3Token(-1060405237057176857)=/127.0.0.1:9042)

What does it mean? Why?

Cassandra 3.11.4, driver 4.0.1.

nodetool status
Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load   Tokens   Owns (effective)  Host ID 
  Rack
UN  10.73.66.36   419.36 MiB  256  100.0%
fafa2737-9024-437b-9a59-c1c037bce244  rack1
UN  10.73.66.100  336.47 MiB  256  100.0%
d5323ad0-f8cd-42d4-b34d-9afcd002ea47  rack1
UN  10.73.67.196  336.4 MiB  256  100.0%
74dffe0c-32a4-4071-8b36-5ada5afa4a7d  rack1

The issue persists if I reset the cluster, just the token changes its value.
Alexander


Re: Tombstones not getting purged

2019-06-20 Thread Alexander Dejanovski
My bad on the date formatting, it should have been: %Y/%m/%d
Otherwise the SSTables aren't ordered properly.
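
For the record, the same loop with the corrected date format:

```
for f in *Data.db; do meta=$(sstablemetadata -gc_grace_seconds 259200 $f); echo $(date --date=@$(echo "$meta" | grep Maximum\ time | cut -d" "  -f3| cut -c 1-10) '+%Y/%m/%d %H:%M:%S') $(date --date=@$(echo "$meta" | grep Minimum\ time | cut -d" "  -f3| cut -c 1-10) '+%Y/%m/%d %H:%M:%S') $(echo "$meta" | grep droppable) $(ls -lh $f); done | sort
```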

You have 2 SSTables that claim to cover timestamps from 1940 to 2262, which
is weird.
Aside from that, you have big overlaps all over the SSTables, so that's
probably why your tombstones are sticking around.

Your best shot here will be a major compaction of that table, since it
doesn't seem so big. Remember to use the --split-output flag on the
compaction command to avoid ending up with a single SSTable after that.
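
For the record, that would be something along these lines (keyspace and table
names are placeholders):

```
nodetool compact --split-output my_keyspace my_table
```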

Cheers,

-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


On Thu, Jun 20, 2019 at 8:13 AM Léo FERLIN SUTTON
 wrote:

> On Thu, Jun 20, 2019 at 7:37 AM Alexander Dejanovski <
> a...@thelastpickle.com> wrote:
>
>> Hi Leo,
>>
>> The overlapping SSTables are indeed the most probable cause as suggested
>> by Jeff.
>> Do you know if the tombstone compactions actually triggered? (did the
>> SSTables name change?)
>>
>
> Hello !
>
> I believe they have changed. I do not remember the sstable name but the
> "last modified" has changed recently for these tables.
>
>
>> Could you run the following command to list SSTables and provide us the
>> output? It will display both their timestamp ranges along with the
>> estimated droppable tombstones ratio.
>>
>>
>> for f in *Data.db; do meta=$(sstablemetadata -gc_grace_seconds 259200
>> $f); echo $(date --date=@$(echo "$meta" | grep Maximum\ time | cut -d" "
>> -f3| cut -c 1-10) '+%m/%d/%Y %H:%M:%S') $(date --date=@$(echo "$meta" |
>> grep Minimum\ time | cut -d" "  -f3| cut -c 1-10) '+%m/%d/%Y %H:%M:%S')
>> $(echo "$meta" | grep droppable) $(ls -lh $f); done | sort
>>
>
> Here is the results :
>
> ```
> 04/01/2019 22:53:12 03/06/2018 16:46:13 Estimated droppable tombstones:
> 0.0 -rw-r--r-- 1 cassandra cassandra 16G Apr 13 14:35 md-147916-big-Data.db
> 04/11/2262 23:47:16 10/09/1940 19:13:17 Estimated droppable tombstones:
> 0.0 -rw-r--r-- 1 cassandra cassandra 218M Jun 20 05:57 md-167948-big-Data.db
> 04/11/2262 23:47:16 10/09/1940 19:13:17 Estimated droppable tombstones:
> 0.0 -rw-r--r-- 1 cassandra cassandra 2.2G Jun 20 05:57 md-167942-big-Data.db
> 05/01/2019 08:03:24 03/06/2018 16:46:13 Estimated droppable tombstones:
> 0.0 -rw-r--r-- 1 cassandra cassandra 4.6G May 1 08:39 md-152253-big-Data.db
> 05/09/2018 06:35:03 03/06/2018 16:46:07 Estimated droppable tombstones:
> 0.0 -rw-r--r-- 1 cassandra cassandra 30G Apr 13 22:09 md-147948-big-Data.db
> 05/21/2019 05:28:01 03/06/2018 16:46:16 Estimated droppable tombstones:
> 0.45150604672159905 -rw-r--r-- 1 cassandra cassandra 1.1G Jun 20 05:55
> md-167943-big-Data.db
> 05/22/2019 11:54:33 03/06/2018 16:46:16 Estimated droppable tombstones:
> 0.30826566640798975 -rw-r--r-- 1 cassandra cassandra 7.6G Jun 20 04:35
> md-167913-big-Data.db
> 06/13/2019 00:02:40 03/06/2018 16:46:08 Estimated droppable tombstones:
> 0.20980847354256815 -rw-r--r-- 1 cassandra cassandra 6.9G Jun 20 04:51
> md-167917-big-Data.db
> 06/17/2019 05:56:12 06/16/2019 20:33:52 Estimated droppable tombstones:
> 0.6114260192855792 -rw-r--r-- 1 cassandra cassandra 257M Jun 20 05:29
> md-167938-big-Data.db
> 06/18/2019 11:21:55 03/06/2018 17:48:22 Estimated droppable tombstones:
> 0.18655813086540254 -rw-r--r-- 1 cassandra cassandra 2.2G Jun 20 05:52
> md-167940-big-Data.db
> 06/19/2019 16:53:04 06/18/2019 11:22:04 Estimated droppable tombstones:
> 0.0 -rw-r--r-- 1 cassandra cassandra 425M Jun 19 17:08 md-167782-big-Data.db
> 06/20/2019 04:17:22 06/19/2019 16:53:04 Estimated droppable tombstones:
> 0.0 -rw-r--r-- 1 cassandra cassandra 146M Jun 20 04:18 md-167921-big-Data.db
> 06/20/2019 05:50:23 06/20/2019 04:17:32 Estimated droppable tombstones:
> 0.0 -rw-r--r-- 1 cassandra cassandra 42M Jun 20 05:56 md-167946-big-Data.db
> 06/20/2019 05:56:03 06/20/2019 05:50:32 Estimated droppable tombstones:
> 0.0 -rw-r--r-- 2 cassandra cassandra 4.8M Jun 20 05:56 md-167947-big-Data.db
> 07/03/2018 17:26:54 03/06/2018 16:46:07 Estimated droppable tombstones:
> 0.0 -rw-r--r-- 1 cassandra cassandra 27G Apr 13 17:45 md-147919-big-Data.db
> 09/09/2018 18:55:23 03/06/2018 16:46:08 Estimated droppable tombstones:
> 0.0 -rw-r--r-- 1 cassandra cassandra 30G Apr 13 18:57 md-147926-big-Data.db
> 11/30/2018 11:52:33 03/06/2018 16:46:08 Estimated droppable tombstones:
> 0.0 -rw-r--r-- 1 cassandra cassandra 14G Apr 13 13:53 md-147908-big-Data.db
> 12/20/2018 07:30:03 03/06/2018 16:46:08 Estimated droppable tombstones:
> 0.0 -rw-r--r-- 1 cassandra cassandra 9.3G Apr 13 13:28 md-147906-big-Data.db
> ```
>
> You could also check the min and max tokens in each SSTable (not sure if
>> you get that info from 3.0 sstablemetadata) so that you can detect the
>> SSTables that overlap on token ranges with the ones that carry the
>> tombstones, and have earlier timestamps. This way you'll be able to trigger
>> manual compactions, targeting those specific SSTables.
>>
>
> I have checked and I don't 

Re: Tombstones not getting purged

2019-06-20 Thread Léo FERLIN SUTTON
On Thu, Jun 20, 2019 at 7:37 AM Alexander Dejanovski 
wrote:

> Hi Leo,
>
> The overlapping SSTables are indeed the most probable cause as suggested
> by Jeff.
> Do you know if the tombstone compactions actually triggered? (did the
> SSTables name change?)
>

Hello !

I believe they have changed. I do not remember the sstable name but the
"last modified" has changed recently for these tables.


> Could you run the following command to list SSTables and provide us the
> output? It will display both their timestamp ranges along with the
> estimated droppable tombstones ratio.
>
>
> for f in *Data.db; do meta=$(sstablemetadata -gc_grace_seconds 259200 $f);
> echo $(date --date=@$(echo "$meta" | grep Maximum\ time | cut -d" "  -f3|
> cut -c 1-10) '+%m/%d/%Y %H:%M:%S') $(date --date=@$(echo "$meta" | grep
> Minimum\ time | cut -d" "  -f3| cut -c 1-10) '+%m/%d/%Y %H:%M:%S') $(echo
> "$meta" | grep droppable) $(ls -lh $f); done | sort
>

Here is the results :

```
04/01/2019 22:53:12 03/06/2018 16:46:13 Estimated droppable tombstones: 0.0
-rw-r--r-- 1 cassandra cassandra 16G Apr 13 14:35 md-147916-big-Data.db
04/11/2262 23:47:16 10/09/1940 19:13:17 Estimated droppable tombstones: 0.0
-rw-r--r-- 1 cassandra cassandra 218M Jun 20 05:57 md-167948-big-Data.db
04/11/2262 23:47:16 10/09/1940 19:13:17 Estimated droppable tombstones: 0.0
-rw-r--r-- 1 cassandra cassandra 2.2G Jun 20 05:57 md-167942-big-Data.db
05/01/2019 08:03:24 03/06/2018 16:46:13 Estimated droppable tombstones: 0.0
-rw-r--r-- 1 cassandra cassandra 4.6G May 1 08:39 md-152253-big-Data.db
05/09/2018 06:35:03 03/06/2018 16:46:07 Estimated droppable tombstones: 0.0
-rw-r--r-- 1 cassandra cassandra 30G Apr 13 22:09 md-147948-big-Data.db
05/21/2019 05:28:01 03/06/2018 16:46:16 Estimated droppable tombstones:
0.45150604672159905 -rw-r--r-- 1 cassandra cassandra 1.1G Jun 20 05:55
md-167943-big-Data.db
05/22/2019 11:54:33 03/06/2018 16:46:16 Estimated droppable tombstones:
0.30826566640798975 -rw-r--r-- 1 cassandra cassandra 7.6G Jun 20 04:35
md-167913-big-Data.db
06/13/2019 00:02:40 03/06/2018 16:46:08 Estimated droppable tombstones:
0.20980847354256815 -rw-r--r-- 1 cassandra cassandra 6.9G Jun 20 04:51
md-167917-big-Data.db
06/17/2019 05:56:12 06/16/2019 20:33:52 Estimated droppable tombstones:
0.6114260192855792 -rw-r--r-- 1 cassandra cassandra 257M Jun 20 05:29
md-167938-big-Data.db
06/18/2019 11:21:55 03/06/2018 17:48:22 Estimated droppable tombstones:
0.18655813086540254 -rw-r--r-- 1 cassandra cassandra 2.2G Jun 20 05:52
md-167940-big-Data.db
06/19/2019 16:53:04 06/18/2019 11:22:04 Estimated droppable tombstones: 0.0
-rw-r--r-- 1 cassandra cassandra 425M Jun 19 17:08 md-167782-big-Data.db
06/20/2019 04:17:22 06/19/2019 16:53:04 Estimated droppable tombstones: 0.0
-rw-r--r-- 1 cassandra cassandra 146M Jun 20 04:18 md-167921-big-Data.db
06/20/2019 05:50:23 06/20/2019 04:17:32 Estimated droppable tombstones: 0.0
-rw-r--r-- 1 cassandra cassandra 42M Jun 20 05:56 md-167946-big-Data.db
06/20/2019 05:56:03 06/20/2019 05:50:32 Estimated droppable tombstones: 0.0
-rw-r--r-- 2 cassandra cassandra 4.8M Jun 20 05:56 md-167947-big-Data.db
07/03/2018 17:26:54 03/06/2018 16:46:07 Estimated droppable tombstones: 0.0
-rw-r--r-- 1 cassandra cassandra 27G Apr 13 17:45 md-147919-big-Data.db
09/09/2018 18:55:23 03/06/2018 16:46:08 Estimated droppable tombstones: 0.0
-rw-r--r-- 1 cassandra cassandra 30G Apr 13 18:57 md-147926-big-Data.db
11/30/2018 11:52:33 03/06/2018 16:46:08 Estimated droppable tombstones: 0.0
-rw-r--r-- 1 cassandra cassandra 14G Apr 13 13:53 md-147908-big-Data.db
12/20/2018 07:30:03 03/06/2018 16:46:08 Estimated droppable tombstones: 0.0
-rw-r--r-- 1 cassandra cassandra 9.3G Apr 13 13:28 md-147906-big-Data.db
```

You could also check the min and max tokens in each SSTable (not sure if
> you get that info from 3.0 sstablemetadata) so that you can detect the
> SSTables that overlap on token ranges with the ones that carry the
> tombstones, and have earlier timestamps. This way you'll be able to trigger
> manual compactions, targeting those specific SSTables.
>

I have checked and I don't believe the info is available in the 3.0.X
version of sstablemetadata :(


> The rule for a tombstone to be purged is that there is no SSTable outside
> the compaction that would possibly contain the partition and that would
> have older timestamps.
>
Is there a way to log these checks and decisions made by the compaction
thread?


> Is this a followup on your previous issue where you were trying to perform
> a major compaction on an LCS table?
>

In a way, yes.

We are trying to globally reclaim the disk space used up by our tombstones (on
more than one table). We have recently started to purge old data in our
Cassandra cluster, and since (on cloud providers) disk space isn't cheap,
we are trying to make sure the data correctly expires and the disk space is
reclaimed!

The major compaction on the LCS table was one of our unsuccessful attempts
(too long and too much disk space