RE: TWCS sstables not dropping even though all data is expired

2019-05-03 Thread Nick Hatfield
Hi Mike,

If you will, share your compaction settings. More than likely, your issue stems from one of two causes:
1. You have read repair chance set to anything other than 0
2. You’re running repairs on the TWCS CF

Or both….
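A quick way to check and fix the first one (keyspace and table names below are placeholders; verify your current settings with DESCRIBE TABLE first):

cqlsh -e "ALTER TABLE my_keyspace.my_table WITH read_repair_chance = 0.0 AND dclocal_read_repair_chance = 0.0;"

With both chances at 0, chance-based read repair no longer re-writes old data into new sstables, which is what creates the overlaps TWCS can't drop.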

From: Mike Torra [mailto:mto...@salesforce.com.INVALID]
Sent: Friday, May 03, 2019 3:00 PM
To: user@cassandra.apache.org
Subject: Re: TWCS sstables not dropping even though all data is expired

Thx for the help Paul - there are definitely some details here I still don't 
fully understand, but this helped me resolve the problem and know what to look 
for in the future :)

On Fri, May 3, 2019 at 12:44 PM Paul Chandler <p...@redshots.com> wrote:
Hi Mike,

For TWCS an sstable can only be deleted when all the data in that sstable has expired, but you had a record without a TTL in it, so that sstable could never be deleted.

That bit is straightforward; the next bit I remember reading somewhere but can't find at the moment to confirm my thinking.

An sstable can only be deleted if it is the earliest sstable. I think this is because deleting later sstables may expose old versions of the data stored in the stuck sstable which had been superseded. For example, if there were a tombstone in a later sstable for the non-TTLed record causing the problem in this instance, then deleting that later sstable would cause the deleted data to reappear. (Someone please correct me if I have this wrong.)

Because sstables in different time buckets are never compacted together, this 
problem only goes away when you did the major compaction.

This would happen on all replicas of the data, hence the reason you see this problem on 3 nodes.
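If you want to confirm which sstable is the blocker and whether its timestamps overlap the newer ones, something like this should show it (keyspace, table and data path are placeholders):

sstableexpiredblockers my_keyspace my_table
sstablemetadata /var/lib/cassandra/data/my_keyspace/my_table-*/mc-*-big-Data.db | grep -i timestamp

The blocking sstable is the one whose min/max timestamp range overlaps the fully expired sstables behind it.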

Thanks

Paul
www.redshots.com


On 3 May 2019, at 15:35, Mike Torra <mto...@salesforce.com.INVALID> wrote:

This does indeed seem to be a problem of overlapping sstables, but I don't 
understand why the data (and number of sstables) just continues to grow 
indefinitely. I also don't understand why this problem is only appearing on 
some nodes. Is it just a coincidence that the one rogue test row without a ttl 
is at the 'root' sstable causing the problem (ie, from the output of 
`sstableexpiredblockers`)?

Running a full compaction via `nodetool compact` reclaims the disk space, but 
I'd like to figure out why this happened and prevent it. Understanding why this 
problem would be isolated the way it is (ie only one CF even though I have a 
few others that share a very similar schema, and only some nodes) seems like it 
will help me prevent it.


On Thu, May 2, 2019 at 1:00 PM Paul Chandler <p...@redshots.com> wrote:
Hi Mike,

It sounds like that record may have been deleted. If that is the case, it would still be shown in this sstable, but the tombstone for that delete would be in a later sstable. You can use nodetool getsstables to work out which sstables contain the data.
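For example (keyspace, table and key are placeholders):

nodetool getsstables my_keyspace my_table some_user_id_value

That lists every sstable on the local node containing that partition key, so you can see whether a later sstable holds a tombstone for it.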

I recommend reading The Last Pickle post on this: http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html. The sections towards the bottom of that post may well explain why the sstable is not being deleted.

Thanks

Paul
www.redshots.com


On 2 May 2019, at 16:08, Mike Torra <mto...@salesforce.com.INVALID> wrote:

I'm pretty stumped by this, so here is some more detail if it helps.

Here is what the suspicious partition looks like in the `sstabledump` output 
(some pii etc redacted):
```
{
"partition" : {
  "key" : [ "some_user_id_value", "user_id", "demo-test" ],
  "position" : 210
},
"rows" : [
  {
"type" : "row",
"position" : 1132,
"clustering" : [ "2019-01-22 15:27:45.000Z" ],
"liveness_info" : { "tstamp" : "2019-01-22T15:31:12.415081Z" },
"cells" : [
  { "some": "data" }
]
  }
]
  }
```

And here is what every other partition looks like:
```
{
"partition" : {
  "key" : [ "some_other_user_id", "user_id", "some_site_id" ],
  "position" : 1133
},
"rows" : [
  {
"type" : "row",
"position" : 1234,
"clustering" : [ "2019-01-22 17:59:35.547Z" ],
"liveness_info" : { "tstamp" : "2019-01-22T17:59:35.708Z", "ttl" : 
86400, "expires_at" : "2019-01-23T17:59:35Z", "expired" : true },
"cells" : [
  { "name" : "activity_data", "deletion_info" : { "local_delete_time" : 
"2019-01-22T17:59:35Z" }
  }
]
  }
]
  }
```

As expected, almost all of the data has a TTL and is already expired, except for this one suspicious partition. But if a partition isn't expired and I see it in the sstable, why wouldn't I see it when executing a CQL query against the CF? Why would this sstable be preventing so many other sstables from getting cleaned up?
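(One thing worth double checking: the partition key in the dump has three components, so a CQL lookup has to supply all of them exactly. Column names below are only guesses to show the shape of the query:)

cqlsh -e "SELECT * FROM my_keyspace.my_table WHERE key_part_one = 'some_user_id_value' AND key_part_two = 'user_id' AND key_part_three = 'demo-test';"

If any component differs even slightly from what was written, the query returns nothing even though the row is in the sstable.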

On Tue, Apr 30, 2019 at 12:34 PM Mike Torra <mto...@salesforce.com> wrote:
Hello -

I have a 48 node C* cluster spread across 4 AWS regions with RF=3. A few months 
ago I started noticing disk usage on 

RE: TWCS sstables not dropping even though all data is expired

2019-05-02 Thread Nick Hatfield
Hi Mike,

Have you checked to make sure you’re not a victim of timestamp overlap?

From: Mike Torra [mailto:mto...@salesforce.com.INVALID]
Sent: Thursday, May 02, 2019 11:09 AM
To: user@cassandra.apache.org
Subject: Re: TWCS sstables not dropping even though all data is expired

I'm pretty stumped by this, so here is some more detail if it helps.

Here is what the suspicious partition looks like in the `sstabledump` output 
(some pii etc redacted):
```
{
"partition" : {
  "key" : [ "some_user_id_value", "user_id", "demo-test" ],
  "position" : 210
},
"rows" : [
  {
"type" : "row",
"position" : 1132,
"clustering" : [ "2019-01-22 15:27:45.000Z" ],
"liveness_info" : { "tstamp" : "2019-01-22T15:31:12.415081Z" },
"cells" : [
  { "some": "data" }
]
  }
]
  }
```

And here is what every other partition looks like:
```
{
"partition" : {
  "key" : [ "some_other_user_id", "user_id", "some_site_id" ],
  "position" : 1133
},
"rows" : [
  {
"type" : "row",
"position" : 1234,
"clustering" : [ "2019-01-22 17:59:35.547Z" ],
"liveness_info" : { "tstamp" : "2019-01-22T17:59:35.708Z", "ttl" : 
86400, "expires_at" : "2019-01-23T17:59:35Z", "expired" : true },
"cells" : [
  { "name" : "activity_data", "deletion_info" : { "local_delete_time" : 
"2019-01-22T17:59:35Z" }
  }
]
  }
]
  }
```

As expected, almost all of the data has a TTL and is already expired, except for this one suspicious partition. But if a partition isn't expired and I see it in the sstable, why wouldn't I see it when executing a CQL query against the CF? Why would this sstable be preventing so many other sstables from getting cleaned up?

On Tue, Apr 30, 2019 at 12:34 PM Mike Torra <mto...@salesforce.com> wrote:
Hello -

I have a 48 node C* cluster spread across 4 AWS regions with RF=3. A few months 
ago I started noticing disk usage on some nodes increasing consistently. At 
first I solved the problem by destroying the nodes and rebuilding them, but the 
problem returns.

I did some more investigation recently, and this is what I found:
- I narrowed the problem down to a CF that uses TWCS, by simply looking at disk 
space usage
- in each region, 3 nodes have this problem of growing disk space (matches 
replication factor)
- on each node, I tracked down the problem to a particular SSTable using 
`sstableexpiredblockers`
- in the SSTable, using `sstabledump`, I found a row that does not have a ttl 
like the other rows, and appears to be from someone else on the team testing 
something and forgetting to include a ttl
- all other rows show "expired: true" except this one, hence my suspicion
- when I query for that particular partition key, I get no results
- I tried deleting the row anyways, but that didn't seem to change anything
- I also tried `nodetool scrub`, but that didn't help either

Would this rogue row without a ttl explain the problem? If so, why? If not, 
does anyone have any other ideas? Why does the row show in `sstabledump` but 
not when I query for it?

I appreciate any help or suggestions!

- Mike


RE: Cassandra taking very long to start and server under heavy load

2019-05-02 Thread Nick Hatfield
Just curious, but did you make sure to run the sstable upgrade after you completed the move from 2.x to 3.x?
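For example, something along these lines on each node (keyspace/table names are placeholders; the -a flag, if your version supports it, also rewrites sstables already on the current format):

nodetool upgradesstables
nodetool upgradesstables -a my_keyspace my_table

Old-format sstables left over from 2.x may be part of why startup is so expensive.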

From: Evgeny Inberg [mailto:evg...@gmail.com]
Sent: Thursday, May 02, 2019 1:31 AM
To: user@cassandra.apache.org
Subject: Re: Cassandra taking very long to start and server under heavy load

Using a single data disk.
Also, it is performing mostly heavy read operations according to the metrics collected.
On Wed, 1 May 2019, 20:14 Jeff Jirsa <jji...@gmail.com> wrote:
Do you have multiple data disks?
CASSANDRA-6696 changed behavior with multiple data disks to make it safer when one disk fails. It may be copying data to the right places on startup; can you see if sstables are being moved on disk?
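A rough way to check (the data directory path is an assumption; adjust to your data_file_directories setting):

ls -lt /var/lib/cassandra/data/*/*/*-Data.db | head -20

If the most recently modified Data.db files keep changing while the node starts up, data is still being moved or rewritten on disk.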
--
Jeff Jirsa


On May 1, 2019, at 6:04 AM, Evgeny Inberg <evg...@gmail.com> wrote:
I have upgraded a Cassandra cluster from version 2.0.x to 3.11.4, going through 2.1.14.
After the upgrade, I noticed that each node takes about 10-15 minutes to start, and the server is under very heavy load.
Did some digging around and got a few leads from the debug log.
Messages like:
Keyspace.java:351 - New replication settings for keyspace system_auth - 
invalidating disk boundary caches
CompactionStrategyManager.java:380 - Recreating compaction strategy - disk 
boundaries are out of date for system_auth.roles.

This is repeating for all keyspaces.

Any suggestion to check and what might cause this to happen on every start?

Thanks!


RE: Assassinate fails

2019-04-04 Thread Nick Hatfield
This will sound a little silly but, have you tried rolling the cluster?

$> nodetool flush; nodetool drain; service cassandra stop
$> ps aux | grep 'cassandra'

# Make sure the process actually dies. If not, you may need to kill -9 . Check first whether nodetool can still connect (nodetool gossipinfo). If the connection is live and listening on the port, just try re-running service cassandra stop again. Use kill -9 only as a last resort.

$> service cassandra start
$> nodetool netstats | grep 'NORMAL'  # wait for this to return before moving on to the next node.

Restart them all using this method, then run nodetool status again and see if 
it is listed.

One other thing: I recall you said something about having to terminate a node and then replace it. Make sure that whichever node you used the -Dreplace flag on does not still have it set when you start Cassandra on it again!
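A quick way to check for a leftover flag before restarting (config paths are assumptions, adjust to your install):

grep -rn 'replace_address' /etc/cassandra/ 2>/dev/null
ps aux | grep 'replace_address'

If either one shows anything, remove the option before the next start.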

From: Alex [mailto:m...@aca-o.com]
Sent: Thursday, April 04, 2019 4:58 AM
To: user@cassandra.apache.org
Subject: Re: Assassinate fails


Hi Anthony,

Thanks for your help.

I tried to run it multiple times in quick succession but it fails with:

-- StackTrace --
java.lang.RuntimeException: Endpoint still alive: /192.168.1.18 generation 
changed while trying to assassinate it
at 
org.apache.cassandra.gms.Gossiper.assassinateEndpoint(Gossiper.java:592)

I can see that the generation number for this node increases by 1 every time I call nodetool assassinate, and the command itself waits for 30 seconds before assassinating the node. When run multiple times in quick succession, the command fails because the generation number has been changed by the previous invocation.



In 'nodetool gossipinfo', the node is marked as "LEFT" on every node.

However, in 'nodetool describecluster', this node is marked as "unreachable" on 3 nodes out of 5.



Alex



On 04.04.2019 00:56, Anthony Grasso wrote:
Hi Alex,

We wrote a blog post on this topic late last year: 
http://thelastpickle.com/blog/2018/09/18/assassinate.html.

In short, you will need to run the assassinate command on each node 
simultaneously a number of times in quick succession. This will generate a 
number of messages requesting all nodes completely forget there used to be an 
entry within the gossip state for the given IP address.
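A rough sketch of what that can look like from a shell, run on every node at roughly the same time (the IP is the dead node):

for i in 1 2 3 4 5; do nodetool assassinate 192.168.1.18 & done; wait

Each invocation generates its own round of gossip messages, which is what eventually convinces every node to drop the stale entry.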

Regards,
Anthony

On Thu, 4 Apr 2019 at 03:32, Alex <m...@aca-o.com> wrote:
Same result it seems:
Welcome to JMX terminal. Type "help" for available commands.
$>open localhost:7199
#Connection to localhost:7199 is opened
$>bean org.apache.cassandra.net:type=Gossiper
#bean is set to org.apache.cassandra.net:type=Gossiper
$>run unsafeAssassinateEndpoint 192.168.1.18
#calling operation unsafeAssassinateEndpoint of mbean
org.apache.cassandra.net:type=Gossiper
#RuntimeMBeanException: java.lang.NullPointerException


There's not much more to see in the log files:
WARN  [RMI TCP Connection(10)-127.0.0.1] 2019-04-03 16:25:13,626
Gossiper.java:575 - Assassinating /192.168.1.18 via gossip
INFO  [RMI TCP Connection(10)-127.0.0.1] 2019-04-03 16:25:13,627
Gossiper.java:585 - Sleeping for 30000ms to ensure /192.168.1.18 does
not change
INFO  [RMI TCP Connection(10)-127.0.0.1] 2019-04-03 16:25:43,628
Gossiper.java:1029 - InetAddress /192.168.1.18 is now DOWN
INFO  [RMI TCP Connection(10)-127.0.0.1] 2019-04-03 16:25:43,631
StorageService.java:2324 - Removing tokens [..] for /192.168.1.18




On 03.04.2019 17:10, Nick Hatfield wrote:
> Run assassinate the old way. It works very well...
>
> wget -q -O jmxterm.jar
> http://downloads.sourceforge.net/cyclops-group/jmxterm-1.0-alpha-4-uber.jar
>
> java -jar ./jmxterm.jar
>
> $>open localhost:7199
>
> $>bean org.apache.cassandra.net:type=Gossiper
>
> $>run unsafeAssassinateEndpoint 192.168.1.18
>
> $>quit
>
>
> Happy deleting
>
> -Original Message-
> From: Alex [mailto:m...@aca-o.com]
> Sent: Wednesday, April 03, 2019 10:42 AM
> To: user@cassandra.apache.org
> Subject: Assassinate fails
>
> Hello,
>
> Short story:
> - I had to replace a dead node in my cluster
> - 1 week after, dead node is still seen as DN by 3 out of 5 nodes
> - dead node has null host_id
> - assassinate on dead node fails with error
>
> How can I get rid of this dead node ?
>
>
> Long story:
> I had a 3 nodes cluster (Cassandra 3.9) ; one node went dead. I built
> a new node from scratch and "replaced" the dead node using the
> information from this page
> https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsReplaceNode.html.
> It looked like the replacement went ok.
>
> I added two more nodes to strengthen the cluster.
>
> A few 

RE: Assassinate fails

2019-04-03 Thread Nick Hatfield
Run assassinate the old way. It works very well...

wget -q -O jmxterm.jar 
http://downloads.sourceforge.net/cyclops-group/jmxterm-1.0-alpha-4-uber.jar

java -jar ./jmxterm.jar

$>open localhost:7199

$>bean org.apache.cassandra.net:type=Gossiper

$>run unsafeAssassinateEndpoint 192.168.1.18

$>quit


Happy deleting

-Original Message-
From: Alex [mailto:m...@aca-o.com] 
Sent: Wednesday, April 03, 2019 10:42 AM
To: user@cassandra.apache.org
Subject: Assassinate fails

Hello,

Short story:
- I had to replace a dead node in my cluster
- 1 week after, dead node is still seen as DN by 3 out of 5 nodes
- dead node has null host_id
- assassinate on dead node fails with error

How can I get rid of this dead node ?


Long story:
I had a 3-node cluster (Cassandra 3.9); one node went dead. I built a new 
node from scratch and "replaced" the dead node using the information from this 
page 
https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsReplaceNode.html.
 
It looked like the replacement went ok.

I added two more nodes to strengthen the cluster.

A few days have passed and the dead node is still visible and marked as "down" 
on 3 of 5 nodes in nodetool status:

--  Address       Load       Tokens  Owns (effective)  Host ID                               Rack
UN  192.168.1.9   16 GiB     256     35.0%             76223d4c-9d9f-417f-be27-cebb791cddcc  rack1
UN  192.168.1.12  16.09 GiB  256     34.0%             719601e2-54a6-440e-a379-c9cf2dc20564  rack1
UN  192.168.1.14  14.16 GiB  256     32.6%             d8017a03-7e4e-47b7-89b9-cd9ec472d74f  rack1
UN  192.168.1.17  15.4 GiB   256     34.1%             fa238b21-1db1-47dc-bfb7-beedc6c9967a  rack1
DN  192.168.1.18  24.3 GiB   256     33.7%             null                                  rack1
UN  192.168.1.22  19.06 GiB  256     30.7%             09d24557-4e98-44c3-8c9d-53c4c31066e1  rack1

Its host ID is null, so I cannot use nodetool removenode. Moreover, nodetool assassinate 192.168.1.18 fails with:

error: null
-- StackTrace --
java.lang.NullPointerException

And in system.log:

INFO  [RMI TCP Connection(16)-127.0.0.1] 2019-03-27 17:39:38,595 Gossiper.java:585 - Sleeping for 30000ms to ensure /192.168.1.18 does not change
INFO  [CompactionExecutor:547] 2019-03-27 17:39:38,669 AutoSavingCache.java:393 - Saved KeyCache (27316 items) in 163 ms
INFO  [IndexSummaryManager:1] 2019-03-27 17:40:03,620 IndexSummaryRedistribution.java:75 - Redistributing index summaries
INFO  [RMI TCP Connection(16)-127.0.0.1] 2019-03-27 17:40:08,597 Gossiper.java:1029 - InetAddress /192.168.1.18 is now DOWN
INFO  [RMI TCP Connection(16)-127.0.0.1] 2019-03-27 17:40:08,599 StorageService.java:2324 - Removing tokens [-1061369577393671924,...]
ERROR [GossipStage:1] 2019-03-27 17:40:08,600 CassandraDaemon.java:226 - Exception in thread Thread[GossipStage:1,5,main]
java.lang.NullPointerException: null


In system.peers, the dead node shows up and has the same host_id as the replacing node:

cqlsh> select peer, host_id from system.peers;

  peer | host_id
--+--
  192.168.1.18 | 09d24557-4e98-44c3-8c9d-53c4c31066e1
  192.168.1.22 | 09d24557-4e98-44c3-8c9d-53c4c31066e1
   192.168.1.9 | 76223d4c-9d9f-417f-be27-cebb791cddcc
  192.168.1.14 | d8017a03-7e4e-47b7-89b9-cd9ec472d74f
  192.168.1.12 | 719601e2-54a6-440e-a379-c9cf2dc20564

Dead node and replacing node have different tokens in system.peers.

I should add that I also tried decommission on a node that still has 192.168.1.18 in its peers; it is still marked as "leaving" 5 days later. Nothing in nodetool netstats or nodetool compactionstats.


Thank you for taking the time to read this. Hope you can help.

Alex




RE: TWCS Compactions & Tombstones

2019-03-27 Thread Nick Hatfield
Awesome, thanks again!

From: Jeff Jirsa [mailto:jji...@gmail.com]
Sent: Wednesday, March 27, 2019 1:36 PM
To: cassandra 
Subject: Re: TWCS Compactions & Tombstones

You would need to swap your class from the com.jeffjirsa variant (probably from 
2.1 / 2.2) to the official TWCS class.

Once that happens I suspect it'll happen quite quickly, but I'm not sure.

On Wed, Mar 27, 2019 at 7:30 AM Nick Hatfield <nick.hatfi...@metricly.com> wrote:
Awesome, thank you Jeff. Sorry I had not seen this yet. So we have this 
enabled, I guess it will just take time to finally chew through it all?

From: Jeff Jirsa [mailto:jji...@gmail.com]
Sent: Tuesday, March 26, 2019 9:41 PM
To: user@cassandra.apache.org
Subject: Re: TWCS Compactions & Tombstones


Or Upgrade to a version with 
https://issues.apache.org/jira/browse/CASSANDRA-13418 and enable that feature
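Roughly, once on a version that ships it, it becomes a compaction option on the table. A sketch only; the option name below is from memory, so double-check it (and any JVM opt-in it requires) against the ticket before enabling it:

cqlsh -e "ALTER TABLE my_keyspace.my_table WITH compaction = {'class': 'TimeWindowCompactionStrategy', 'compaction_window_unit': 'DAYS', 'compaction_window_size': '1', 'unsafe_aggressive_sstable_expiration': 'true'};"

As the name suggests, it lets TWCS drop fully expired sstables even when they overlap, so make sure you understand the trade-off first.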

--
Jeff Jirsa


On Mar 26, 2019, at 6:23 PM, Rahul Singh <rahul.xavier.si...@gmail.com> wrote:
What's your timewindow? Roughly how much data is in each window?

If you examine the sstable data and see that it is truly old data with little chance that it has any new data, you can just remove the SSTables. You can do a rolling restart -- take down a node, remove mc-254400-*, and then start it up.

rahul.xavier.si...@gmail.com

http://cassandra.link



On Tue, Mar 26, 2019 at 8:01 AM Nick Hatfield <nick.hatfi...@metricly.com> wrote:
How does one properly get rid of sstables that have fallen victim to overlapping timestamps? I realized that we had TWCS set on our CF, which also had read_repair_chance = 0.1, and after correcting this to 0.0 I can clearly see the effects over time on the new sstables. However, I still have old sstables that date back to some time last year, and I need to remove them:

Max: 09/05/2018 Min: 09/04/2018 Estimated droppable tombstones: 0.8832057909932046  13G Mar 26 11:34 mc-254400-big-Data.db


What is the best way to do this? This is on a production system so any help 
would be greatly appreciated.

Thanks,


RE: TWCS Compactions & Tombstones

2019-03-27 Thread Nick Hatfield
Awesome, thank you Jeff. Sorry I had not seen this yet. So we have this 
enabled, I guess it will just take time to finally chew through it all?

From: Jeff Jirsa [mailto:jji...@gmail.com]
Sent: Tuesday, March 26, 2019 9:41 PM
To: user@cassandra.apache.org
Subject: Re: TWCS Compactions & Tombstones


Or Upgrade to a version with 
https://issues.apache.org/jira/browse/CASSANDRA-13418 and enable that feature

--
Jeff Jirsa


On Mar 26, 2019, at 6:23 PM, Rahul Singh <rahul.xavier.si...@gmail.com> wrote:
What's your timewindow? Roughly how much data is in each window?

If you examine the sstable data and see that it is truly old data with little chance that it has any new data, you can just remove the SSTables. You can do a rolling restart -- take down a node, remove mc-254400-*, and then start it up.


rahul.xavier.si...@gmail.com

http://cassandra.link



On Tue, Mar 26, 2019 at 8:01 AM Nick Hatfield <nick.hatfi...@metricly.com> wrote:
How does one properly get rid of sstables that have fallen victim to overlapping timestamps? I realized that we had TWCS set on our CF, which also had read_repair_chance = 0.1, and after correcting this to 0.0 I can clearly see the effects over time on the new sstables. However, I still have old sstables that date back to some time last year, and I need to remove them:

Max: 09/05/2018 Min: 09/04/2018 Estimated droppable tombstones: 0.8832057909932046  13G Mar 26 11:34 mc-254400-big-Data.db


What is the best way to do this? This is on a production system so any help 
would be greatly appreciated.

Thanks,


RE: TWCS Compactions & Tombstones

2019-03-26 Thread Nick Hatfield
Thanks for the insight, Rahul. We’re using 1 day for the time window.

compaction = {'class': 
'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy',
  'compaction_window_size': '1',
  'compaction_window_unit': 'DAYS',
  'max_threshold': '32',
  'min_threshold': '4',
  'timestamp_resolution': 'MILLISECONDS',
  'tombstone_compaction_interval': '86400',
  'tombstone_threshold': '0.2',
  'unchecked_tombstone_compaction': 'true'}

  AND
default_time_to_live = 7884009
  AND
gc_grace_seconds = 86400
  AND
read_repair_chance = 0


What's the best way to examine the sstable data so that I can verify that it is old data, other than by the min / max timestamps?

Thanks for your help

From: Rahul Singh [mailto:rahul.xavier.si...@gmail.com]
Sent: Tuesday, March 26, 2019 9:24 PM
To: user 
Subject: Re: TWCS Compactions & Tombstones

What's your timewindow? Roughly how much data is in each window?

If you examine the sstable data and see that it is truly old data with little chance that it has any new data, you can just remove the SSTables. You can do a rolling restart -- take down a node, remove mc-254400-*, and then start it up.


rahul.xavier.si...@gmail.com

http://cassandra.link



On Tue, Mar 26, 2019 at 8:01 AM Nick Hatfield <nick.hatfi...@metricly.com> wrote:
How does one properly get rid of sstables that have fallen victim to overlapping timestamps? I realized that we had TWCS set on our CF, which also had read_repair_chance = 0.1, and after correcting this to 0.0 I can clearly see the effects over time on the new sstables. However, I still have old sstables that date back to some time last year, and I need to remove them:

Max: 09/05/2018 Min: 09/04/2018 Estimated droppable tombstones: 0.8832057909932046  13G Mar 26 11:34 mc-254400-big-Data.db


What is the best way to do this? This is on a production system so any help 
would be greatly appreciated.

Thanks,


TWCS Compactions & Tombstones

2019-03-26 Thread Nick Hatfield
How does one properly get rid of sstables that have fallen victim to overlapping timestamps? I realized that we had TWCS set on our CF, which also had read_repair_chance = 0.1, and after correcting this to 0.0 I can clearly see the effects over time on the new sstables. However, I still have old sstables that date back to some time last year, and I need to remove them:

Max: 09/05/2018 Min: 09/04/2018 Estimated droppable tombstones: 0.8832057909932046  13G Mar 26 11:34 mc-254400-big-Data.db


What is the best way to do this? This is on a production system so any help 
would be greatly appreciated.

Thanks,


Re: Merging two cluster's in to one without any downtime

2019-03-25 Thread Nick Hatfield
Maybe others will have a different or better solution but, in my experience, to accomplish HA we simply Y-write from our application to the new cluster (write to both the old and the new cluster). You then export the data from the old cluster to the new cluster, using cql2json or any method you choose. The Y-write covers all live (current) data, while the copy you run supplies the old data. Once complete, set up a single reader that reads data from the new cluster and verify all is as expected!
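One concrete way to do the bulk-copy part, if plain CQL tooling can handle your data volume (cqlsh COPY is just an example; hosts, keyspace and table names are placeholders):

cqlsh old-cluster-node -e "COPY my_keyspace.my_table TO '/tmp/my_table.csv'"
cqlsh new-cluster-node -e "COPY my_keyspace.my_table FROM '/tmp/my_table.csv'"

For large tables a dedicated bulk loader is usually a better fit, but the dual-write plus backfill pattern is the same either way.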

Sent from my BlackBerry 10 smartphone on the Verizon Wireless 4G LTE network.
From: Nandakishore Tokala
Sent: Monday, March 25, 2019 18:39
To: user@cassandra.apache.org
Reply To: user@cassandra.apache.org
Subject: Merging two cluster's in to one without any downtime


Please let me know the best practices to combine 2 different clusters into one without having any downtime.

Thanks & Regards,
Nanda Kishore


TWCS and tombstone purging

2019-03-15 Thread Nick Hatfield
Hey guys,

Can someone give me some idea or link some good material for determining a good 
/ aggressive tombstone strategy? I want to make sure my tombstones are getting 
purged as soon as possible to reclaim disk.
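For reference, the knobs usually involved are table-level; a sketch with placeholder names, using values in line with what has come up elsewhere in these threads (keep gc_grace_seconds longer than your repair interval, since lowering it too far risks resurrecting deleted data):

cqlsh -e "ALTER TABLE my_keyspace.my_table WITH gc_grace_seconds = 86400 AND compaction = {'class': 'TimeWindowCompactionStrategy', 'compaction_window_unit': 'DAYS', 'compaction_window_size': '1', 'unchecked_tombstone_compaction': 'true', 'tombstone_threshold': '0.2', 'tombstone_compaction_interval': '86400'};"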

Thanks


RE: To Repair or Not to Repair

2019-03-14 Thread Nick Hatfield
Beautiful, thank you very much!

From: Jonathan Haddad [mailto:j...@jonhaddad.com]
Sent: Thursday, March 14, 2019 4:55 PM
To: user 
Subject: Re: To Repair or Not to Repair

My coworker Alex (from The Last Pickle) wrote an in-depth blog post on TWCS. We recommend not running repair on tables that use TWCS.

http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html

It's enough of a problem that we added a feature into Reaper to auto-blacklist 
TWCS / DTCS tables from being repaired, we wrote about it here: 
http://thelastpickle.com/blog/2019/02/15/reaper-1_4-released.html

Hope this helps!
Jon

On Fri, Mar 15, 2019 at 9:48 AM Nick Hatfield <nick.hatfi...@metricly.com> wrote:
It seems that running a repair works really well, quickly and efficiently, when repairing a column family that does not use TWCS. Has anyone else had a similar experience? Wondering if running TWCS is doing more harm than good, as it chews up a lot of CPU for extended periods of time in comparison to CFs with a compaction strategy of STCS.


Thanks,


--
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


To Repair or Not to Repair

2019-03-14 Thread Nick Hatfield
It seems that running a repair works really well, quickly and efficiently, when repairing a column family that does not use TWCS. Has anyone else had a similar experience? Wondering if running TWCS is doing more harm than good, as it chews up a lot of CPU for extended periods of time in comparison to CFs with a compaction strategy of STCS.


Thanks,


Re: Default TTL on CF

2019-03-14 Thread Nick Hatfield
Awesome! Thank you!

On 3/14/19, 9:29 AM, "Jeff Jirsa"  wrote:

>SSTableReader and CQLSSTableWriter if you’re comfortable with Java
>
>
>-- 
>Jeff Jirsa
>
>
>> On Mar 14, 2019, at 1:28 PM, Nick Hatfield 
>>wrote:
>> 
>> Bummer but, reasonable. Any cool tricks I could use to make that process
>> easier? I have many TB of data on a live cluster and was hoping to
>> starting cleaning out the earlier bad habits of data housekeeping
>> 
>>> On 3/14/19, 9:24 AM, "Jeff Jirsa"  wrote:
>>> 
>>> It does not impact existing data
>>> 
>>> The data gets an expiration time stamp when you write it. Changing the
>>> default only impacts newly written data
>>> 
>>> If you need to change the expiration time on existing data, you must
>>> update it
>>> 
>>> 
>>> -- 
>>> Jeff Jirsa
>>> 
>>> 
>>>> On Mar 14, 2019, at 1:16 PM, Nick Hatfield
>>>>
>>>> wrote:
>>>> 
>>>> Hello,
>>>> 
>>>> Can anyone tell me if setting a default TTL will affect existing data?
>>>> I would like to enable a default TTL and have cassandra add that TTL to
>>>> any rows that don't currently have a TTL set.
>>>> 
>>>> Thanks,
>>> 
>>> 
>>> 
>> 
>> 
>> 
>> 
>
>
>




Re: Default TTL on CF

2019-03-14 Thread Nick Hatfield
Bummer but, reasonable. Any cool tricks I could use to make that process easier? I have many TB of data on a live cluster and was hoping to start cleaning out the earlier bad habits of data housekeeping.

On 3/14/19, 9:24 AM, "Jeff Jirsa"  wrote:

>It does not impact existing data
>
>The data gets an expiration time stamp when you write it. Changing the
>default only impacts newly written data
>
>If you need to change the expiration time on existing data, you must
>update it
>
>
>-- 
>Jeff Jirsa
>
>
>> On Mar 14, 2019, at 1:16 PM, Nick Hatfield 
>>wrote:
>> 
>> Hello,
>> 
>> Can anyone tell me if setting a default TTL will affect existing data?
>>I would like to enable a default TTL and have cassandra add that TTL to
>>any rows that don't currently have a TTL set.
>> 
>> Thanks,
>
>
>






Default TTL on CF

2019-03-14 Thread Nick Hatfield
Hello,

Can anyone tell me if setting a default TTL will affect existing data? I would 
like to enable a default TTL and have cassandra add that TTL to any rows that 
don’t currently have a TTL set.
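(Per the replies above in this thread: the table-level default only applies to new writes; existing rows keep whatever expiration they were written with and have to be rewritten. A sketch with placeholder names:)

cqlsh -e "ALTER TABLE my_keyspace.my_table WITH default_time_to_live = 7884009;"
cqlsh -e "INSERT INTO my_keyspace.my_table (part_key, some_column) VALUES ('k1', 'v1') USING TTL 7884009;"

The second statement shows the shape of the rewrite; in practice each existing row would be re-inserted with its current values.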

Thanks,


Re: can i...

2019-03-07 Thread Nick Hatfield
-big-Data.db
Max: 12/03/2018 Min: 12/02/2018 Estimated droppable tombstones: 0.8853087223736261  12G Mar 5 15:16 mc-231551-big-Data.db
Max: 12/04/2018 Min: 12/03/2018 Estimated droppable tombstones: 0.8803649205154546  12G Mar 5 10:06 mc-231309-big-Data.db
Max: 12/05/2018 Min: 12/04/2018 Estimated droppable tombstones: 0.8824805012470633  12G Mar 5 10:33 mc-231334-big-Data.db
Max: 12/06/2018 Min: 12/05/2018 Estimated droppable tombstones: 0.7605552563033167  4.1G Mar 5 08:12 mc-231253-big-Data.db
Max: 12/07/2018 Min: 12/06/2018 Estimated droppable tombstones: 0.7748787995519647  3.9G Mar 5 10:55 mc-231386-big-Data.db
Max: 12/08/2018 Min: 12/07/2018 Estimated droppable tombstones: 0.7998981602050579  4.1G Mar 5 08:37 mc-231275-big-Data.db
Max: 12/09/2018 Min: 12/08/2018 Estimated droppable tombstones: 0.8047662079574316  4.5G Mar 5 03:35 mc-231043-big-Data.db
Max: 12/10/2018 Min: 12/09/2018 Estimated droppable tombstones: 0.7987046261073453  4.8G Mar 4 23:36 mc-230870-big-Data.db
Max: 12/11/2018 Min: 12/10/2018 Estimated droppable tombstones: 0.8346316850246404  5.6G Mar 5 13:10 mc-231478-big-Data.db
Max: 12/12/2018 Min: 12/11/2018 Estimated droppable tombstones: 0.8336216107728608  6.1G Mar 5 00:06 mc-230888-big-Data.db
Max: 12/13/2018 Min: 12/12/2018 Estimated droppable tombstones: 0.8566337089121  7.2G Mar 5 02:46 mc-230993-big-Data.db
Max: 12/14/2018 Min: 12/13/2018 Estimated droppable tombstones: 0.8137644691768783  4.7G Mar 5 10:32 mc-231358-big-Data.db
Max: 12/15/2018 Min: 12/14/2018 Estimated droppable tombstones: 0.8166609937509232  4.6G Mar 5 13:59 mc-231525-big-Data.db
Max: 12/16/2018 Min: 12/15/2018 Estimated droppable tombstones: 0.8085604043211527  4.8G Mar 5 05:00 mc-231110-big-Data.db
Max: 12/17/2018 Min: 12/16/2018 Estimated droppable tombstones: 0.8124008277006111  5.0G Mar 4 20:34 mc-230739-big-Data.db
Max: 12/18/2018 Min: 12/17/2018 Estimated droppable tombstones: 0.8197544452946743  5.0G Mar 5 12:03 mc-231430-big-Data.db
Max: 12/19/2018 Min: 12/18/2018 Estimated droppable tombstones: 0.7604684134873694  5.7G Mar 4 21:08 mc-230768-big-Data.db
Max: 12/20/2018 Min: 12/19/2018 Estimated droppable tombstones: 0.6276716162431576  6.8G Mar 4 22:39 mc-230832-big-Data.db
Max: 12/21/2018 Min: 12/20/2018 Estimated droppable tombstones: 0.6262830796548643  6.9G Mar 4 21:23 mc-230778-big-Data.db
Max: 12/22/2018 Min: 12/21/2018 Estimated droppable tombstones: 0.6245678218315354  6.7G Mar 5 09:22 mc-231304-big-Data.db
Max: 12/23/2018 Min: 12/22/2018 Estimated droppable tombstones: 0.6339901894339154  6.7G Mar 5 00:06 mc-230897-big-Data.db
Max: 12/24/2018 Min: 12/23/2018 Estimated droppable tombstones: 0.6401085489180292  6.8G Mar 5 00:17 mc-230901-big-Data.db
Max: 12/25/2018 Min: 12/24/2018 Estimated droppable tombstones: 0.648027924752315  6.9G Mar 4 22:04 mc-230809-big-Data.db
Max: 12/26/2018 Min: 12/25/2018 Estimated droppable tombstones: 0.6465660696516876  7.0G Mar 4 23:16 mc-230856-big-Data.db
Max: 12/27/2018 Min: 12/26/2018 Estimated droppable tombstones: 0.5464676457788102  5.9G Mar 5 08:46 mc-231285-big-Data.db
Max: 12/28/2018 Min: 12/27/2018 Estimated droppable tombstones: 0.5556336150105652  5.8G Mar 5 09:03 mc-231298-big-Data.db
Max: 12/29/2018 Min: 12/28/2018 Estimated droppable tombstones: 0.5884672237873865  6.1G Mar 4 20:32 mc-230741-big-Data.db
Max: 12/30/2018 Min: 12/29/2018 Estimated droppable tombstones: 0.6116207911770754  6.3G Mar 4 21:52 mc-230801-big-Data.db
Max: 12/31/2018 Min: 12/30/2018 Estimated droppable tombstones: 0.6156449592384619  6.6G Mar 5 09:48 mc-231332-big-Data.db



Currently our data on disk is filling up quickly because we are unable to 
successfully evict this data. Is there a way to:


1. Cleanup what is currently taking up so much disk space
2. Mitigate this entirely in the future


Any help would be greatly appreciated!!

Thanks,

Nick Hatfield

From: Surbhi Gupta <surbhi.gupt...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Thursday, March 7, 2019 at 11:50 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: can i...

Send the details

On Thu, Mar 7, 2019 at 8:45 AM Nick Hatfield <nick.hatfi...@metricly.com> wrote:
Use this email to get some insight on how to fix database issues in our cluster?


can i...

2019-03-07 Thread Nick Hatfield
Use this email to get some insight on how to fix database issues in our cluster?