Inviting comments and opinion

2019-05-02 Thread Devopam Mittra
Hi 'Users' :),
Just wanted to seek your opinion on this approach, if you could please spare
some time for it.
https://www.slideshare.net/devopam/cassandra-table-modeling-an-alternate-approach



regards
Dev


Re: Mixing LWT and normal operations for a partition

2019-05-02 Thread Shaurya Gupta
Hi,


1. The sequence of commands below also does not give the expected output.

Since the batch contains a delete command together with an LWT update using
IF EXISTS, the row with id = 5 should end up deleted in the final result.


cassandra@cqlsh> select * from demo.tweets;

 id | body     | latitude | longitude | time                     | user
----+----------+----------+-----------+--------------------------+-------
  5 | old body |  32.6448 |  78.21672 | 2019-01-14 18:30:00.00+  | user5

(1 rows)

cassandra@cqlsh>

cassandra@cqlsh> begin batch update demo.tweets SET body='new body' where id = 5 IF EXISTS; delete from demo.tweets where id = 5 IF EXISTS; apply batch;

 [applied]
-----------
      True

cassandra@cqlsh> select * from demo.tweets;

 id | body     | latitude | longitude | time | user
----+----------+----------+-----------+------+------
  5 | new body |     null |      null | null | null

(1 rows)

cassandra@cqlsh>



2. In contrast, the sequence of commands below gives the expected output:


cassandra@cqlsh> insert into demo.tweets (id, user, body, time, latitude, longitude) values (5, 'user5', 'old body', '2019-01-15', 32.644800, 78.216721);

cassandra@cqlsh>

cassandra@cqlsh> select * from demo.tweets;

 id | body     | latitude | longitude | time                     | user
----+----------+----------+-----------+--------------------------+-------
  5 | old body |  32.6448 |  78.21672 | 2019-01-14 18:30:00.00+  | user5

(1 rows)

cassandra@cqlsh> delete from demo.tweets where id = 5 IF EXISTS;

 [applied]
-----------
      True

cassandra@cqlsh> select * from demo.tweets;

 id | body | latitude | longitude | time | user
----+------+----------+-----------+------+------

(0 rows)

cassandra@cqlsh> update demo.tweets SET body='new body' where id = 5 IF EXISTS;

 [applied]
-----------
     False

cassandra@cqlsh> select * from demo.tweets;

 id | body | latitude | longitude | time | user
----+------+----------+-----------+------+------

(0 rows)
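
For completeness, a small follow-up check (a sketch only, against the same demo.tweets table used above): writetime() and ttl() report the timestamp and TTL of the live value of a column, which can help to see which mutation in a mixed batch ended up winning.

```
-- Sketch: inspect the timestamp/TTL of the live value of body for id = 5
-- (assumes the demo.tweets table from the examples above).
SELECT id, body, writetime(body), ttl(body) FROM demo.tweets WHERE id = 5;
```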


Thanks

Shaurya




On Fri, May 3, 2019 at 1:02 AM Shaurya Gupta  wrote:

> One suggestion - I think the Cassandra community already has a drive underway to
> update the documentation. This could be added to the CQLSH documentation or
> some other relevant documentation.
>
> On Fri, May 3, 2019 at 12:56 AM Shaurya Gupta 
> wrote:
>
>> Thanks Jeff.
>>
>> On Fri, May 3, 2019 at 12:38 AM Jeff Jirsa  wrote:
>>
>>> No. Don’t mix LWT and normal writes.
>>>
>>> --
>>> Jeff Jirsa
>>>
>>>
>>> > On May 2, 2019, at 11:43 AM, Shaurya Gupta 
>>> wrote:
>>> >
>>> > Hi,
>>> >
>>> > We are seeing really odd behaviour while trying to delete a row which is
>>> simultaneously being updated in a lightweight transaction.
>>> > The delete command succeeds and the LWT update fails with a timeout
>>> exception, but the next select statement shows that the row still
>>> exists. This occurs once in many such scenarios.
>>> >
>>> > Is it fine to mix LWT and normal operations for the same partition? Is
>>> it expected to work?
>>> >
>>> > Thanks
>>> > Shaurya
>>>
>>> -
>>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>>
>>>
>>
>> --
>> Shaurya Gupta
>>
>>
>>
>
> --
> Shaurya Gupta
>
>
>

-- 
Shaurya Gupta


Re: CL=LQ, RF=3: Can a Write be Lost If Two Nodes ACK'ing it Die

2019-05-02 Thread Avinash Mandava
Good catch, misread the detail.

On Thu, May 2, 2019 at 4:56 PM Ben Slater 
wrote:

> Reading more carefully, it could actually be either way: quorum requires
> that a majority of nodes complete and ack the write but still aims to write
> to RF nodes (with the last replica either written immediately or
> eventually via hints or repairs). So, in the scenario outlined the replica
> may or may not have made its way to the third node by the time the first
> two replicas are lost. If there is a replica on the third node it can be
> recovered to the other two nodes by either rebuild (actually replace) or
> repair.
>
> Cheers
> Ben
>
> ---
>
>
> Ben Slater | Chief Product Officer, Instaclustr
>
>
> On Fri, 3 May 2019 at 09:33, Avinash Mandava 
> wrote:
>
>> In scenario 2 it's lost: if both nodes die and get replaced entirely,
>> there's no history anywhere that the write ever happened, as it wouldn't be
>> in the commitlog, memtable, or an sstable on node 3. Surviving that failure
>> scenario of two nodes with the same data failing simultaneously requires upping
>> CL or RF, or spreading across 3 racks, if the situation you're trying to
>> avoid is rack failure (which I'm guessing it is from the question setup).
>>
>> On Thu, May 2, 2019 at 2:25 PM Ben Slater 
>> wrote:
>>
>>> In scenario 2, if the row has been written to node 3 it will be replaced
>>> on the other nodes via rebuild or repair.
>>>
>>> ---
>>>
>>>
>>> Ben Slater | Chief Product Officer, Instaclustr
>>>
>>>
>>> On Fri, 3 May 2019 at 00:54, Fd Habash  wrote:
>>>
 C*: 2.2.8

 Write CL = LQ

 Kspace RF = 3

 Three racks



 A write gets received by node 1 in rack 1 at above specs. Node 1
 (rack1) & node 2 (rack2)  acknowledge it to the client.



 Within some unit of time, node 1 & 2 die. Either ….

- Scenario 1: C* process death: Row did not make it to sstable (it
is in commit log & was in memtable)
- Scenario 2: Node death: the row may have made it to an sstable, but
the nodes are gone (will have to bootstrap to replace).



 Scenario 1: Row is not lost because once C* is restarted, commit log
 should replay the mutation.



 Scenario 2: row is gone forever? If these two nodes are replaced via
 bootstrapping, will they ever get the row back from node 3 (rack3) if the
 write ever made it there?





 
 Thank you



>>>
>>
>> --
>> www.vorstella.com
>> 408 691 8402
>>
>

-- 
www.vorstella.com
408 691 8402


Re: CL=LQ, RF=3: Can a Write be Lost If Two Nodes ACK'ing it Die

2019-05-02 Thread Ben Slater
Reading more carefully, it could actually be either way: quorum requires
that a majority of nodes complete and ack the write but still aims to write
to RF nodes (with the last replica either written immediately or
eventually via hints or repairs). So, in the scenario outlined the replica
may or may not have made its way to the third node by the time the first
two replicas are lost. If there is a replica on the third node it can be
recovered to the other two nodes by either rebuild (actually replace) or
repair.

Cheers
Ben

---


Ben Slater | Chief Product Officer, Instaclustr



   




On Fri, 3 May 2019 at 09:33, Avinash Mandava  wrote:

> In scenario 2 it's lost: if both nodes die and get replaced entirely,
> there's no history anywhere that the write ever happened, as it wouldn't be
> in the commitlog, memtable, or an sstable on node 3. Surviving that failure
> scenario of two nodes with the same data failing simultaneously requires upping
> CL or RF, or spreading across 3 racks, if the situation you're trying to
> avoid is rack failure (which I'm guessing it is from the question setup).
>
> On Thu, May 2, 2019 at 2:25 PM Ben Slater 
> wrote:
>
>> In scenario 2, if the row has been written to node 3 it will be replaced
>> on the other nodes via rebuild or repair.
>>
>> ---
>>
>>
>> Ben Slater | Chief Product Officer, Instaclustr
>>
>>
>> On Fri, 3 May 2019 at 00:54, Fd Habash  wrote:
>>
>>> C*: 2.2.8
>>>
>>> Write CL = LQ
>>>
>>> Kspace RF = 3
>>>
>>> Three racks
>>>
>>>
>>>
>>> A write gets received by node 1 in rack 1 at above specs. Node 1 (rack1)
>>> & node 2 (rack2)  acknowledge it to the client.
>>>
>>>
>>>
>>> Within some unit of time, node 1 & 2 die. Either ….
>>>
>>>- Scenario 1: C* process death: Row did not make it to sstable (it
>>>is in commit log & was in memtable)
>>>- Scenario 2: Node death: the row may have made it to an sstable, but the nodes
>>>are gone (will have to bootstrap to replace).
>>>
>>>
>>>
>>> Scenario 1: Row is not lost because once C* is restarted, commit log
>>> should replay the mutation.
>>>
>>>
>>>
>>> Scenario 2: row is gone forever? If these two nodes are replaced via
>>> bootstrapping, will they ever get the row back from node 3 (rack3) if the
>>> write ever made it there?
>>>
>>>
>>>
>>>
>>>
>>> 
>>> Thank you
>>>
>>>
>>>
>>
>
> --
> www.vorstella.com
> 408 691 8402
>


Re: CL=LQ, RF=3: Can a Write be Lost If Two Nodes ACK'ing it Die

2019-05-02 Thread Avinash Mandava
In scenario 2 it's lost: if both nodes die and get replaced entirely,
there's no history anywhere that the write ever happened, as it wouldn't be
in the commitlog, memtable, or an sstable on node 3. Surviving that failure
scenario of two nodes with the same data failing simultaneously requires upping
CL or RF, or spreading across 3 racks, if the situation you're trying to
avoid is rack failure (which I'm guessing it is from the question setup).

On Thu, May 2, 2019 at 2:25 PM Ben Slater 
wrote:

> In scenario 2, if the row has been written to node 3 it will be replaced
> on the other nodes via rebuild or repair.
>
> ---
>
>
> Ben Slater | Chief Product Officer, Instaclustr
>
>
> On Fri, 3 May 2019 at 00:54, Fd Habash  wrote:
>
>> C*: 2.2.8
>>
>> Write CL = LQ
>>
>> Kspace RF = 3
>>
>> Three racks
>>
>>
>>
>> A write gets received by node 1 in rack 1 at above specs. Node 1 (rack1)
>> & node 2 (rack2)  acknowledge it to the client.
>>
>>
>>
>> Within some unit of time, node 1 & 2 die. Either ….
>>
>>- Scenario 1: C* process death: Row did not make it to sstable (it is
>>in commit log & was in memtable)
>>- Scenario 2: Node death: the row may have made it to an sstable, but the nodes
>>are gone (will have to bootstrap to replace).
>>
>>
>>
>> Scenario 1: Row is not lost because once C* is restarted, commit log
>> should replay the mutation.
>>
>>
>>
>> Scenario 2: row is gone forever? If these two nodes are replaced via
>> bootstrapping, will they ever get the row back from node 3 (rack3) if the
>> write ever made it there?
>>
>>
>>
>>
>>
>> 
>> Thank you
>>
>>
>>
>

-- 
www.vorstella.com
408 691 8402


Re: CL=LQ, RF=3: Can a Write be Lost If Two Nodes ACK'ing it Die

2019-05-02 Thread Ben Slater
In scenario 2, if the row has been written to node 3 it will be replaced on
the other nodes via rebuild or repair.

---


Ben Slater | Chief Product Officer, Instaclustr



   




On Fri, 3 May 2019 at 00:54, Fd Habash  wrote:

> C*: 2.2.8
>
> Write CL = LQ
>
> Kspace RF = 3
>
> Three racks
>
>
>
> A write gets received by node 1 in rack 1 at above specs. Node 1 (rack1) &
> node 2 (rack2)  acknowledge it to the client.
>
>
>
> Within some unit of time, node 1 & 2 die. Either ….
>
>- Scenario 1: C* process death: Row did not make it to sstable (it is
>in commit log & was in memtable)
>- Scenario 2: Node death: the row may have made it to an sstable, but the nodes
>are gone (will have to bootstrap to replace).
>
>
>
> Scenario 1: Row is not lost because once C* is restarted, commit log
> should replay the mutation.
>
>
>
> Scenario 2: row is gone forever? If these two nodes are replaced via
> bootstrapping, will they ever get the row back from node 3 (rack3) if the
> write ever made it there?
>
>
>
>
>
> 
> Thank you
>
>
>


Re: Mixing LWT and normal operations for a partition

2019-05-02 Thread Shaurya Gupta
One suggestion - I think the Cassandra community already has a drive underway to
update the documentation. This could be added to the CQLSH documentation or
some other relevant documentation.

On Fri, May 3, 2019 at 12:56 AM Shaurya Gupta 
wrote:

> Thanks Jeff.
>
> On Fri, May 3, 2019 at 12:38 AM Jeff Jirsa  wrote:
>
>> No. Don’t mix LWT and normal writes.
>>
>> --
>> Jeff Jirsa
>>
>>
>> > On May 2, 2019, at 11:43 AM, Shaurya Gupta 
>> wrote:
>> >
>> > Hi,
>> >
>> > We are seeing really odd behaviour while trying to delete a row which is
>> simultaneously being updated in a lightweight transaction.
>> > The delete command succeeds and the LWT update fails with a timeout
>> exception, but the next select statement shows that the row still
>> exists. This occurs once in many such scenarios.
>> >
>> > Is it fine to mix LWT and normal operations for the same partition? Is
>> it expected to work?
>> >
>> > Thanks
>> > Shaurya
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>
>>
>
> --
> Shaurya Gupta
>
>
>

-- 
Shaurya Gupta


Re: Mixing LWT and normal operations for a partition

2019-05-02 Thread Shaurya Gupta
Thanks Jeff.

On Fri, May 3, 2019 at 12:38 AM Jeff Jirsa  wrote:

> No. Don’t mix LWT and normal writes.
>
> --
> Jeff Jirsa
>
>
> > On May 2, 2019, at 11:43 AM, Shaurya Gupta 
> wrote:
> >
> > Hi,
> >
> > We are seeing really odd behaviour while trying to delete a row which is
> simultaneously being updated in a lightweight transaction.
> > The delete command succeeds and the LWT update fails with a timeout
> exception, but the next select statement shows that the row still
> exists. This occurs once in many such scenarios.
> >
> > Is it fine to mix LWT and normal operations for the same partition? Is
> it expected to work?
> >
> > Thanks
> > Shaurya
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>

-- 
Shaurya Gupta


Re: Mixing LWT and normal operations for a partition

2019-05-02 Thread Jeff Jirsa
No. Don’t mix LWT and normal writes. 
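
As an illustration of that advice, a minimal sketch (using the demo.tweets example from earlier in this thread; purely illustrative) of keeping every mutation on the partition conditional so all of them go through Paxos:

```
-- All writes to the partition carry a condition, so they are serialized
-- against each other by Paxos instead of racing with plain writes:
INSERT INTO demo.tweets (id, user, body) VALUES (5, 'user5', 'old body') IF NOT EXISTS;
UPDATE demo.tweets SET body = 'new body' WHERE id = 5 IF EXISTS;
DELETE FROM demo.tweets WHERE id = 5 IF EXISTS;

-- Reads that need to observe in-flight LWT writes can be issued at SERIAL
-- consistency (in cqlsh: CONSISTENCY SERIAL; before the SELECT):
SELECT * FROM demo.tweets WHERE id = 5;
```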

-- 
Jeff Jirsa


> On May 2, 2019, at 11:43 AM, Shaurya Gupta  wrote:
> 
> Hi,
> 
> We are seeing really odd behaviour while trying to delete a row which is
> simultaneously being updated in a lightweight transaction.
> The delete command succeeds and the LWT update fails with a timeout exception,
> but the next select statement shows that the row still exists. This
> occurs once in many such scenarios.
> 
> Is it fine to mix LWT and normal operations for the same partition? Is it 
> expected to work?
> 
> Thanks
> Shaurya

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Mixing LWT and normal operations for a partition

2019-05-02 Thread Shaurya Gupta
Hi,

We are seeing really odd behaviour while trying to delete a row which is
simultaneously being updated in a lightweight transaction.
The delete command succeeds and the LWT update fails with a timeout exception,
but the next select statement shows that the row still exists. This
occurs once in many such scenarios.

Is it fine to mix LWT and normal operations for the same partition? Is it
expected to work?

Thanks
Shaurya


Re: TWCS sstables not dropping even though all data is expired

2019-05-02 Thread Paul Chandler
Hi Mike,

It sounds like that record may have been deleted; if that is the case, it
would still be shown in this sstable, but the tombstone for the delete would be
in a later sstable. You can use nodetool getsstables to work out which sstables
contain the data.

I recommend reading The Last Pickle post on this:
http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html - the sections towards
the bottom of that post may well explain why the sstable is not being deleted.
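
As a minimal illustration of that point (keyspace, table and key names below are placeholders, not taken from the cluster in question): a delete writes a tombstone into whichever sstable gets flushed next, while the original row stays untouched in the older sstable, so sstabledump on the old file still shows the row even though a CQL read, which reconciles both files, returns nothing.

```
-- Sketch only: the row written earlier lives in an older sstable.
DELETE FROM ks.activity WHERE user_id = 'some_user_id_value';
-- After a flush, the tombstone sits in a newer sstable; the old sstable is
-- unchanged, so sstabledump still shows the original row there, but:
SELECT * FROM ks.activity WHERE user_id = 'some_user_id_value';
-- returns 0 rows, because the read merges the row with the newer tombstone.
```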

Thanks 

Paul
www.redshots.com

> On 2 May 2019, at 16:08, Mike Torra  wrote:
> 
> I'm pretty stumped by this, so here is some more detail if it helps.
> 
> Here is what the suspicious partition looks like in the `sstabledump` output 
> (some pii etc redacted):
> ```
> {
> "partition" : {
>   "key" : [ "some_user_id_value", "user_id", "demo-test" ],
>   "position" : 210
> },
> "rows" : [
>   {
> "type" : "row",
> "position" : 1132,
> "clustering" : [ "2019-01-22 15:27:45.000Z" ],
> "liveness_info" : { "tstamp" : "2019-01-22T15:31:12.415081Z" },
> "cells" : [
>   { "some": "data" }
> ]
>   }
> ]
>   }
> ```
> 
> And here is what every other partition looks like:
> ```
> {
> "partition" : {
>   "key" : [ "some_other_user_id", "user_id", "some_site_id" ],
>   "position" : 1133
> },
> "rows" : [
>   {
> "type" : "row",
> "position" : 1234,
> "clustering" : [ "2019-01-22 17:59:35.547Z" ],
> "liveness_info" : { "tstamp" : "2019-01-22T17:59:35.708Z", "ttl" : 
> 86400, "expires_at" : "2019-01-23T17:59:35Z", "expired" : true },
> "cells" : [
>   { "name" : "activity_data", "deletion_info" : { "local_delete_time" 
> : "2019-01-22T17:59:35Z" }
>   }
> ]
>   }
> ]
>   }
> ```
> 
> As expected, almost all of the data except this one suspicious partition has 
> a ttl and is already expired. But if a partition isn't expired and I see it 
> in the sstable, why wouldn't I see it executing a CQL query against the CF? 
> Why would this sstable be preventing so many other sstable's from getting 
> cleaned up?
> 
> On Tue, Apr 30, 2019 at 12:34 PM Mike Torra  > wrote:
> Hello -
> 
> I have a 48 node C* cluster spread across 4 AWS regions with RF=3. A few 
> months ago I started noticing disk usage on some nodes increasing 
> consistently. At first I solved the problem by destroying the nodes and 
> rebuilding them, but the problem returns.
> 
> I did some more investigation recently, and this is what I found:
> - I narrowed the problem down to a CF that uses TWCS, by simply looking at 
> disk space usage
> - in each region, 3 nodes have this problem of growing disk space (matches 
> replication factor)
> - on each node, I tracked down the problem to a particular SSTable using 
> `sstableexpiredblockers`
> - in the SSTable, using `sstabledump`, I found a row that does not have a ttl 
> like the other rows, and appears to be from someone else on the team testing 
> something and forgetting to include a ttl
> - all other rows show "expired: true" except this one, hence my suspicion
> - when I query for that particular partition key, I get no results
> - I tried deleting the row anyways, but that didn't seem to change anything
> - I also tried `nodetool scrub`, but that didn't help either
> 
> Would this rogue row without a ttl explain the problem? If so, why? If not, 
> does anyone have any other ideas? Why does the row show in `sstabledump` but 
> not when I query for it?
> 
> I appreciate any help or suggestions!
> 
> - Mike



RE: TWCS sstables not dropping even though all data is expired

2019-05-02 Thread Nick Hatfield
Hi Mike,

Have you checked to make sure you’re not a victim of timestamp overlap?

From: Mike Torra [mailto:mto...@salesforce.com.INVALID]
Sent: Thursday, May 02, 2019 11:09 AM
To: user@cassandra.apache.org
Subject: Re: TWCS sstables not dropping even though all data is expired

I'm pretty stumped by this, so here is some more detail if it helps.

Here is what the suspicious partition looks like in the `sstabledump` output 
(some pii etc redacted):
```
{
"partition" : {
  "key" : [ "some_user_id_value", "user_id", "demo-test" ],
  "position" : 210
},
"rows" : [
  {
"type" : "row",
"position" : 1132,
"clustering" : [ "2019-01-22 15:27:45.000Z" ],
"liveness_info" : { "tstamp" : "2019-01-22T15:31:12.415081Z" },
"cells" : [
  { "some": "data" }
]
  }
]
  }
```

And here is what every other partition looks like:
```
{
"partition" : {
  "key" : [ "some_other_user_id", "user_id", "some_site_id" ],
  "position" : 1133
},
"rows" : [
  {
"type" : "row",
"position" : 1234,
"clustering" : [ "2019-01-22 17:59:35.547Z" ],
"liveness_info" : { "tstamp" : "2019-01-22T17:59:35.708Z", "ttl" : 
86400, "expires_at" : "2019-01-23T17:59:35Z", "expired" : true },
"cells" : [
  { "name" : "activity_data", "deletion_info" : { "local_delete_time" : 
"2019-01-22T17:59:35Z" }
  }
]
  }
]
  }
```

As expected, almost all of the data except this one suspicious partition has a 
ttl and is already expired. But if a partition isn't expired and I see it in 
the sstable, why wouldn't I see it executing a CQL query against the CF? Why 
would this sstable be preventing so many other sstables from getting cleaned 
up?

On Tue, Apr 30, 2019 at 12:34 PM Mike Torra <mto...@salesforce.com> wrote:
Hello -

I have a 48 node C* cluster spread across 4 AWS regions with RF=3. A few months 
ago I started noticing disk usage on some nodes increasing consistently. At 
first I solved the problem by destroying the nodes and rebuilding them, but the 
problem returns.

I did some more investigation recently, and this is what I found:
- I narrowed the problem down to a CF that uses TWCS, by simply looking at disk 
space usage
- in each region, 3 nodes have this problem of growing disk space (matches 
replication factor)
- on each node, I tracked down the problem to a particular SSTable using 
`sstableexpiredblockers`
- in the SSTable, using `sstabledump`, I found a row that does not have a ttl 
like the other rows, and appears to be from someone else on the team testing 
something and forgetting to include a ttl
- all other rows show "expired: true" except this one, hence my suspicion
- when I query for that particular partition key, I get no results
- I tried deleting the row anyways, but that didn't seem to change anything
- I also tried `nodetool scrub`, but that didn't help either

Would this rogue row without a ttl explain the problem? If so, why? If not, 
does anyone have any other ideas? Why does the row show in `sstabledump` but 
not when I query for it?

I appreciate any help or suggestions!

- Mike


Re: TWCS sstables not dropping even though all data is expired

2019-05-02 Thread Mike Torra
I'm pretty stumped by this, so here is some more detail if it helps.

Here is what the suspicious partition looks like in the `sstabledump`
output (some pii etc redacted):
```
{
"partition" : {
  "key" : [ "some_user_id_value", "user_id", "demo-test" ],
  "position" : 210
},
"rows" : [
  {
"type" : "row",
"position" : 1132,
"clustering" : [ "2019-01-22 15:27:45.000Z" ],
"liveness_info" : { "tstamp" : "2019-01-22T15:31:12.415081Z" },
"cells" : [
  { "some": "data" }
]
  }
]
  }
```

And here is what every other partition looks like:
```
{
"partition" : {
  "key" : [ "some_other_user_id", "user_id", "some_site_id" ],
  "position" : 1133
},
"rows" : [
  {
"type" : "row",
"position" : 1234,
"clustering" : [ "2019-01-22 17:59:35.547Z" ],
"liveness_info" : { "tstamp" : "2019-01-22T17:59:35.708Z", "ttl" :
86400, "expires_at" : "2019-01-23T17:59:35Z", "expired" : true },
"cells" : [
  { "name" : "activity_data", "deletion_info" : {
"local_delete_time" : "2019-01-22T17:59:35Z" }
  }
]
  }
]
  }
```

As expected, almost all of the data except this one suspicious partition
has a ttl and is already expired. But if a partition isn't expired and I
see it in the sstable, why wouldn't I see it when executing a CQL query against
the CF? Why would this sstable be preventing so many other sstables from
getting cleaned up?
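
One hedged, preventative idea on top of the investigation above (the table name below is illustrative): if every row in this CF is meant to expire, a table-level default TTL makes inserts that forget a USING TTL clause expire anyway, so a single non-expiring row cannot pin TWCS windows in the future.

```
-- Sketch only: give the table a default TTL so new writes without an explicit
-- USING TTL still expire (86400 seconds matches the TTL on the other rows).
ALTER TABLE my_keyspace.activity WITH default_time_to_live = 86400;
```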

On Tue, Apr 30, 2019 at 12:34 PM Mike Torra  wrote:

> Hello -
>
> I have a 48 node C* cluster spread across 4 AWS regions with RF=3. A few
> months ago I started noticing disk usage on some nodes increasing
> consistently. At first I solved the problem by destroying the nodes and
> rebuilding them, but the problem returns.
>
> I did some more investigation recently, and this is what I found:
> - I narrowed the problem down to a CF that uses TWCS, by simply looking at
> disk space usage
> - in each region, 3 nodes have this problem of growing disk space (matches
> replication factor)
> - on each node, I tracked down the problem to a particular SSTable using
> `sstableexpiredblockers`
> - in the SSTable, using `sstabledump`, I found a row that does not have a
> ttl like the other rows, and appears to be from someone else on the team
> testing something and forgetting to include a ttl
> - all other rows show "expired: true" except this one, hence my suspicion
> - when I query for that particular partition key, I get no results
> - I tried deleting the row anyways, but that didn't seem to change anything
> - I also tried `nodetool scrub`, but that didn't help either
>
> Would this rogue row without a ttl explain the problem? If so, why? If
> not, does anyone have any other ideas? Why does the row show in
> `sstabledump` but not when I query for it?
>
> I appreciate any help or suggestions!
>
> - Mike
>


Re: Accidentally removed SSTables of unneeded data

2019-05-02 Thread Nitan Kainth
You can run nodetool refresh and then sstablescrub to see if there is any
corruption.

On Thu, May 2, 2019 at 9:53 AM Simon ELBAZ  wrote:

> Hi,
>
> I am running Cassandra v2.1 on a 3 node cluster.
>
> # yum list installed | grep cassa
> cassandra21.noarch        2.1.12-1    @datastax
> cassandra21-tools.noarch  2.1.12-1    @datastax
>
> Unfortunately, I accidentally removed the SSTables (using rm) (older than
> 10 days) of a table on the 3 nodes.
>
> Running 'nodetool repair' on one of the 3 nodes returns error. Whereas, it
> does not on another.
>
> I don't need to recover the lost data but I would like 'nodetool repair'
> not returning an error.
>
> Thanks for any advice.
>
> Simon
>


Re: Accidentally removed SSTables of unneeded data

2019-05-02 Thread shalom sagges
Hi Simon,

If you haven't done that already, try draining and restarting the node you
deleted the data from.
Then run the repair again.

Regards,

On Thu, May 2, 2019 at 5:53 PM Simon ELBAZ  wrote:

> Hi,
>
> I am running Cassandra v2.1 on a 3 node cluster.
>
> # yum list installed | grep cassa
> cassandra21.noarch        2.1.12-1    @datastax
> cassandra21-tools.noarch  2.1.12-1    @datastax
>
> Unfortunately, I accidentally removed the SSTables (using rm) (older than
> 10 days) of a table on the 3 nodes.
>
> Running 'nodetool repair' on one of the 3 nodes returns error. Whereas, it
> does not on another.
>
> I don't need to recover the lost data but I would like 'nodetool repair'
> not returning an error.
>
> Thanks for any advice.
>
> Simon
>


CL=LQ, RF=3: Can a Write be Lost If Two Nodes ACK'ing it Die

2019-05-02 Thread Fd Habash
C*: 2.2.8
Write CL = LQ
Kspace RF = 3
Three racks

A write is received by node 1 in rack 1 with the above specs. Node 1 (rack1) & node
2 (rack2) acknowledge it to the client.

Within some unit of time, node 1 & 2 die. Either ….
- Scenario 1: C* process death: Row did not make it to sstable (it is in commit 
log & was in memtable)
- Scenario 2: Node death: the row may have made it to an sstable, but the nodes are gone
(will have to bootstrap to replace).

Scenario 1: Row is not lost because once C* is restarted, commit log should 
replay the mutation.

Scenario 2: row is gone forever? If these two nodes are replaced via 
bootstrapping, will they ever get the row back from node 3 (rack3) if the write 
ever made it there?
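
For reference, a minimal sketch of the setup described above (keyspace, table, and datacenter names are illustrative only):

```
-- RF=3 keyspace; with NetworkTopologyStrategy and a rack-aware snitch the
-- three replicas are spread across the three racks.
CREATE KEYSPACE IF NOT EXISTS demo
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};

CREATE TABLE IF NOT EXISTS demo.t (id int PRIMARY KEY, val text);

-- In cqlsh the session write consistency can be set to LOCAL_QUORUM, so the
-- coordinator acks the write once 2 of the 3 replicas have applied it:
CONSISTENCY LOCAL_QUORUM;
INSERT INTO demo.t (id, val) VALUES (1, 'x');
```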



Thank you



Accidentally removed SSTables of unneeded data

2019-05-02 Thread Simon ELBAZ

Hi,

I am running Cassandra v2.1 on a 3 node cluster.

# yum list installed | grep cassa
cassandra21.noarch        2.1.12-1    @datastax
cassandra21-tools.noarch  2.1.12-1    @datastax

Unfortunately, I accidentally removed (using rm) the SSTables older than 10 days
for a table on the 3 nodes.


Running 'nodetool repair' on one of the 3 nodes returns an error, whereas it
does not on another.


I don't need to recover the lost data, but I would like 'nodetool repair'
not to return an error.


Thanks for any advice.

Simon



Re: Cassandra taking very long to start and server under heavy load

2019-05-02 Thread Evgeny Inberg
Yes, sstables were upgraded on each node.

On Thu, 2 May 2019, 13:39 Nick Hatfield  wrote:

> Just curious but, did you make sure to run the sstable upgrade after you
> completed the move from 2.x to 3.x ?
>
>
>
> *From:* Evgeny Inberg [mailto:evg...@gmail.com]
> *Sent:* Thursday, May 02, 2019 1:31 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Cassandra taking very long to start and server under heavy
> load
>
>
>
> Using a single data disk.
>
> Also, it is performing mostly heavy read operations according to the
> metrics collected.
>
> On Wed, 1 May 2019, 20:14 Jeff Jirsa  wrote:
>
> Do you have multiple data disks?
>
> Cassandra 6696 changed behavior with multiple data disks to make it safer
> in the situation that one disk fails . It may be copying data to the right
> places on startup, can you see if sstables are being moved on disk?
>
> --
>
> Jeff Jirsa
>
>
>
>
> On May 1, 2019, at 6:04 AM, Evgeny Inberg  wrote:
>
> I have upgraded a Cassandra cluster from version 2.0.x to 3.11.4, going
> through 2.1.14.
>
> After the upgrade, noticed that each node is taking about 10-15 minutes to
> start, and server is under a very heavy load.
>
> Did some digging around and got a few leads from the debug log.
>
> Messages like:
>
> Keyspace.java:351 - New replication settings for keyspace system_auth -
> invalidating disk boundary caches
>
> CompactionStrategyManager.java:380 - Recreating compaction strategy -
> disk boundaries are out of date for system_auth.roles.
>
>
>
> This is repeating for all keyspaces.
>
>
>
> Any suggestions on what to check, and what might cause this to happen on every
> start?
>
>
>
> Thanks!
>
>


RE: Cassandra taking very long to start and server under heavy load

2019-05-02 Thread Nick Hatfield
Just curious, but did you make sure to run the sstable upgrade after you
completed the move from 2.x to 3.x?

From: Evgeny Inberg [mailto:evg...@gmail.com]
Sent: Thursday, May 02, 2019 1:31 AM
To: user@cassandra.apache.org
Subject: Re: Cassandra taking very long to start and server under heavy load

Using a single data disk.
Also, it is performing mostly heavy read operations according to the metrics
collected.
On Wed, 1 May 2019, 20:14 Jeff Jirsa <jji...@gmail.com> wrote:
Do you have multiple data disks?
Cassandra 6696 changed behavior with multiple data disks to make it safer in 
the situation that one disk fails . It may be copying data to the right places 
on startup, can you see if sstables are being moved on disk?
--
Jeff Jirsa


On May 1, 2019, at 6:04 AM, Evgeny Inberg <evg...@gmail.com> wrote:
I have upgraded a Cassandra cluster from version 2.0.x to 3.11.4, going through
2.1.14.
After the upgrade, noticed that each node is taking about 10-15 minutes to 
start, and server is under a very heavy load.
Did some digging around and got a few leads from the debug log.
Messages like:
Keyspace.java:351 - New replication settings for keyspace system_auth - 
invalidating disk boundary caches
CompactionStrategyManager.java:380 - Recreating compaction strategy - disk 
boundaries are out of date for system_auth.roles.

This is repeating for all keyspaces.

Any suggestions on what to check, and what might cause this to happen on every start?

Thanks!