Re: Academic paper about Cassandra database compaction

2018-05-14 Thread Lucas Benevides
Hello kooljava2,

There aren't many books about Cassandra, but one of the best known is
"Cassandra: The Definitive Guide: Distributed Data at Web Scale", by Hewitt.
The problem is that Cassandra evolves very fast, so these books get out of
date quickly.
To understand concepts shared by many NoSQL databases (many of them
originating in the distributed systems area), there is the book
"NoSQL Distilled", by Pramod Sadalage and Martin Fowler.

Unfortunately, documentation is not Cassandra's strongest point either,
which is why this group is so important. But everyone can contribute to
improving it.

Lucas B. Dias



2018-05-14 13:56 GMT-03:00 kooljava2 :

> Hello,
>
> Thank you, Lucas, for sharing. I am still a beginner in the Cassandra NoSQL
> world. Are there any other good books on performance tuning and
> architecture overview?
>
> Thank you.
>
> On Monday, 14 May 2018, 07:57:38 GMT-7, Nitan Kainth <
> nitankai...@gmail.com> wrote:
>
>
> Hi Lucas,
>
> I am not able to download it. Can you share it as an email attachment?
>
>
>
> Regards,
> Nitan K.
> Cassandra and Oracle Architect/SME
> Datastax Certified Cassandra expert
> Oracle 10g Certified
>
> On Mon, May 14, 2018 at 9:12 AM, Lucas Benevides <
> lu...@maurobenevides.com.br> wrote:
>
> Dear community,
>
> I want to tell you about my paper published in a conference in March. The
> title is "NoSQL Database Performance Tuning for IoT Data - Cassandra
> Case Study" and it is available (not for free) in
> http://www.scitepress.org/DigitalLibrary/Link.aspx?doi=10.5220/0006782702770284 .
>
> TWCS is used and compared with DTCS.
>
> I hope you can download it, unfortunately I cannot send copies as the
> publisher has its copyright.
>
> Lucas B. Dias
>
>
>
>


Re: Academic paper about Cassandra database compaction

2018-05-14 Thread Lucas Benevides
Thank you, Jeff Jirsa, for your comments.

How can we do this:  "fix this by not scheduling the major compaction until
we know all of the sstables in the window are available to be compacted"?

About the column-family schema, I had to customize the cassandra-stress
tool so that it could create a reasonable number of rows per partition. By
default it keeps generating repeated clustering keys for each partition, so
most writes end up as updates instead of inserts.

Lucas B. Dias

2018-05-14 14:03 GMT-03:00 Jeff Jirsa :

> Interesting!
>
> I suspect I know what causes the increased disk usage in TWCS, and it's a
> solvable problem. The problem is roughly something like this:
> - Window 1 has sstables 1, 2, 3, 4, 5, 6
> - We start compacting 1, 2, 3, 4 (using STCS-in-TWCS first window)
> - The TWCS window rolls over
> - We flush (sstable 7), and trigger the TWCS window major compaction,
> which starts compacting 5, 6, 7 + any other sstable from that window
> - If the first compaction (1,2,3,4) has finished by the time sstable 7 is
> flushed, we'll include its result in that compaction; if it hasn't, we'll
> have to do the major compaction twice to guarantee we have exactly one
> sstable per window, which will temporarily increase disk space
>
> We can likely fix this by not scheduling the major compaction until we
> know all of the sstables in the window are available to be compacted.
>
> Also your data model is probably typical, but not well suited for time
> series cases - if you find my 2016 Cassandra Summit TWCS talk (it's on
> youtube), I mention aligning partition keys to TWCS windows, which involves
> adding a second component to the partition key. This is hugely important in
> terms of making sure TWCS data expires quickly and avoiding having to read
> from more than one TWCS window at a time.
>
>
> - Jeff
>
>
>
> On Mon, May 14, 2018 at 7:12 AM, Lucas Benevides <
> lu...@maurobenevides.com.br> wrote:
>
>> Dear community,
>>
>> I want to tell you about my paper published in a conference in March. The
>> title is " NoSQL Database Performance Tuning for IoT Data - Cassandra
>> Case Study"  and it is available (not for free) in
>> http://www.scitepress.org/DigitalLibrary/Link.aspx?doi=10.5220/0006782702770284 .
>>
>> TWCS is used and compared with DTCS.
>>
>> I hope you can download it, unfortunately I cannot send copies as the
>> publisher has its copyright.
>>
>> Lucas B. Dias
>>
>>
>>
>


Re: Cassandra upgrade from 2.1 to 3.0

2018-05-14 Thread kooljava2
 We are using the DataStax Java driver 1.5.0.
Thank you.

On Saturday, 12 May 2018, 10:37:04 GMT-7, Jeff Jirsa wrote:
 
 I haven't seen this before, but I have a guess.
What client/driver are you using?
Are you using a prepared statement that has every column listed for the update,
and leaving the un-set columns as null? If so, the null is being translated
into a delete, which is clearly not what you want.
The differentiation between UNSET and NULL went into 2.2
( https://issues.apache.org/jira/browse/CASSANDRA-7304 ), and most drivers have
been updated to know the difference
( https://github.com/gocql/gocql/issues/861 ,
https://datastax-oss.atlassian.net/browse/JAVA-777 , etc). I haven't read the
patch for 7304, but I suspect that maybe there's some sort of mixup along the
way (maybe in your driver, or maybe you upgraded the driver to support 3.0 and
picked up a new feature you didn't realize you picked up, etc).
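
The NULL-versus-UNSET distinction described above can be illustrated with a
small toy model. This is only a sketch in plain Python, not driver or server
code: `UNSET`, `build_mutation`, `apply_mutation`, and the sample values are
hypothetical names chosen for illustration. It assumes the behaviour described
in the message, namely that a bound null is written as a delete while an unset
value leaves the stored column alone.

```python
# Toy model of how bound values in a prepared UPDATE could become writes.
# Nothing here is real driver or server code; all names are illustrative.

UNSET = object()  # hypothetical sentinel meaning "do not touch this column"


def build_mutation(bound_values):
    """Split bound values into columns to write and columns to delete.

    None   -> the column is deleted (a tombstone is written)
    UNSET  -> the column is skipped and keeps its stored value
    other  -> the column is overwritten with the new value
    """
    writes, tombstones = {}, []
    for column, value in bound_values.items():
        if value is UNSET:
            continue
        if value is None:
            tombstones.append(column)
        else:
            writes[column] = value
    return writes, tombstones


def apply_mutation(row, writes, tombstones):
    """Return a copy of the stored row after applying the mutation."""
    updated = dict(row)
    updated.update(writes)
    for column in tombstones:
        updated[column] = None  # the column reads as null after the delete
    return updated


stored = {"id": "12345", "created_by": "alice", "last_modified_by": "alice"}

# Binding None for a column the application did not intend to change.
with_null = apply_mutation(stored, *build_mutation(
    {"last_modified_by": "bob", "created_by": None}))

# Leaving the same column unset instead.
with_unset = apply_mutation(stored, *build_mutation(
    {"last_modified_by": "bob", "created_by": UNSET}))

print(with_null["created_by"])   # None
print(with_unset["created_by"])  # alice
```

In this model the only difference between the two updates is the value bound
for `created_by`, which matches the symptom reported in the thread: columns
that were not meant to be part of the update come back as null.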

On Fri, May 11, 2018 at 11:26 AM, kooljava2  wrote:

 After analyzing the data further, I see a pattern: in the rows that were
updated in the last 2-3 weeks, the columns that were not part of the update
now have null values.

Has anyone encountered this issue during the upgrade? 


Thank you,

On Thursday, 10 May 2018, 19:49:50 GMT-7, kooljava2 wrote:
 
  Hello Jeff,
2.1.19 to 3.0.15.
Thank you. 

On Thursday, 10 May 2018, 17:43:58 GMT-7, Jeff Jirsa  
wrote:  
 
 Which minor version of 3.0?

-- Jeff Jirsa

On May 11, 2018, at 2:54 AM, kooljava2  wrote:



Hello,

We upgraded Cassandra from 2.1 to 3.0. We see data in a few columns being set
to "null". These columns were populated when the rows were created.

Looking at the data, we see a pattern: an update was done on these rows.
Columns that were part of the update have data, but columns that were not part
of the update are set to null.

 created_on | created_by | id
------------+------------+-------
       null |       null | 12345



sstabledump:- 

WARN  20:47:38,741 Small cdc volume detected at /var/lib/cassandra/cdc_raw; setting cdc_total_space_in_mb to 1278.  You can override this in cassandra.yaml
[
  {
    "partition" : {
      "key" : [ "12345" ],
      "position" : 5155159
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 5168738,
        "deletion_info" : { "marked_deleted" : "2018-03-28T20:38:08.05Z", "local_delete_time" : "2018-03-28T20:38:08Z" },
        "cells" : [
          { "name" : "doc_type", "value" : false, "tstamp" : "2018-03-28T20:38:08.060Z" },
          { "name" : "industry", "deletion_info" : { "local_delete_time" : "2018-03-28T20:38:08Z" },
            "tstamp" : "2018-03-28T20:38:08.060Z"
          },
          { "name" : "last_modified_by", "value" : "12345", "tstamp" : "2018-03-28T20:38:08.060Z" },
          { "name" : "last_modified_date", "value" : "2018-03-28 20:38:08.059Z", "tstamp" : "2018-03-28T20:38:08.060Z" },
          { "name" : "locale", "deletion_info" : { "local_delete_time" : "2018-03-28T20:38:08Z" },
            "tstamp" : "2018-03-28T20:38:08.060Z"
          },
          { "name" : "postal_code", "deletion_info" : { "local_delete_time" : "2018-03-28T20:38:08Z" },
            "tstamp" : "2018-03-28T20:38:08.060Z"
          },
          { "name" : "ticket", "deletion_info" : { "marked_deleted" : "2018-03-28T20:38:08.05Z", "local_delete_time" : "2018-03-28T20:38:08Z" } },
          { "name" : "ticket", "path" : [ "TEMP_DATA" ], "value" : "{\"name\":\"TEMP_DATA\",\"ticket\":\"a42638dae8350e889f2603be1427ac6f5dec5e486d4db164a76bf80820cdf68d635cff5e7d555e6d4eabb9b5b82597b68bec0fcd735fcca\",\"lastRenewedDate\":\"2018-03-28T20:38:08Z\"}", "tstamp" : "2018-03-28T20:38:08.060Z" },
          { "name" : "ticket", "path" : [ "TEMP_TEMP2" ], "value" : "{\"name\":\"TEMP_TEMP2\",\"ticket\":\"a4263b7350d1f2683\",\"lastRenewedDate\":\"2018-03-28T20:38:07Z\"}", "tstamp" : "2018-03-28T20:38:08.060Z" },
          { "name" : "ppstatus_pf", "deletion_info" : { "marked_deleted" : "2018-03-28T20:38:08.05Z", "local_delete_time" : "2018-03-28T20:38:08Z" } },
          { "name" : "ppstatus_pers", "deletion_info" : { "marked_deleted" : "2018-03-28T20:38:08.05Z", "local_delete_time" : "2018-03-28T20:38:08Z" } }
        ]
      }
    ]
  }
]
WARN  20:47:41,325 Small cdc volume detected at /var/lib/cassandra/cdc_raw; setting cdc_total_space_in_mb to 1278.  You can override this in cassandra.yaml
[
  {
    "partition" : {
      "key" : [ "12345" ],
      "position" : 18743072
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 18751808,
        "liveness_info" : { "tstamp" : "2017-10-25T10:22:41.612Z" },
        "cells" : [
          { "name" : 

Re: Academic paper about Cassandra database compaction

2018-05-14 Thread Jeff Jirsa
Interesting!

I suspect I know what causes the increased disk usage in TWCS, and it's a
solvable problem. The problem is roughly something like this:
- Window 1 has sstables 1, 2, 3, 4, 5, 6
- We start compacting 1, 2, 3, 4 (using STCS-in-TWCS first window)
- The TWCS window rolls over
- We flush (sstable 7), and trigger the TWCS window major compaction, which
starts compacting 5, 6, 7 + any other sstable from that window
- If the first compaction (1,2,3,4) has finished by the time sstable 7 is
flushed, we'll include its result in that compaction; if it hasn't, we'll
have to do the major compaction twice to guarantee we have exactly one
sstable per window, which will temporarily increase disk space

We can likely fix this by not scheduling the major compaction until we know
all of the sstables in the window are available to be compacted.
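
The scenario above can be walked through with a toy model. This is a sketch
only, not Cassandra's TWCS implementation: `compact_window`, the
`wait_for_in_flight` flag, and the assumption that every sstable has size 1 are
all hypothetical, chosen just to count how many major compactions the window
needs and how much data they rewrite.

```python
# Toy model of the window-rollover race; not Cassandra code.

def compact_window(sizes, in_flight, wait_for_in_flight):
    """Return (major compactions run, units of data they wrote).

    sizes:               mapping of sstable name -> size for the closed window
    in_flight:           names still held by an earlier compaction
    wait_for_in_flight:  if True, the major compaction is deferred until
                         the earlier compaction has finished
    """
    window = dict(sizes)
    majors, written = 0, 0

    def merge(names, output):
        # Replace the input sstables with one output sstable.
        total = sum(window.pop(name) for name in names)
        window[output] = total
        return total

    if in_flight and not wait_for_in_flight:
        # The major compaction starts with only the sstables that are free.
        available = [name for name in window if name not in in_flight]
        written += merge(available, "major-1")
        majors += 1
        # The earlier compaction finishes afterwards, leaving a second sstable.
        merge(list(in_flight), "stcs-result")
    elif in_flight:
        # Deferred case: let the earlier compaction finish first.
        merge(list(in_flight), "stcs-result")

    # Keep running major compactions until one sstable remains.
    while len(window) > 1:
        majors += 1
        written += merge(list(window), f"major-{majors}")
    return majors, written


sizes = {n: 1 for n in range(1, 8)}   # sstables 1..7, one unit each
in_flight = {1, 2, 3, 4}              # still being compacted at rollover

print(compact_window(sizes, in_flight, wait_for_in_flight=False))  # (2, 10)
print(compact_window(sizes, in_flight, wait_for_in_flight=True))   # (1, 7)
```

Under these assumptions, starting the major compaction immediately costs two
major compactions and rewrites 10 units, while deferring it costs one major
compaction and rewrites 7.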

Also your data model is probably typical, but not well suited for time
series cases - if you find my 2016 Cassandra Summit TWCS talk (it's on
YouTube), I mention aligning partition keys to TWCS windows, which involves
adding a second component to the partition key. This is hugely important in
terms of making sure TWCS data expires quickly and avoiding having to read
from more than one TWCS window at a time.
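
As a rough illustration of adding a second, time-based component to the
partition key, here is a minimal sketch. The names `window_bucket`,
`partition_key`, and `sensor-42`, the one-day window, and the choice to floor
timestamps from the Unix epoch in UTC are assumptions made for the example; the
bucket size would need to match the table's actual TWCS window settings.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_bucket(ts, window=timedelta(days=1)):
    """Floor a timezone-aware timestamp to the start of its window."""
    elapsed = ts - EPOCH
    return EPOCH + (elapsed // window) * window


def partition_key(sensor_id, ts, window=timedelta(days=1)):
    """Build a two-component partition key: (source id, window bucket)."""
    return (sensor_id, window_bucket(ts, window).isoformat())


morning = datetime(2018, 5, 14, 7, 12, tzinfo=timezone.utc)
afternoon = datetime(2018, 5, 14, 13, 56, tzinfo=timezone.utc)
next_day = datetime(2018, 5, 15, 0, 5, tzinfo=timezone.utc)

print(partition_key("sensor-42", morning))    # ('sensor-42', '2018-05-14T00:00:00+00:00')
print(partition_key("sensor-42", afternoon))  # ('sensor-42', '2018-05-14T00:00:00+00:00')
print(partition_key("sensor-42", next_day))   # ('sensor-42', '2018-05-15T00:00:00+00:00')
```

With a key like this, all writes for one source within one window share a
partition, and a partition never spans two windows, which is the property the
paragraph above is after.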


- Jeff



On Mon, May 14, 2018 at 7:12 AM, Lucas Benevides <
lu...@maurobenevides.com.br> wrote:

> Dear community,
>
> I want to tell you about my paper published in a conference in March. The
> title is " NoSQL Database Performance Tuning for IoT Data - Cassandra
> Case Study"  and it is available (not for free) in
> http://www.scitepress.org/DigitalLibrary/Link.aspx?doi=10.5220/0006782702770284 .
>
> TWCS is used and compared with DTCS.
>
> I hope you can download it, unfortunately I cannot send copies as the
> publisher has its copyright.
>
> Lucas B. Dias
>
>
>


Re: Academic paper about Cassandra database compaction

2018-05-14 Thread kooljava2
 Hello,
Thank you, Lucas, for sharing. I am still a beginner in the Cassandra NoSQL
world. Are there any other good books on performance tuning and architecture
overview?
Thank you.

On Monday, 14 May 2018, 07:57:38 GMT-7, Nitan Kainth wrote:
 
 Hi Lucas,
I am not able to download it. Can you share it as an email attachment?


Regards,
Nitan K.
Cassandra and Oracle Architect/SME
Datastax Certified Cassandra expert
Oracle 10g Certified

On Mon, May 14, 2018 at 9:12 AM, Lucas Benevides wrote:

Dear community,
I want to tell you about my paper published in a conference in March. The title
is "NoSQL Database Performance Tuning for IoT Data - Cassandra Case Study" and
it is available (not for free) in
http://www.scitepress.org/DigitalLibrary/Link.aspx?doi=10.5220/0006782702770284 .
TWCS is used and compared with DTCS.
I hope you can download it, unfortunately I cannot send copies as the publisher 
has its copyright.
Lucas B. Dias



  

Re: Academic paper about Cassandra database compaction

2018-05-14 Thread Nitan Kainth
Hi Lucas,

I am not able to download it. Can you share it as an email attachment?



Regards,
Nitan K.
Cassandra and Oracle Architect/SME
Datastax Certified Cassandra expert
Oracle 10g Certified

On Mon, May 14, 2018 at 9:12 AM, Lucas Benevides <
lu...@maurobenevides.com.br> wrote:

> Dear community,
>
> I want to tell you about my paper published in a conference in March. The
> title is " NoSQL Database Performance Tuning for IoT Data - Cassandra
> Case Study"  and it is available (not for free) in
> http://www.scitepress.org/DigitalLibrary/Link.aspx?doi=10.5220/0006782702770284 .
>
> TWCS is used and compared with DTCS.
>
> I hope you can download it, unfortunately I cannot send copies as the
> publisher has its copyright.
>
> Lucas B. Dias
>
>
>


Academic paper about Cassandra database compaction

2018-05-14 Thread Lucas Benevides
Dear community,

I want to tell you about my paper, published at a conference in March. The
title is "NoSQL Database Performance Tuning for IoT Data - Cassandra Case
Study" and it is available (not for free) at
http://www.scitepress.org/DigitalLibrary/Link.aspx?doi=10.5220/0006782702770284

The paper uses TWCS and compares it with DTCS.

I hope you can download it. Unfortunately, I cannot send copies, as the
publisher holds the copyright.

Lucas B. Dias


RE: [EXTERNAL] Re: Error after 3.1.0 to 3.11.2 upgrade

2018-05-14 Thread Durity, Sean R
A couple of additional things:

- Make sure that you ran repair on the system_auth keyspace on all nodes
  after changing the RF.

- If you are not often changing roles/permissions, you might look to
  increase permissions_validity_in_ms and roles_validity_in_ms so they are
  not being fetched all the time (especially with the internal Cassandra
  Authorizer/Authenticator).


Sean Durity

From: Jeff Jirsa 
Sent: Saturday, May 12, 2018 9:21 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Error after 3.1.0 to 3.11.2 upgrade

RF of one means all auth requests go to the same node, so they’re more likely
to time out if that host is overloaded or restarts.

Increasing it distributes the queries among more hosts.

--
Jeff Jirsa


On May 12, 2018, at 6:11 AM, Abdul Patel wrote:
Yes, I found that all keyspaces had a replication factor of 3 and system_auth
had 1. I have changed it to 3 now. So was this issue due to the system_auth
replication factor mismatch?

On Saturday, May 12, 2018, Hannu Kröger wrote:
Hi,

Did you check the replication strategy and the number of replicas of the
system_auth keyspace?

Hannu

Abdul Patel wrote on 12.5.2018 at 5.21:
No, the application isn't impacted; no complaints.
Also, it's a 4-node cluster in a lower, non-production environment, and all
nodes are on the same version.

On Friday, May 11, 2018, Jeff Jirsa wrote:
The read is timing out - is the cluster healthy? Is it fully upgraded or mixed
versions? Repeated errors aren't great, but is the application impacted?
--
Jeff Jirsa


On May 12, 2018, at 6:17 AM, Abdul Patel wrote:
It seems that one was coming from 3.10, but I got a bunch of them today on
3.11.2. If this keeps recurring, what is the solution?

WARN  [Native-Transport-Requests-24] 2018-05-11 16:46:20,938 CassandraAuthorizer.java:96 - CassandraAuthorizer failed to authorize # for 
ERROR [Native-Transport-Requests-24] 2018-05-11 16:46:20,940 ErrorMessage.java:384 - Unexpected exception during request
com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
    at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2203) ~[guava-18.0.jar:na]
    at com.google.common.cache.LocalCache.get(LocalCache.java:3937) ~[guava-18.0.jar:na]
    at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941) ~[guava-18.0.jar:na]
    at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824) ~[guava-18.0.jar:na]
    at org.apache.cassandra.auth.AuthCache.get(AuthCache.java:108) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.auth.PermissionsCache.getPermissions(PermissionsCache.java:45) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.auth.AuthenticatedUser.getPermissions(AuthenticatedUser.java:104) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.service.ClientState.authorize(ClientState.java:439) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.service.ClientState.checkPermissionOnResourceChain(ClientState.java:368) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.service.ClientState.ensureHasPermission(ClientState.java:345) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.service.ClientState.hasAccess(ClientState.java:332) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.service.ClientState.hasColumnFamilyAccess(ClientState.java:310) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.cql3.statements.SelectStatement.checkAccess(SelectStatement.java:260) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:221) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:530) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:507) ~[apache-cassandra-3.11.2.jar:3.11.2]

On Fri, May 11, 2018 at 8:30 PM, Jeff Jirsa wrote:
That looks like Cassandra 3.10 not 3.11.2

It’s also just the auth cache failing to refresh - if it’s transient it’s 
probably not a big deal. If it continues then there may be an issue with the 
cache refresher.
--
Jeff Jirsa


On May 12, 2018, at 5:55 AM, Abdul Patel wrote:
Hi All,

I have seen the stack trace messages below in the error log one day after the
upgrade. One of the blogs said this might be due to old drivers, but I am not
sure about it.

FYI :

INFO