Re: Cassandra Collections performance issue

2016-02-24 Thread Agrawal, Pratik
Hi Daemeon,

We tried changing the "we overwrite every value" behavior to updating only one
element in the map, and we still saw the same performance degradation.

Thanks,
Pratik

From: daemeon reiydelle <daeme...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Tuesday, February 9, 2016 at 11:39 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Cc: "Peddi, Praveen" <pe...@amazon.com>
Subject: Re: Cassandra Collections performance issue

I think the key to your problem might be "we overwrite every value". You are 
creating a large number of tombstones, forcing many reads to pull current 
results. You would do well to rethink why you have to overwrite values all the 
time under the same key. You would be better off figuring out how to add 
values under a key and then age off the old values. I would say that (at least 
at scale) you have a classic anti-pattern in play.


...

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872





Re: Cassandra Collections performance issue

2016-02-11 Thread Clint Martin
I have experienced significant performance issues while using collections as
well. Mostly my issue was due to the excessive number of cells per
partition that even a modestly sized map requires.

Since you are reading and writing the entire map, you can probably gain
some performance the same way I did: convert your map to a frozen map.
This essentially puts you in the same place as folks who migrate to a blob
of JSON, but it puts the onus on Cassandra to manage serializing and
deserializing the map. It does have limitations compared to a regular map: you
can't append values, you can't selectively TTL, and reading single keys
requires deserializing the whole collection. Basically, anything besides
reading and writing the whole collection becomes a little harder. But it is
considerably faster due to the lower cell count and management overhead.
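
As a rough sketch of that change (keyspace, table, and column names here are
hypothetical, not from the thread; frozen collections require Cassandra 2.1+):

```cql
CREATE TABLE ks.example (
    id    text PRIMARY KEY,
    attrs frozen<map<text, text>>
);

-- The whole map is written and stored as a single cell:
INSERT INTO ks.example (id, attrs)
VALUES ('row1', {'k1': 'v1', 'k2': 'v2', 'k3': 'v3'});

-- Per-key mutations are rejected on frozen collections, e.g.:
-- UPDATE ks.example SET attrs['k1'] = 'x' WHERE id = 'row1';  -- InvalidRequest
```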

Clint

Re: Cassandra Collections performance issue

2016-02-11 Thread Jack Krupansky
Just to help other users reading along here, what is your access pattern
with maps? I mean, do you typically have a large or small number of keys
set? Are you mostly adding keys or deleting keys a lot, adding one at a time,
or adding and deleting a lot in a single request, or... what?
And are you indexing map columns, keys, or values?

-- Jack Krupansky


Re: Cassandra Collections performance issue

2016-02-10 Thread Benedict Elliott Smith
If the overwrites are per map key, no tombstones are generated; only
if the whole map is re-imaged are tombstones created, and prior to 3.0 this
can indeed be a major problem if done frequently.

Prior to 3.0, collections also forbid certain optimisations to cell
comparisons, and as a result can yield an appreciable performance decline when
they're added to a table. Unfortunately, dropping the collection won't
resolve the performance degradation, as its prior presence continues to
haunt the table. To restore performance you will need to recreate your
table without the collection column and reinsert your data. Or upgrade to
3.0.
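
To illustrate the distinction (assuming a hypothetical table t with a
non-frozen map column m and a text partition key id):

```cql
-- Per-key overwrite: only the affected cell is written; no tombstone.
UPDATE t SET m['k1'] = 'v1' WHERE id = 'row1';

-- Re-imaging the whole collection: Cassandra first writes a range
-- tombstone covering the old map contents, then writes the new cells.
UPDATE t SET m = {'k1': 'v1', 'k2': 'v2', 'k3': 'v3'} WHERE id = 'row1';
```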




Re: Cassandra Collections performance issue

2016-02-09 Thread daemeon reiydelle
I think the key to your problem might be "we overwrite every value".
You are creating a large number of tombstones, forcing many reads to pull
current results. You would do well to rethink why you have to
overwrite values all the time under the same key. You would be better off
figuring out how to add values under a key and then age off the old values. I
would say that (at least at scale) you have a classic anti-pattern in play.
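
One way to sketch that add-then-age-off pattern (schema and names here are
illustrative assumptions, not from the thread):

```cql
CREATE TABLE ks.values_by_key (
    key   text,
    ts    timeuuid,
    value text,
    PRIMARY KEY (key, ts)
) WITH CLUSTERING ORDER BY (ts DESC);

-- Append a new value instead of overwriting in place; let TTL age off
-- the old ones.
INSERT INTO ks.values_by_key (key, ts, value)
VALUES ('k', now(), 'v') USING TTL 86400;

-- The current value is simply the newest row:
SELECT value FROM ks.values_by_key WHERE key = 'k' LIMIT 1;
```

Expired TTL cells still become tombstones eventually, but with DESC clustering
and a LIMIT 1 read the newest row is found first, so reads are not forced to
scan past piles of overwrite tombstones on a hot key.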


...

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872



Re: Cassandra Collections performance issue

2016-02-08 Thread Agrawal, Pratik
Hello all,

Recently we changed one of the table fields to a Map<text, text> in Cassandra 
2.1.11. Currently we read every field from the Map and overwrite the map 
values. The Map is of size 3. We saw that writes are 30-40% slower while reads 
are 70-80% slower. Please find below some metrics that can help.

My question is: are there any known issues with Cassandra map performance? As I 
understand it, each CQL3 Map entry maps to a column in Cassandra; with that 
assumption we are just creating 3 columns, right? Any insight on this issue 
would be helpful.

Datastax Java Driver 2.1.6.
Machine: Amazon C3 2x large
CPU – pretty much same as before (around 30%)
Memory – max around 4.8 GB

CFSTATS:

Keyspace: Keyspace
Read Count: 28359044
Read Latency: 2.847392469259542 ms.
Write Count: 1152765
Write Latency: 0.14778018590085576 ms.
Pending Flushes: 0
Table: table1
SSTable count: 1
SSTables in each level: [1, 0, 0, 0, 0, 0, 0, 0, 0]
Space used (live): 4119699
Space used (total): 4119699
Space used by snapshots (total): 90323640
Off heap memory used (total): 2278
SSTable Compression Ratio: 0.23172161124142604
Number of keys (estimate): 14
Memtable cell count: 6437
Memtable data size: 872912
Memtable off heap memory used: 0
Memtable switch count: 7626
Local read count: 27754634
Local read latency: 1.921 ms
Local write count: 1113668
Local write latency: 0.142 ms
Pending flushes: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.0
Bloom filter space used: 96
Bloom filter off heap memory used: 88
Index summary off heap memory used: 46
Compression metadata off heap memory used: 2144
Compacted partition minimum bytes: 315853
Compacted partition maximum bytes: 4055269
Compacted partition mean bytes: 2444011
Average live cells per slice (last five minutes): 17.536775249005437
Maximum live cells per slice (last five minutes): 1225.0
Average tombstones per slice (last five minutes): 34.99979575985972
Maximum tombstones per slice (last five minutes): 3430.0

Table: table2
SSTable count: 1
SSTables in each level: [1, 0, 0, 0, 0, 0, 0, 0, 0]
Space used (live): 869900
Space used (total): 869900
Space used by snapshots (total): 17279824
Off heap memory used (total): 387
SSTable Compression Ratio: 0.3999013540551859
Number of keys (estimate): 2
Memtable cell count: 1958
Memtable data size: 8
Memtable off heap memory used: 0
Memtable switch count: 7484
Local read count: 604412
Local read latency: 45.421 ms
Local write count: 39097
Local write latency: 0.337 ms
Pending flushes: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.0
Bloom filter space used: 96
Bloom filter off heap memory used: 88
Index summary off heap memory used: 35
Compression metadata off heap memory used: 264
Compacted partition minimum bytes: 1955667
Compacted partition maximum bytes: 2346799
Compacted partition mean bytes: 2346799
Average live cells per slice (last five minutes): 1963.0632242863855
Maximum live cells per slice (last five minutes): 5001.0
Average tombstones per slice (last five minutes): 0.0
Maximum tombstones per slice (last five minutes): 0.0

NETSTATS:
Mode: NORMAL
Not sending any streams.
Read Repair Statistics:
Attempted: 2853996
Mismatch (Blocking): 67386
Mismatch (Background): 9233
Pool Name                    Active   Pending   Completed
Commands                        n/a         0    33953165
Responses                       n/a         0      370301

IOSTAT
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
  15.200.830.560.100.04   83.27

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
xvda  2.79 0.4769.86 553719   82619304
xvdb 14.49 3.39   775.564009600  917227536
xvdc 15.13 2.98   819.933522250  969708944
dm-0 49.67 6.36  1595.497525858 1886936320

TPSTAT:
Pool Name                Active   Pending   Completed   Blocked   All time blocked
MutationStage                 0         0     1199683         0                  0
ReadStage                     0         0    28449207         0                  0
RequestResponseStage          0         0    33983356         0                  0
ReadRepairStage               0         0     2865749         0                  0
CounterMutationStage          0         0           0         0                  0
MiscStage                     0         0           0         0                  0
HintedHandoff                 0         0           2         0                  0
GossipStage                   0         0      270364         0                  0
CacheCleanupExecutor          0         0           0         0                  0
InternalResponseStage         0         0           0         0                  0
CommitLogArchiver             0         0           0         0                  0

Re: Cassandra Collections performance issue

2016-02-08 Thread Robert Coli

I have previously heard reports along similar lines, but in the other
direction.

eg - "I moved from a collection to a TEXT column with JSON in it, and my
reads and writes both became much faster!"
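
That reported alternative looks roughly like this (hypothetical schema; the
application does the JSON serializing and deserializing itself):

```cql
CREATE TABLE ks.example_json (
    id    text PRIMARY KEY,
    attrs text   -- JSON-encoded map, e.g. '{"k1": "v1", "k2": "v2"}'
);

-- One cell per write, no matter how many logical map entries change:
INSERT INTO ks.example_json (id, attrs)
VALUES ('row1', '{"k1": "v1", "k2": "v2", "k3": "v3"}');
```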

I'm not sure if the issue has been raised as an Apache Cassandra Jira, iow
if it is a known and expected limitation as opposed to just a performance
issue.

If I were you, I would consider filing a repro case as a Jira ticket, and
responding to this thread with its URL. :D

=Rob