[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061514#comment-14061514 ]

graham sanderson edited comment on CASSANDRA-7546 at 7/15/14 2:28 AM:
----------------------------------------------------------------------

The stateful learning behavior here seems like a good thing: always attempting 
one iteration of the first loop only to fail, before doing any form of 
synchronization, would mean a ratio of at least 2 in the highly contended case.

From looking at the numbers above, you'll see that this code:

- maintains a ratio of 1 as expected in the uncontended cases
- maintains a ratio of 1 in the highly contended cases, versus as high as 17 on 
this box with the original code (which causes massive memory allocation), e.g.
{code}
    [junit] Threads = 100 elements = 100000 (of size 64) partitions = 1
    [junit]  original code:
    [junit]   Duration = 1730ms maxConcurrency = 100
    [junit]   GC for PS Scavenge: 99 ms for 30 collections
    [junit]   Approx allocation = 9842MB vs 8MB; ratio to raw data size = 1228.6645866666668
    [junit]   loopRatio (closest to 1 best) 17.41481 raw 100000/1741481 counted 0/0 sync 0/0 up 0 down 0
    [junit] 
    [junit]  modified code: 
    [junit]   Duration = 1300ms maxConcurrency = 100
    [junit]   GC for PS Scavenge: 16 ms for 1 collections
    [junit]   Approx allocation = 561MB vs 8MB; ratio to raw data size = 70.0673819047619
    [junit]   loopRatio (closest to 1 best) 1.00004 raw 258/260 counted 2/2 sync 99741/99742 up 1 down 1
{code}
- seems to max out at about 1.3 for the cases in between, which is generally lower 
than, or very close to, the original code (the ratio arithmetic is spelled out 
after the last block), e.g.
{code}
    [junit] Threads = 100 elements = 100000 (of size 256) partitions = 16
    [junit]  original code:
    [junit]   Duration = 220ms maxConcurrency = 100
    [junit]   GC for PS Scavenge: 24 ms for 2 collections
    [junit]   Approx allocation = 770MB vs 26MB; ratio to raw data size = 29.258727826086957
    [junit]   loopRatio (closest to 1 best) 1.87623 raw 100000/187623 counted 0/0 sync 0/0 up 0 down 0
    [junit] 
    [junit]  modified code: 
    [junit]   Duration = 216ms maxConcurrency = 98
    [junit]   GC for PS Scavenge: 28 ms for 2 collections
    [junit]   Approx allocation = 581MB vs 26MB; ratio to raw data size = 22.077911884057972
    [junit]   loopRatio (closest to 1 best) 1.33551 raw 52282/69043 counted 18308/19001 sync 38617/45507 up 10826 down 10513
{code}
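To decode the loopRatio lines: each a/b pair is successes/attempts for that path, and the ratio is simply total attempts divided by the number of elements; a quick check against the figures above (assuming that reading of the counters):
{code}
// Quick arithmetic check of the printed loopRatio values
public class LoopRatioCheck {
    public static void main(String[] args) {
        long elements  = 100_000;
        long original  = 1_741_481;                  // raw attempts only
        long modified  = 260 + 2 + 99_742;           // raw + counted + sync attempts
        long inBetween = 69_043 + 19_001 + 45_507;   // second example, modified code
        System.out.println((double) original  / elements); // 17.41481
        System.out.println((double) modified  / elements); // 1.00004
        System.out.println((double) inBetween / elements); // 1.33551
    }
}
{code}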


> AtomicSortedColumns.addAllWithSizeDelta has a spin lock that allocates memory
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7546
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7546
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: graham sanderson
>         Attachments: suggestion1.txt
>
>
> In order to preserve atomicity, this code attempts to read, clone/update, 
> then CAS the state of the partition.
> Under heavy contention for updating a single partition this can cause some 
> fairly staggering memory growth (the more cores on your machine, the worse it 
> gets).
> Whilst many usage patterns don't do highly concurrent updates to the same 
> partition, hinting today does, and in this case wild (order(s) of magnitude 
> more than expected) memory allocation rates can be seen (especially when the 
> updates being hinted are small updates to different partitions, which can 
> happen very fast on their own) - see CASSANDRA-7545.
> It would be best to eliminate/reduce/limit the spinning memory allocation 
> whilst not slowing down the very common un-contended case.
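For context, the pattern described above is the classic copy-on-write CAS spin; a minimal, purely illustrative sketch (not the actual addAllWithSizeDelta code):
{code}
// Illustrative shape of the spin described above: every failed CAS discards a
// freshly cloned-and-updated copy, so a hot partition with many writers
// allocates far more memory than the data it actually stores.
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

final class CopyOnWriteSpin<S> {
    private final AtomicReference<S> ref;

    CopyOnWriteSpin(S initial) { this.ref = new AtomicReference<>(initial); }

    void apply(UnaryOperator<S> cloneAndUpdate) {
        while (true) {
            S snapshot = ref.get();                      // read
            S updated  = cloneAndUpdate.apply(snapshot); // clone + update (allocates)
            if (ref.compareAndSet(snapshot, updated))    // CAS
                return;
            // lost the race: 'updated' is now garbage, spin and allocate again
        }
    }
}
{code}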



--
This message was sent by Atlassian JIRA
(v6.2#6252)
