[ 
https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072180#comment-14072180
 ] 

Benedict edited comment on CASSANDRA-7546 at 7/23/14 7:19 PM:
--------------------------------------------------------------

Well, actually the scheme I outlined doesn't _exactly_ require a rate of 
100MB/s; all that actually needs to happen is that the allocation rate 
consistently exceeds 1MB/s by a total of 100MB (which can happen if > 100MB 
are allocated in < 1 second, i.e. 100MB/s, but also if 110MB are allocated 
over 10s). We can tweak those numbers however we like (within some window of 
representable numbers with enough range). For instance, we could require 
exceeding a rate of 10MB/s consistently by a total of 10MB, which would mean 
e.g. dividing our bytes allocated by 1k, measuring time in 100us intervals, 
and offsetting the present by 10 * 1024. To capture a rate of 100MB/s, we 
would need either to expect that memtables never live for more than 0.5 days 
(probably reasonable, i.e. represent time in 10us intervals) or to require 
that a single mutator allocates 10k in one run (also quite reasonable), but 
we're pushing the limits of what we can safely represent.
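
The arithmetic behind the 10MB/s example can be sketched in Java roughly as 
follows. This is only an illustration under stated assumptions, not the patch 
itself: the class and method names are hypothetical, the clock reading is 
passed in rather than taken from System.nanoTime(), the tracker is a plain 
long with no wrap-around handling and no CAS, and one time unit is assumed to 
be about 100 microseconds (which is what 1k per tick works out to at 10MB/s).

```java
/** Hypothetical sketch: flag allocation that exceeds a base rate by a fixed total. */
final class WasteTrackerSketch {
    // One allocation unit is 1 KiB: bytes allocated are divided by 1k, rounding up
    static final long ALLOCATION_UNIT_BYTES = 1024;
    // Assumed time unit of 100 microseconds, so one unit per tick is roughly 10 MB/s
    static final long TIME_UNIT_NANOS = 100_000;
    // Permitted excess over the base rate: 10 * 1024 units of 1 KiB, i.e. 10 MiB
    static final long EXCESS_UNITS = 10 * 1024;

    // Tracker position on the time axis (in time units); it trails "now" while under the limit
    private long tracker;
    private boolean started;

    /** Records wasted bytes at the given clock reading; returns true once the limit is hit. */
    boolean recordWaste(long wastedBytes, long nowNanos) {
        long now = nowNanos / TIME_UNIT_NANOS;
        long units = (wastedBytes + ALLOCATION_UNIT_BYTES - 1) / ALLOCATION_UNIT_BYTES;
        // Offset the present: never let the tracker trail "now" by more than the excess
        long floor = now - EXCESS_UNITS;
        if (!started || tracker < floor) {
            tracker = floor;
            started = true;
        }
        tracker += units;
        // Catching up with the present means the base rate was exceeded by EXCESS_UNITS
        return tracker >= now;
    }

    public static void main(String[] args) {
        WasteTrackerSketch tracker = new WasteTrackerSketch();
        // Waste 2 KiB per tick, i.e. roughly twice the base rate
        for (long tick = 1; tick <= 20_000; tick++) {
            if (tracker.recordWaste(2048, tick * TIME_UNIT_NANOS)) {
                System.out.println("limit reached at tick " + tick);
                break;
            }
        }
    }
}
```

In this sketch, waste at exactly the base rate never trips the limit, while 
waste at twice the base rate trips it after about a second, once roughly 10MB 
more than the base rate allows has been allocated.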

bq. nanoTime is not monotonic

It is monotonic; that's its main purpose, although there are no doubt caveats 
on a given machine/processor as to how strictly that is guaranteed.
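
Whatever the platform caveats, System.nanoTime() readings are only meaningful 
as differences, and its documentation recommends comparing them by 
subtraction so that numeric overflow is harmless. A minimal sketch of that 
idiom (the class and helper names here are hypothetical):

```java
/** Hypothetical helpers showing overflow-safe use of System.nanoTime() readings. */
final class NanoTimeSketch {
    /** Elapsed nanoseconds between two readings; correct even if the counter wrapped. */
    static long elapsedNanos(long startNanos, long endNanos) {
        return endNanos - startNanos;
    }

    /** True if reading a was taken before reading b (compare by difference, not a < b). */
    static boolean isBefore(long a, long b) {
        return a - b < 0;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) sum += i;  // some work to time
        long end = System.nanoTime();
        System.out.println("sum=" + sum + " elapsedNanos=" + elapsedNanos(start, end));
    }
}
```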

bq. which clones are you talking about

Mistype. I mean the number/size of objects we estimate we've allocated 
wastefully for the collection (snap tree / btree). We can estimate this in 2.0 
with 200+100*lg2(N), and in 2.1 we measure it exactly.
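
As a worked example of the 2.0 estimate, here is a small Java sketch. It 
assumes that N is the number of entries in the collection and that the result 
is in bytes; neither is stated above, and the class and method names are 
hypothetical.

```java
/** Hypothetical sketch of the estimate 200 + 100 * lg2(N). */
final class WasteEstimateSketch {
    /** Estimated waste per failed attempt; n is assumed to be the entry count. */
    static long estimateWaste(int n) {
        if (n < 1) throw new IllegalArgumentException("n must be >= 1");
        double lg2 = Math.log(n) / Math.log(2);  // base-2 logarithm
        return Math.round(200 + 100 * lg2);
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 16, 1024}) {
            System.out.println("N=" + n + " estimate=" + estimateWaste(n));
        }
    }
}
```

For N = 1024 this works out to 200 + 100 * 10 = 1200.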




> AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7546
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7546
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: graham sanderson
>            Assignee: graham sanderson
>         Attachments: 7546.20.txt, 7546.20_2.txt, 7546.20_3.txt, 
> 7546.20_alt.txt, suggestion1.txt, suggestion1_21.txt
>
>
> In order to preserve atomicity, this code attempts to read, clone/update, 
> then CAS the state of the partition.
> Under heavy contention for updating a single partition this can cause some 
> fairly staggering memory growth (the more cores on your machine the worse it 
> gets).
> Whilst many usage patterns don't do highly concurrent updates to the same 
> partition, hinting today does, and in this case wild (order(s) of magnitude 
> more than expected) memory allocation rates can be seen (especially when the 
> updates being hinted are small updates to different partitions which can 
> happen very fast on their own) - see CASSANDRA-7545.
> It would be best to eliminate/reduce/limit the spinning memory allocation 
> whilst not slowing down the very common un-contended case.
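
The read, clone/update, CAS pattern described in the issue can be illustrated 
with a generic Java sketch. This is not the Cassandra code: the class, the 
TreeMap-based state and the wastedClones counter are all hypothetical 
stand-ins, chosen only to show why every lost race turns a freshly allocated 
clone into garbage.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical illustration of a read / clone-and-update / CAS loop. */
final class CasLoopSketch {
    private final AtomicReference<TreeMap<String, String>> state =
            new AtomicReference<>(new TreeMap<>());
    // Counts clones thrown away because another thread won the race
    final AtomicLong wastedClones = new AtomicLong();

    void addAll(Map<String, String> update) {
        while (true) {
            TreeMap<String, String> current = state.get();              // read
            TreeMap<String, String> modified = new TreeMap<>(current);  // clone (allocates)
            modified.putAll(update);                                    // update the copy
            if (state.compareAndSet(current, modified)) {               // publish atomically
                return;
            }
            wastedClones.incrementAndGet();                             // lost the race, retry
        }
    }

    int size() {
        return state.get().size();
    }

    /** Runs several threads that all update one shared sketch, then returns it. */
    static CasLoopSketch runContended(int threads, int updatesPerThread) {
        CasLoopSketch sketch = new CasLoopSketch();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < updatesPerThread; i++) {
                    sketch.addAll(Collections.singletonMap("t" + id + "-" + i, "v"));
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return sketch;
    }

    public static void main(String[] args) {
        CasLoopSketch s = runContended(4, 1000);
        System.out.println("entries=" + s.size() + " wastedClones=" + s.wastedClones.get());
    }
}
```

Every update is applied exactly once, but the number of wasted clones varies 
from run to run and tends to grow with the number of threads contending for 
the same state.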



