[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072180#comment-14072180 ]
Benedict edited comment on CASSANDRA-7546 at 7/23/14 7:19 PM:
--------------------------------------------------------------

Well, actually the scheme I outlined doesn't _exactly_ require a rate of 100MB/s; all that needs to happen is that allocation consistently exceeds a rate of 1MB/s until the excess totals 100MB (which can happen if > 100MB is allocated in < 1 second, i.e. 100MB/s, but also if 110MB is allocated over 10s). We can tweak those numbers however we like (within some window of representable numbers with enough range).

For instance, we could require that allocation exceed a rate of 10MB/s consistently by a total of 10MB, which would mean e.g. dividing our bytes allocated by 1k, measuring time in 100ns intervals, and offsetting the present by 10 * 1024. To capture a rate of 100MB/s, we would need to either expect that memtables never live for more than 0.5 days (probably reasonable, i.e. represent time in 10ns intervals) or require that a single mutator allocates 10k in one run (also quite reasonable), but we're pushing the limits of what we can safely represent.

bq. nanoTime is not monotonic

It is monotonic; that's its main purpose, although there are no doubt caveats on a given machine/processor for how strictly that is guaranteed.

bq. which clones are you talking about

Mistype. I mean the number/size of objects we estimate we've allocated wastefully for the collection (snap tree / btree). We can estimate this in 2.0 with 200+100*lg2(N), and in 2.1 we measure it exactly.

> AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7546
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7546
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: graham sanderson
>            Assignee: graham sanderson
>         Attachments: 7546.20.txt, 7546.20_2.txt, 7546.20_3.txt, 7546.20_alt.txt, suggestion1.txt, suggestion1_21.txt
>
>
> In order to preserve atomicity, this code attempts to read, clone/update, then CAS the state of the partition.
> Under heavy contention for updating a single partition this can cause some fairly staggering memory growth (the more cores on your machine the worse it gets).
> Whilst many usage patterns don't do highly concurrent updates to the same partition, hinting today does, and in this case wild (order(s) of magnitude more than expected) memory allocation rates can be seen (especially when the updates being hinted are small updates to different partitions which can happen very fast on their own) - see CASSANDRA-7545
> It would be best to eliminate/reduce/limit the spinning memory allocation whilst not slowing down the very common un-contended case.
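
To make the threshold arithmetic in Benedict's comment concrete, here is a minimal Java sketch of a "consistently exceeds rate R by a total of X" check. It is not the packed, scaled representation discussed in the comment and not Cassandra's code: the class name `ExcessAllocationTracker`, its fields, and the use of plain `long` byte and nanosecond counters are all assumptions made for illustration, and the sketch is not thread-safe.

```java
// Hypothetical sketch: names and units are illustrative only, not Cassandra's.
final class ExcessAllocationTracker {
    private final long baselineBytesPerSecond; // rate that is considered acceptable
    private final long excessLimitBytes;       // how far above the baseline we may drift
    private long lastNanos;                    // timestamp of the previous sample
    private long excessBytes;                  // bytes allocated above the baseline, floored at 0

    ExcessAllocationTracker(long baselineBytesPerSecond, long excessLimitBytes, long startNanos) {
        this.baselineBytesPerSecond = baselineBytesPerSecond;
        this.excessLimitBytes = excessLimitBytes;
        this.lastNanos = startNanos;
    }

    /**
     * Records an allocation of {@code bytes} observed at {@code nowNanos}
     * (e.g. a System.nanoTime() reading) and returns true once the
     * accumulated excess over the baseline rate reaches the limit.
     */
    boolean record(long bytes, long nowNanos) {
        // Guard against a clock reading that appears to go backwards.
        long elapsedNanos = Math.max(0L, nowNanos - lastNanos);
        lastNanos = Math.max(lastNanos, nowNanos);

        // Bytes the baseline rate would have permitted during the elapsed time.
        long allowance = (long) (elapsedNanos / 1e9 * baselineBytesPerSecond);

        // Drain the allowance first; the floor at zero means quiet periods
        // cannot be banked against a later burst.
        excessBytes = Math.max(0L, excessBytes - allowance) + bytes;
        return excessBytes >= excessLimitBytes;
    }
}
```

With a baseline of 1MB/s and a limit of 100MB, this sketch trips both for a single burst of more than 100MB inside one second and for a steady 11MB per second sustained for ten seconds, while a steady 0.5MB per second never trips it. Those are the two cases the comment describes.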