[
https://issues.apache.org/jira/browse/CASSANDRA-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249733#comment-14249733
]
Benedict commented on CASSANDRA-7275:
-------------------------------------
Just to add to what Sylvain says about the size of the memtable, to hopefully
help target a solution (spoken agnostically): in 2.1 we could become almost
immediately unusable for writes if the memtable(s) we are retaining after this
(or multiple exceptions) exceed a certain proportion of memory, as we will stop
even trying to flush. So for 2.1 at least if we're going to try and stay alive
we need to consider if we would prefer to drop writes on the floor
(agressively, to avoid build up in the queue) if the set of memtables in limbo
is too large, or if we drop memtables until we reclaim enough space to proceed,
or if we introduce some special logic for flushing in this event.
In 2.0, conversely, we may flush millions of tiny sstables in the wrong
scenario, but this would not prevent function unless it permitted excess heap
growth, or a compaction death spiral.
> Errors in FlushRunnable may leave threads hung
> ----------------------------------------------
>
> Key: CASSANDRA-7275
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7275
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Tyler Hobbs
> Assignee: Pavel Yaskevich
> Priority: Minor
> Fix For: 2.0.12
>
> Attachments: 0001-Move-latch.countDown-into-finally-block.patch,
> 7252-2.0-v2.txt, CASSANDRA-7275-flush-info.patch
>
>
> In Memtable.FlushRunnable, the CountDownLatch will never be counted down if
> there are errors, which results in hanging any threads that are waiting for
> the flush to complete. For example, an error like this causes the problem:
> {noformat}
> ERROR [FlushWriter:474] 2014-05-20 12:10:31,137 CassandraDaemon.java (line
> 198) Exception in thread Thread[FlushWriter:474,5,main]
> java.lang.IllegalArgumentException
> at java.nio.Buffer.position(Unknown Source)
> at
> org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:64)
> at
> org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:72)
> at
> org.apache.cassandra.db.marshal.AbstractCompositeType.split(AbstractCompositeType.java:138)
> at
> org.apache.cassandra.io.sstable.ColumnNameHelper.minComponents(ColumnNameHelper.java:103)
> at
> org.apache.cassandra.db.ColumnFamily.getColumnStats(ColumnFamily.java:439)
> at
> org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:194)
> at
> org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:397)
> at
> org.apache.cassandra.db.Memtable$FlushRunnable.runWith(Memtable.java:350)
> at
> org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
> at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)