[
https://issues.apache.org/jira/browse/CASSANDRA-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247538#comment-14247538
]
Pavel Yaskevich commented on CASSANDRA-7275:
--------------------------------------------
There is already an option to die on I/O errors, and I'm happy to make it so
that we die on FSWriteError or similar if the config requests it.
bq. Generally we take the approach of dying if a non-recoverable error occurs,
and while I agree the risk of killing a whole cluster through a bug is
suboptimal, we already run that risk in a number of places in the codebase
(current behaviour here included, just with less alacrity). In my opinion this
is preferable to potentially re-introducing dead data, or having the complexity
of safely keeping the process alive as a zombie, and ensuring that zombie
doesn't degrade cluster performance by hobbling instead of dying.
Here is a real-world scenario, which we hit from time to time. Right now, if an
I/O error occurs in replaceFlushed (e.g. while trying to create a hard link for
system.compactions_in_progress), all of the compaction threads get blocked and
performance gradually degrades until the pending-compaction alerts trigger. At
that point somebody has to (most likely) wake up, figure out what is going on,
and restart the node; once it starts back up, the amount of compaction it has
to catch up on is substantial. This problem happens on a number of machines at
the same time, so if we were to kill the nodes right when the aforementioned
error occurs (although it does not affect the actual flush or compaction), part
of the ring would go dark and one would just have to pray that those nodes
weren't neighbors. So in this case, serving some stale reads (which is not even
the case if the failure is in a bookkeeping CF) with an error in the log is
much better than losing a portion of the cluster for (possibly tens of) minutes
without any idea of what is going on.
In this situation I would rather ignore problems with bookkeeping CFs, or save
the CL segments and forget about it, and/or bump up the read-repair chance at
the same time.
Everybody who is running Cassandra or any other database/system wants peace of
mind; that's why regular repairs and all sorts of alerting/monitoring systems
are in place. If there is something in the log that indicates a problem, it
gives people time to think about their next steps instead of chaotically
trying to fix whatever mess we left on failure.
bq. Other than dying, periodically trying to re-flush and only keeling over
when we run out of room or have failed for a long period (possibly random? to
avoid the tiny risk of bunching) seems like a good idea.
This is not going to help if the problem is data-driven or external; you are
just going to thrash the flusher threads without doing any useful work.
> Errors in FlushRunnable may leave threads hung
> ----------------------------------------------
>
> Key: CASSANDRA-7275
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7275
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Tyler Hobbs
> Assignee: Pavel Yaskevich
> Priority: Minor
> Fix For: 2.0.12
>
> Attachments: 0001-Move-latch.countDown-into-finally-block.patch,
> 7252-2.0-v2.txt, CASSANDRA-7275-flush-info.patch
>
>
> In Memtable.FlushRunnable, the CountDownLatch will never be counted down if
> there are errors, which results in hanging any threads that are waiting for
> the flush to complete. For example, an error like this causes the problem:
> {noformat}
> ERROR [FlushWriter:474] 2014-05-20 12:10:31,137 CassandraDaemon.java (line
> 198) Exception in thread Thread[FlushWriter:474,5,main]
> java.lang.IllegalArgumentException
> at java.nio.Buffer.position(Unknown Source)
> at
> org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:64)
> at
> org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:72)
> at
> org.apache.cassandra.db.marshal.AbstractCompositeType.split(AbstractCompositeType.java:138)
> at
> org.apache.cassandra.io.sstable.ColumnNameHelper.minComponents(ColumnNameHelper.java:103)
> at
> org.apache.cassandra.db.ColumnFamily.getColumnStats(ColumnFamily.java:439)
> at
> org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:194)
> at
> org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:397)
> at
> org.apache.cassandra.db.Memtable$FlushRunnable.runWith(Memtable.java:350)
> at
> org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
> at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> {noformat}
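The hang described in the issue summary above (a CountDownLatch that is never
counted down when the flush throws) can be illustrated with a small standalone
sketch. This is not Cassandra's code and not the contents of the attached
patches; the attachment name only suggests moving latch.countDown() into a
finally block, and the sketch shows that general technique. The names
FlushLatchSketch, FlushTask and flushAndWait are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class FlushLatchSketch {

    /** Hypothetical stand-in for a flush task; not Cassandra's Memtable.FlushRunnable. */
    static final class FlushTask implements Runnable {
        private final CountDownLatch latch;
        private final boolean fail;

        FlushTask(CountDownLatch latch, boolean fail) {
            this.latch = latch;
            this.fail = fail;
        }

        @Override
        public void run() {
            try {
                if (fail) {
                    // Simulates an error thrown while writing the flushed contents.
                    throw new IllegalArgumentException("simulated flush failure");
                }
                // ... a real task would write its data here ...
            } finally {
                // Runs on success and on failure, so waiters are always released.
                latch.countDown();
            }
        }
    }

    /** Runs one task and reports whether the waiting thread was released within 2 seconds. */
    static boolean flushAndWait(boolean fail) {
        CountDownLatch latch = new CountDownLatch(1);
        Thread writer = new Thread(new FlushTask(latch, fail), "FlushWriter-sketch");
        // Swallow the simulated exception to keep output quiet; real code would log it.
        writer.setUncaughtExceptionHandler((thread, error) -> { });
        writer.start();
        try {
            return latch.await(2, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("released after success: " + flushAndWait(false));
        System.out.println("released after failure: " + flushAndWait(true));
    }
}
```

Releasing the latch only unblocks the waiting threads; it does not tell them
whether the flush succeeded, so whether to keep running or die after the error
is still the policy question discussed in the comment above.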
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)