[ https://issues.apache.org/jira/browse/CASSANDRA-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249174#comment-14249174 ]
Pavel Yaskevich edited comment on CASSANDRA-7275 at 12/16/14 11:50 PM:
-----------------------------------------------------------------------

The problem is that there is no way to tell whether the hard-link failure is an actual fs/disk problem or a programming error. Right now it looks like a programming error, because the snapshot tries to create a duplicate hard link to the same file, as I mentioned in CASSANDRA-8476. So if there is no way to tell, how reasonable is it to enforce a shutdown or any other rule from "disk_failure_policy"?

bq. If it's our bug, then you may need a temporary patch while we figure out the cause, but I still don't think that kind of // this shouldn't happen code should be shipped officially.

If it's your problem, it's my problem as well. We have a work-around for now (as I guess most people do), but my intention in this ticket is to fix this problem for good instead of just fixing a symptom of it (the aforementioned "duplicate hard-link" problem).

was (Author: xedin):
The problem is that there is no way to tell whether the hard-link failure is an actual fs/disk problem or a programming error. Right now it looks like a programming error, because the snapshot tries to create a duplicate hard link to the same file, as I mentioned in CASSANDRA-8476. So if there is no way to tell, how reasonable is it to enforce a shutdown or any other rule from "disk_failure_policy"?

bq. If it's our bug, then you may need a temporary patch while we figure out the cause, but I still don't think that kind of // this shouldn't happen code should be shipped officially.

If it's your problem, it's my problem as well. We can work around it for now (as I guess most people do), but my intention in this ticket is to fix this problem for good instead of just fixing a symptom of it (the aforementioned "duplicate hard-link" problem).

> Errors in FlushRunnable may leave threads hung
> ----------------------------------------------
>
>                 Key: CASSANDRA-7275
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7275
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Tyler Hobbs
>            Assignee: Pavel Yaskevich
>            Priority: Minor
>             Fix For: 2.0.12
>
>         Attachments: 0001-Move-latch.countDown-into-finally-block.patch, 7252-2.0-v2.txt, CASSANDRA-7275-flush-info.patch
>
>
> In Memtable.FlushRunnable, the CountDownLatch will never be counted down if
> there are errors, which results in hanging any threads that are waiting for
> the flush to complete. For example, an error like this causes the problem:
> {noformat}
> ERROR [FlushWriter:474] 2014-05-20 12:10:31,137 CassandraDaemon.java (line 198) Exception in thread Thread[FlushWriter:474,5,main]
> java.lang.IllegalArgumentException
>     at java.nio.Buffer.position(Unknown Source)
>     at org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:64)
>     at org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:72)
>     at org.apache.cassandra.db.marshal.AbstractCompositeType.split(AbstractCompositeType.java:138)
>     at org.apache.cassandra.io.sstable.ColumnNameHelper.minComponents(ColumnNameHelper.java:103)
>     at org.apache.cassandra.db.ColumnFamily.getColumnStats(ColumnFamily.java:439)
>     at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:194)
>     at org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:397)
>     at org.apache.cassandra.db.Memtable$FlushRunnable.runWith(Memtable.java:350)
>     at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.lang.Thread.run(Unknown Source)
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
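
For readers following the issue description, the idea named by the first attachment (moving latch.countDown() into a finally block) might look roughly like the sketch below. This is only an illustration under assumptions: FlushLatchSketch, FlushTask, and waiterReleased are hypothetical names made up for this example, and none of it is the actual Memtable.FlushRunnable code or the attached patch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class FlushLatchSketch {

    /** Hypothetical stand-in for a flush task; NOT Cassandra's FlushRunnable. */
    static final class FlushTask implements Runnable {
        private final CountDownLatch latch;
        private final boolean fail;

        FlushTask(CountDownLatch latch, boolean fail) {
            this.latch = latch;
            this.fail = fail;
        }

        @Override
        public void run() {
            try {
                if (fail) {
                    // Simulates an error thrown while writing the flushed data.
                    throw new IllegalArgumentException("simulated flush failure");
                }
            } finally {
                // Counted down on both the success and the failure path,
                // so threads waiting on the latch are always released.
                latch.countDown();
            }
        }
    }

    /** Runs one task and reports whether a waiter was released within the timeout. */
    static boolean waiterReleased(boolean fail) throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(1);
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            executor.execute(new FlushTask(latch, fail));
            return latch.await(5, TimeUnit.SECONDS);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("released after success: " + waiterReleased(false));
        System.out.println("released after failure: " + waiterReleased(true));
    }
}
```

If the countDown() call sat after the failing statement instead of in the finally block, the failing run would leave the waiter blocked until the timeout, which is the hang the description reports. Note that the sketch only releases waiters; it does not tell them that the flush failed, and it does not address the question of how the failure should be handled.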