[ 
https://issues.apache.org/jira/browse/CASSANDRA-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249174#comment-14249174
 ] 

Pavel Yaskevich edited comment on CASSANDRA-7275 at 12/16/14 11:50 PM:
-----------------------------------------------------------------------

The problem is that there is no way to tell if hard-link problem is a actual 
fs/disk problem or programming error, right now it looks like a programming 
error because it snapshot tries to create duplicate hard-link to the same file 
as I mentioned in the CASSANDRA-8476 so if there is no way to tell how 
reasonable is it to enforce shutdown or any rule from "disk_failure_policy"?

bq. If it's our bug, then you may need a temporary patch while we figure out 
the cause, but I still don't think that kind of // this shouldn't happen code 
should be shipped officially.

If it's your problem it's my problem as well, we have a work-around for now (as 
I guess most of the people do) but my intention in this ticket to fix this 
problem for good instead of just fixing the symptom of it (being aforementioned 
"duplicate hard-link" problem).


was (Author: xedin):
The problem is that there is no way to tell if hard-link problem is a actual 
fs/disk problem or programming error, right now it looks like a programming 
error because it snapshot tries to create duplicate hard-link to the same file 
as I mentioned in the CASSANDRA-8476 so if there is no way to tell how 
reasonable is it to enforce shutdown or any rule from "disk_failure_policy"?

bq. If it's our bug, then you may need a temporary patch while we figure out 
the cause, but I still don't think that kind of // this shouldn't happen code 
should be shipped officially.

If it's your problem it's my problem as well, we can work around for now (as I 
guess most of the people do) but my intention in this ticket to fix this 
problem for good instead of just fixing the symptom of it (being aforementioned 
"duplicate hard-link" problem).

> Errors in FlushRunnable may leave threads hung
> ----------------------------------------------
>
>                 Key: CASSANDRA-7275
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7275
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Tyler Hobbs
>            Assignee: Pavel Yaskevich
>            Priority: Minor
>             Fix For: 2.0.12
>
>         Attachments: 0001-Move-latch.countDown-into-finally-block.patch, 
> 7252-2.0-v2.txt, CASSANDRA-7275-flush-info.patch
>
>
> In Memtable.FlushRunnable, the CountDownLatch will never be counted down if 
> there are errors, which results in hanging any threads that are waiting for 
> the flush to complete.  For example, an error like this causes the problem:
> {noformat}
> ERROR [FlushWriter:474] 2014-05-20 12:10:31,137 CassandraDaemon.java (line 
> 198) Exception in thread Thread[FlushWriter:474,5,main]
> java.lang.IllegalArgumentException
>     at java.nio.Buffer.position(Unknown Source)
>     at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:64)
>     at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:72)
>     at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.split(AbstractCompositeType.java:138)
>     at 
> org.apache.cassandra.io.sstable.ColumnNameHelper.minComponents(ColumnNameHelper.java:103)
>     at 
> org.apache.cassandra.db.ColumnFamily.getColumnStats(ColumnFamily.java:439)
>     at 
> org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:194)
>     at 
> org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:397)
>     at 
> org.apache.cassandra.db.Memtable$FlushRunnable.runWith(Memtable.java:350)
>     at 
> org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.lang.Thread.run(Unknown Source)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to