[
https://issues.apache.org/jira/browse/CASSANDRA-6666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078171#comment-14078171
]
Vishal Mehta edited comment on CASSANDRA-6666 at 7/29/14 7:01 PM:
------------------------------------------------------------------
Hello Everyone,
Please pardon my ignorance; this is my first time writing on an open-source bug
report.
I think I recently hit this bug, because I saw similar symptoms in my 3-node
Cassandra setup, where I am running a test at around 12K qps (inserts into 3
different tables) with a TTL of 1 hour, and the keyspace's tables have
gc_grace_seconds set to 14400 (4 hours).
The test eventually reaches a point where Cassandra sees more than 100K
tombstones and crashes with the following exception in
/var/log/cassandra/cassandra.log:
{noformat}
ERROR 13:23:56,747 Scanned over 100000 tombstones in system.hints; query aborted (see tombstone_fail_threshold)
ERROR 13:23:56,962 Exception in thread Thread[HintedHandoff:1,1,main]
org.apache.cassandra.db.filter.TombstoneOverwhelmingException
	at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:202)
	at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:122)
	at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:80)
	at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:72)
	at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:297)
	at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:53)
	at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1547)
	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1376)
	at org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:373)
	at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:330)
	at org.apache.cassandra.db.HintedHandOffManager.access$300(HintedHandOffManager.java:91)
	at org.apache.cassandra.db.HintedHandOffManager$5.run(HintedHandOffManager.java:547)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:724)
 INFO 13:24:00,987 No gossip backlog; proceeding
{noformat}
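For context, the abort in the log above is governed by the tombstone scan thresholds in cassandra.yaml. A sketch of the relevant settings, assuming the stock 2.0-era defaults (raising the failure threshold only hides the underlying tombstone buildup, so it is generally better to fix the data model or gc settings):
{noformat}
# cassandra.yaml -- tombstone scan thresholds (defaults shown; assumes a stock install)
tombstone_warn_threshold: 1000      # log a warning when a single query scans this many tombstones
tombstone_failure_threshold: 100000 # abort the query with TombstoneOverwhelmingException
{noformat}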
*Note:* Is it plausible to keep gc_grace_seconds closer to the TTL? Also, I could
see that one of the nodes deleted all of its records from disk and freed up the
space, whereas the other two nodes never deleted their tombstones.
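On the gc_grace_seconds question: it is a table-level option in CQL, not a keyspace-level one, so it has to be set per table. A minimal sketch, where the table name {{ks.events}} is made up for illustration:
{noformat}
-- gc_grace_seconds is a per-table option; 'ks.events' is a hypothetical name
ALTER TABLE ks.events WITH gc_grace_seconds = 3600;  -- e.g. match the 1-hour TTL
{noformat}
Note that gc_grace_seconds also bounds how long this node's hints and tombstones are considered valid, so shrinking it below the interval at which repairs actually complete risks resurrecting deleted data.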
Please advise.
Regards,
Vishal
> Avoid accumulating tombstones after partial hint replay
> -------------------------------------------------------
>
> Key: CASSANDRA-6666
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6666
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Jonathan Ellis
> Assignee: Jonathan Ellis
> Priority: Minor
> Labels: hintedhandoff
> Fix For: 2.0.10
>
> Attachments: 6666.txt, cassandra_system.log.debug.gz
>
>
--
This message was sent by Atlassian JIRA
(v6.2#6252)