[ 
https://issues.apache.org/jira/browse/CASSANDRA-8798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505529#comment-14505529
 ] 

Ariel Weisberg commented on CASSANDRA-8798:
-------------------------------------------

[~jjirsa] This is too scary to commit without a test showing that the node 
transitions back to respecting tombstone thresholds. 

Can you make a dtest that reproduces the problem without your fix? You can set 
a low threshold to make it easy.

Looking at the predicate for respectTombstoneThresholds() I am a little 
concerned that it might be true after bootstrapping since it is all ||. It's 
also a lot of predicates when maybe one should suffice, since bootstrapping 
nodes aren't supposed to serve reads at all anyway and reads are the only 
thing affected by the tombstone thresholds. Each extra predicate feels like an 
opportunity to make a mistake transitioning back to honoring the threshold at 
some point in the future.
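
To make the single-predicate idea concrete, here is a rough sketch of what I 
have in mind. It is not taken from the attached patch or from the Cassandra 
code base: the TombstoneGuard class, its bootstrapping flag, finishBootstrap() 
and check() are all made-up names for illustration, and only the 
respectTombstoneThresholds() name comes from the patch under review.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: every name here except respectTombstoneThresholds()
// is an assumption made for illustration, not Cassandra's actual API.
final class TombstoneGuard {
    private final int failureThreshold;
    // Single source of truth: true only while the node is bootstrapping.
    private final AtomicBoolean bootstrapping = new AtomicBoolean(true);

    TombstoneGuard(int failureThreshold) {
        this.failureThreshold = failureThreshold;
    }

    // Called once when the node finishes joining the ring.
    void finishBootstrap() {
        bootstrapping.set(false);
    }

    // One predicate, so there is one place to get the transition right.
    boolean respectTombstoneThresholds() {
        return !bootstrapping.get();
    }

    // Fails a read that scanned too many tombstones, unless bootstrapping.
    void check(int tombstonesScanned) {
        if (respectTombstoneThresholds() && tombstonesScanned > failureThreshold) {
            throw new IllegalStateException("scanned " + tombstonesScanned
                    + " tombstones; failure threshold is " + failureThreshold);
        }
    }
}
```

A test along these lines would set a low threshold, confirm that check() lets 
a large tombstone count through while bootstrapping, call finishBootstrap(), 
and then confirm that the same count is rejected.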

> don't throw TombstoneOverwhelmingException during bootstrap
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-8798
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8798
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: mck
>            Assignee: Jeff Jirsa
>             Fix For: 2.0.15
>
>         Attachments: 8798.txt
>
>
> During bootstrap honouring tombstone_failure_threshold seems 
> counter-productive as the node is not serving requests so not protecting 
> anything.
> Instead what happens is bootstrap fails, and a cluster that obviously needs 
> an extra node isn't getting it...
> **History**
> When adding a new node, the bootstrap process looks complete in that 
> streaming is finished, compactions are finished, and all disk and cpu 
> activity is calm.
> But the node is still stuck in "joining" status. 
> The last stage in the bootstrapping process is the rebuilding of secondary 
> indexes. Grepping the logs confirmed it failed during this stage.
> {code}grep SecondaryIndexManager cassandra/logs/*{code}
> To see what secondary index rebuilding was initiated
> {code}
> grep "index build of " cassandra/logs/* | awk -F" for data in " '{print $1}'
> INFO 13:18:11,252 Submitting index build of addresses.unobfuscatedIndex
> INFO 13:18:11,352 Submitting index build of Inbox.FINNBOXID_INDEX
> INFO 23:03:54,758 Submitting index build of [events.collected_tbIndex, 
> events.real_tbIndex]
> {code}
> To get an idea of successful secondary index rebuilding 
> {code}grep "Index build of " cassandra/logs/*
> INFO 13:18:11,263 Index build of addresses.unobfuscatedIndex complete
> INFO 13:18:11,355 Index build of Inbox.FINNBOXID_INDEX complete
> {code}
> Looking closer at  {{[events.collected_tbIndex, events.real_tbIndex]}} showed 
> the following stacktrace
> {code}
> ERROR [StreamReceiveTask:121] 2015-02-12 05:54:47,768 CassandraDaemon.java 
> (line 199) Exception in thread Thread[StreamReceiveTask:121,5,main]
> java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: 
> org.apache.cassandra.db.filter.TombstoneOverwhelmingException
>         at 
> org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:413)
>         at 
> org.apache.cassandra.db.index.SecondaryIndexManager.maybeBuildSecondaryIndexes(SecondaryIndexManager.java:142)
>         at 
> org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:130)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: 
> org.apache.cassandra.db.filter.TombstoneOverwhelmingException
>         at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:188)
>         at 
> org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:409)
>         ... 7 more
> Caused by: java.lang.RuntimeException: 
> org.apache.cassandra.db.filter.TombstoneOverwhelmingException
>         at 
> org.apache.cassandra.service.pager.QueryPagers$1.next(QueryPagers.java:160)
>         at 
> org.apache.cassandra.service.pager.QueryPagers$1.next(QueryPagers.java:143)
>         at org.apache.cassandra.db.Keyspace.indexRow(Keyspace.java:406)
>         at 
> org.apache.cassandra.db.index.SecondaryIndexBuilder.build(SecondaryIndexBuilder.java:62)
>         at 
> org.apache.cassandra.db.compaction.CompactionManager$9.run(CompactionManager.java:834)
>         ... 5 more
> Caused by: org.apache.cassandra.db.filter.TombstoneOverwhelmingException
>         at 
> org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:202)
>         at 
> org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:122)
>         at 
> org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:80)
>         at 
> org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:72)
>         at 
> org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:297)
>         at 
> org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:53)
>         at 
> org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1547)
>         at 
> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1376)
>         at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:333)
>         at 
> org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:65)
>         at 
> org.apache.cassandra.service.pager.SliceQueryPager.queryNextPage(SliceQueryPager.java:85)
>         at 
> org.apache.cassandra.service.pager.AbstractQueryPager.fetchPage(AbstractQueryPager.java:88)
>         at 
> org.apache.cassandra.service.pager.SliceQueryPager.fetchPage(SliceQueryPager.java:35)
>         at 
> org.apache.cassandra.service.pager.QueryPagers$1.next(QueryPagers.java:154)
>         ... 9 more
> {code}
> To get past this I had to raise 
> org.apache.cassandra.db:type=StorageService.TombstoneFailureThreshold and 
> manually rebuild the index, then restart the node with auto_bootstrap=false.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
