[
https://issues.apache.org/jira/browse/CASSANDRA-12655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15496341#comment-15496341
]
Marcus Eriksson commented on CASSANDRA-12655:
---------------------------------------------
Have a look here:
https://github.com/apache/cassandra/blob/cassandra-2.2/CHANGES.txt - all
related issues should be listed there.
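For example, one quick way to pull the CHANGES.txt entries that mention repair
or anticompaction (a sketch, assuming a local clone of the cassandra-2.2
branch; the grep pattern is only an illustration):
{code}
# A sketch: clone the 2.2 branch and list CHANGES.txt entries mentioning
# repair or anticompaction.
git clone --branch cassandra-2.2 https://github.com/apache/cassandra.git
grep -in -E 'repair|anticompact' cassandra/CHANGES.txt | head -n 40
{code}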
> Incremental repair & compaction hang on random nodes
> ----------------------------------------------------
>
> Key: CASSANDRA-12655
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12655
> Project: Cassandra
> Issue Type: Bug
> Components: Compaction
> Environment: CentOS Linux release 7.1.1503 (Core)
> RAM - 64GB
> HEAP - 16GB
> Load on each node - ~5GB
> Cassandra Version - 2.2.5
> Reporter: Navjyot Nishant
> Priority: Blocker
>
> Hi, we are setting up incremental repair on our 18-node cluster. Avg load on
> each node is ~5GB. The repair runs fine on a couple of nodes and then suddenly
> gets stuck on random nodes. Upon checking the system.log of an impacted node,
> we don't see much information.
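> For reference, the per-node invocation we use is essentially the following (a
> sketch; on 2.2 {code}nodetool repair{code} is incremental by default, and the
> keyspace name matches the data path in the log below):
> {code}
> # A sketch of the per-node repair invocation; on 2.2 this runs an incremental
> # repair by default. The keyspace name is taken from the data path in the log.
> nodetool repair gccatlgsvcks
> {code}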
> Following are the lines we see in system.log; they have been there since the
> point at which repair stopped making progress:
> {code}
> INFO [CompactionExecutor:3490] 2016-09-16 11:14:44,236 CompactionManager.java:1221 - Anticompacting [BigTableReader(path='/cassandra/data/gccatlgsvcks/message_backup-cab0485008ed11e5bfed452cdd54652d/la-30832-big-Data.db'), BigTableReader(path='/cassandra/data/gccatlgsvcks/message_backup-cab0485008ed11e5bfed452cdd54652d/la-30811-big-Data.db')]
> INFO [IndexSummaryManager:1] 2016-09-16 11:14:49,954 IndexSummaryRedistribution.java:74 - Redistributing index summaries
> INFO [IndexSummaryManager:1] 2016-09-16 12:14:49,961 IndexSummaryRedistribution.java:74 - Redistributing index summaries
> {code}
> When we try to see pending compactions by executing {code}nodetool
> compactionstats{code}, it hangs as well and doesn't return anything. However,
> {code}nodetool tpstats{code} shows active and pending compactions which never
> come down and keep increasing.
> {code}
> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
> MutationStage                     0         0         221208         0                 0
> ReadStage                         0         0        1288839         0                 0
> RequestResponseStage              0         0         104356         0                 0
> ReadRepairStage                   0         0             72         0                 0
> CounterMutationStage              0         0              0         0                 0
> HintedHandoff                     0         0             46         0                 0
> MiscStage                         0         0              0         0                 0
> CompactionExecutor                8        66          68124         0                 0
> MemtableReclaimMemory             0         0            166         0                 0
> PendingRangeCalculator            0         0             38         0                 0
> GossipStage                       0         0         242455         0                 0
> MigrationStage                    0         0              0         0                 0
> MemtablePostFlush                 0         0           3682         0                 0
> ValidationExecutor                0         0           2246         0                 0
> Sampler                           0         0              0         0                 0
> MemtableFlushWriter               0         0            166         0                 0
> InternalResponseStage             0         0           8866         0                 0
> AntiEntropyStage                  0         0          15417         0                 0
> Repair#7                          0         0            160         0                 0
> CacheCleanupExecutor              0         0              0         0                 0
> Native-Transport-Requests         0         0         327334         0                 0
>
> Message type           Dropped
> READ                         0
> RANGE_SLICE                  0
> _TRACE                       0
> MUTATION                     0
> COUNTER_MUTATION             0
> REQUEST_RESPONSE             0
> PAGED_RANGE                  0
> READ_REPAIR                  0
> {code}
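> In the output above, CompactionExecutor shows 8 active and 66 pending tasks
> that never drain. To see where those threads are stuck, a JVM thread dump can
> be captured, for example (a sketch, assuming the JDK's jstack is available on
> the node and is run as the Cassandra user):
> {code}
> # A sketch: capture a thread dump and inspect the CompactionExecutor threads.
> CASSANDRA_PID=$(pgrep -f CassandraDaemon)
> jstack "$CASSANDRA_PID" > /tmp/cassandra-threads.txt
> grep -A 20 CompactionExecutor /tmp/cassandra-threads.txt | head -n 80
> {code}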
> The only solution we have is to bounce the node; after that, all the pending
> compactions start getting processed immediately and finish within 5-10 minutes.
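> For completeness, the bounce we do is essentially the following (a sketch; the
> service name is an assumption for a packaged install on CentOS 7):
> {code}
> # A sketch of the bounce; the service name assumes a packaged install on CentOS 7.
> nodetool drain
> sudo systemctl restart cassandra
> {code}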
> This is a road-blocker issue for us, and any help in this matter would be
> highly appreciated.