[ https://issues.apache.org/jira/browse/CASSANDRA-12655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15496334#comment-15496334 ]

Navjyot Nishant commented on CASSANDRA-12655:
---------------------------------------------

Hi Marcus,

Is upgrading the only solution? This is the issue we are hitting in our production 
environment, and an upgrade will not be straightforward or quick. We would have to 
follow several processes and get sign-off from several stakeholders in order to 
move forward with that, and we would also have to explain the reason for the 
upgrade in detail. We do already have a plan to move to version 3 early next year.

In the meantime, if we could fix the issue while staying on the same version, that 
would be great. Could you please explain the root cause, or point me to the 
underlying issue that is driving this? We are at a loss as to what we are 
supposed to check and where.

> Incremental repair & compaction hang on random nodes
> ----------------------------------------------------
>
>                 Key: CASSANDRA-12655
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12655
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Compaction
>         Environment: CentOS Linux release 7.1.1503 (Core)
> RAM - 64GB
> HEAP - 16GB
> Load on each node - ~5GB
> Cassandra Version - 2.2.5
>            Reporter: Navjyot Nishant
>            Priority: Blocker
>
> Hi, we are setting up incremental repair on our 18-node cluster. The average load 
> on each node is ~5GB. The repair runs fine on a couple of nodes and then suddenly 
> gets stuck on random nodes. Upon checking the system.log of an impacted node, we 
> don't see much information.
> The following are the last lines we see in system.log; from this point on the 
> repair makes no progress:
> {code}
> INFO  [CompactionExecutor:3490] 2016-09-16 11:14:44,236 CompactionManager.java:1221 - Anticompacting [BigTableReader(path='/cassandra/data/gccatlgsvcks/message_backup-cab0485008ed11e5bfed452cdd54652d/la-30832-big-Data.db'), BigTableReader(path='/cassandra/data/gccatlgsvcks/message_backup-cab0485008ed11e5bfed452cdd54652d/la-30811-big-Data.db')]
> INFO  [IndexSummaryManager:1] 2016-09-16 11:14:49,954 IndexSummaryRedistribution.java:74 - Redistributing index summaries
> INFO  [IndexSummaryManager:1] 2016-09-16 12:14:49,961 IndexSummaryRedistribution.java:74 - Redistributing index summaries
> {code}
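> For reference, a minimal sketch of how such a repair is kicked off on 2.2, where 
> plain nodetool repair already runs incrementally by default; the keyspace name is 
> taken from the data paths above, and the actual options used may differ:
> {code}
> # Illustrative invocation only; exact options for this cluster may differ.
> # On 2.2, nodetool repair performs an incremental repair by default.
> nodetool repair gccatlgsvcks
> {code}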
> When we try to see pending compactions by executing {code}nodetool 
> compactionstats{code}, it hangs as well and doesn't return anything. However, 
> {code}nodetool tpstats{code} shows active and pending compactions which never 
> come down and keep increasing:
> {code}
> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
> MutationStage                     0         0         221208         0                 0
> ReadStage                         0         0        1288839         0                 0
> RequestResponseStage              0         0         104356         0                 0
> ReadRepairStage                   0         0             72         0                 0
> CounterMutationStage              0         0              0         0                 0
> HintedHandoff                     0         0             46         0                 0
> MiscStage                         0         0              0         0                 0
> CompactionExecutor                8        66          68124         0                 0
> MemtableReclaimMemory             0         0            166         0                 0
> PendingRangeCalculator            0         0             38         0                 0
> GossipStage                       0         0         242455         0                 0
> MigrationStage                    0         0              0         0                 0
> MemtablePostFlush                 0         0           3682         0                 0
> ValidationExecutor                0         0           2246         0                 0
> Sampler                           0         0              0         0                 0
> MemtableFlushWriter               0         0            166         0                 0
> InternalResponseStage             0         0           8866         0                 0
> AntiEntropyStage                  0         0          15417         0                 0
> Repair#7                          0         0            160         0                 0
> CacheCleanupExecutor              0         0              0         0                 0
> Native-Transport-Requests         0         0         327334         0                 0
>
> Message type           Dropped
> READ                         0
> RANGE_SLICE                  0
> _TRACE                       0
> MUTATION                     0
> COUNTER_MUTATION             0
> REQUEST_RESPONSE             0
> PAGED_RANGE                  0
> READ_REPAIR                  0
> {code}
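> As a rough diagnostic sketch (assuming a single Cassandra JVM per node), a thread 
> dump taken while compactionstats is hung shows what the CompactionExecutor 
> threads are blocked on:
> {code}
> # Illustrative diagnostic commands; paths and pid lookup are assumptions.
> # Locate the Cassandra JVM and dump all thread stacks to a file.
> CASSANDRA_PID=$(pgrep -f CassandraDaemon)
> jstack -l "$CASSANDRA_PID" > /tmp/cassandra-threads.txt
> # Alternatively, kill -3 writes the thread dump to Cassandra's stdout log.
> kill -3 "$CASSANDRA_PID"
> {code}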
> The only workaround we have is to bounce the node, after which all the pending 
> compactions start getting processed immediately and finish within 5-10 minutes.
> This is a blocker issue for us, and any help in this matter would be highly 
> appreciated.
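> A bounce here is roughly the following sequence on CentOS 7 (the systemd unit 
> name is an assumption; drain first so memtables are flushed cleanly before the 
> restart):
> {code}
> # Illustrative restart ("bounce") sequence; service name assumed.
> nodetool drain
> sudo systemctl restart cassandra
> {code}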



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
