Navjyot Nishant created CASSANDRA-12655:
-------------------------------------------

             Summary: Incremental repair & compaction hang on random nodes
                 Key: CASSANDRA-12655
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12655
             Project: Cassandra
          Issue Type: Bug
          Components: Compaction
         Environment: CentOS Linux release 7.1.1503 (Core)
RAM - 64GB
HEAP - 16GB
Load on each node - ~5GB
Cassandra Version - 2.2.5
            Reporter: Navjyot Nishant
            Priority: Blocker


Hi, we are setting up incremental repair on our 18-node cluster. The average load on 
each node is ~5GB. The repair runs fine on a couple of nodes and then suddenly gets 
stuck on random nodes. Upon checking the system.log of an impacted node, we don't 
see much information.

The following are the last lines logged from the point where the repair stops 
making progress:

{code}
INFO  [CompactionExecutor:3490] 2016-09-16 11:14:44,236 CompactionManager.java:1221 - Anticompacting [BigTableReader(path='/cassandra/data/gccatlgsvcks/message_backup-cab0485008ed11e5bfed452cdd54652d/la-30832-big-Data.db'), BigTableReader(path='/cassandra/data/gccatlgsvcks/message_backup-cab0485008ed11e5bfed452cdd54652d/la-30811-big-Data.db')]
INFO  [IndexSummaryManager:1] 2016-09-16 11:14:49,954 IndexSummaryRedistribution.java:74 - Redistributing index summaries
INFO  [IndexSummaryManager:1] 2016-09-16 12:14:49,961 IndexSummaryRedistribution.java:74 - Redistributing index summaries
{code}
When we try to see pending compactions by executing {code}nodetool 
compactionstats{code}, it hangs as well and doesn't return anything. However, 
{code}nodetool tpstats{code} shows active and pending compactions, which never 
come down and keep increasing.

{code}
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     0         0         221208         0                 0
ReadStage                         0         0        1288839         0                 0
RequestResponseStage              0         0         104356         0                 0
ReadRepairStage                   0         0             72         0                 0
CounterMutationStage              0         0              0         0                 0
HintedHandoff                     0         0             46         0                 0
MiscStage                         0         0              0         0                 0
CompactionExecutor                8        66          68124         0                 0
MemtableReclaimMemory             0         0            166         0                 0
PendingRangeCalculator            0         0             38         0                 0
GossipStage                       0         0         242455         0                 0
MigrationStage                    0         0              0         0                 0
MemtablePostFlush                 0         0           3682         0                 0
ValidationExecutor                0         0           2246         0                 0
Sampler                           0         0              0         0                 0
MemtableFlushWriter               0         0            166         0                 0
InternalResponseStage             0         0           8866         0                 0
AntiEntropyStage                  0         0          15417         0                 0
Repair#7                          0         0            160         0                 0
CacheCleanupExecutor              0         0              0         0                 0
Native-Transport-Requests         0         0         327334         0                 0

Message type           Dropped
READ                         0
RANGE_SLICE                  0
_TRACE                       0
MUTATION                     0
COUNTER_MUTATION             0
REQUEST_RESPONSE             0
PAGED_RANGE                  0
READ_REPAIR                  0

{code}
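For anyone trying to confirm the same symptom on their own nodes, here is a minimal Python sketch that parses the thread-pool section of {code}nodetool tpstats{code} output and flags pools with a backlog. It is not part of Cassandra or nodetool; the function names (parse_tpstats, backlogged, looks_stuck) are hypothetical, and it assumes the six-column layout shown above, which may differ in other Cassandra versions. The sample rows are copied from the output in this report.

```python
def parse_tpstats(text):
    """Parse thread-pool rows from `nodetool tpstats` text (layout assumed as above)."""
    pools = {}
    for line in text.splitlines():
        fields = line.split()
        # A pool row is a single-token name followed by five integer columns.
        # The header and the "Message type / Dropped" rows do not match and are skipped.
        if len(fields) == 6 and all(f.isdigit() for f in fields[1:]):
            active, pending, completed, blocked, all_time_blocked = map(int, fields[1:])
            pools[fields[0]] = {
                "active": active,
                "pending": pending,
                "completed": completed,
                "blocked": blocked,
                "all_time_blocked": all_time_blocked,
            }
    return pools


def backlogged(pools, min_pending=1):
    """Return the pools whose pending count is at least min_pending."""
    return {name: s for name, s in pools.items() if s["pending"] >= min_pending}


def looks_stuck(before, after, pool):
    """Heuristic (an assumption, not a Cassandra rule): between two snapshots the
    pool has active tasks, completed nothing, and its backlog did not shrink."""
    a, b = before[pool], after[pool]
    return (b["active"] > 0
            and b["completed"] == a["completed"]
            and b["pending"] >= a["pending"])


# Rows copied from the tpstats output in this report.
SAMPLE = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     0         0         221208         0                 0
CompactionExecutor                8        66          68124         0                 0
ValidationExecutor                0         0           2246         0                 0

Message type           Dropped
READ                         0
"""

if __name__ == "__main__":
    for name, stats in backlogged(parse_tpstats(SAMPLE)).items():
        print(name, stats["active"], stats["pending"])
```

To use it, capture the tpstats output twice, a few minutes apart, and pass both parsed snapshots to looks_stuck with "CompactionExecutor" as the pool name.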
The only workaround we have found is to bounce the node; all the pending 
compactions then start being processed immediately and complete within 5 - 10 minutes.

This is a blocker for us, and any help in this matter would be highly 
appreciated.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
