It's not that your disks are getting full. I suspect you don't have enough throughput to handle the kind of stress that compaction and memtable flushing produce. Blocked flush writers are almost always a disk problem.
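A quick way to sanity-check that theory (a minimal sketch; it assumes a Linux box with the sysstat package installed, and that your data and commitlog volumes are the devices you watch):

  # Watch extended disk stats in 5-second windows while the stalls happen.
  # Sustained high %util and large await on the data volume means the disk
  # cannot keep up with flush + compaction traffic.
  iostat -x 5

  # On the Cassandra side, a climbing "All time blocked" count for
  # FlushWriter means flush threads are stuck waiting on the disk.
  nodetool tpstats | grep -E 'FlushWriter|MutationStage'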
Any storage with the words SAN, NAS, NFS or SATA in it is going to make your life miserable. Avoid them at all costs. Take a look at this doc and scan down to the section on disks: https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html

Patrick

On Fri, Aug 26, 2016 at 9:07 AM, Vasileios Vlachos <vasileiosvlac...@gmail.com> wrote:

> Hi Patrick, and thanks for your reply,
>
> We are monitoring disk usage (among other things) and we don't seem to be
> running out of space at the moment. We have separate partitions/disks for
> commitlog/data. Which one do you suspect, and why?
>
> Regards,
> Vasilis
>
> On 25 Aug 2016 4:01 pm, "Patrick McFadin" <pmcfa...@gmail.com> wrote:
>
> This looks like you've run out of disk. What are your hardware specs?
>
> Patrick
>
> On Thursday, August 25, 2016, Benedict Elliott Smith <bened...@apache.org> wrote:
>
>> You should upgrade from 2.0 to avoid this behaviour; that is the simple
>> answer. You are correct that when the commit log gets full, memtables are
>> flushed to make room. 2.0 has several interrelated problems here, though:
>>
>> There is a maximum flush queue length property (I cannot recall its
>> name), and once that many memtables are flushing, no more writes can take
>> place on the box whatsoever. You cannot simply increase this length,
>> though, because doing so shrinks the maximum size of any single memtable
>> (it is, iirc, total_memtable_space / (1 + flush_writers +
>> max_queue_length)), which worsens write amplification from compaction.
>>
>> This is because the memory management for memtables in 2.0 was really
>> terrible, and this queue length was used to try to ensure the allocated
>> space was not exceeded.
>>
>> Compounding this, when clearing the commit log, 2.0 will flush all
>> memtables with data in them regardless of whether it is useful to do so.
>> This means that having more (actively written) tables than your max queue
>> length will necessarily cause stalls every time you run out of commit log
>> space.
>>
>> In 2.1, none of these concerns apply.
>>
>> On 24 August 2016 at 23:40, Vasileios Vlachos <vasileiosvlac...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> We have an 8-node cluster spread across two DCs, with four nodes in
>>> each. We run C* 2.0.17 on Ubuntu 12.04 at the moment.
>>>
>>> Our C# application often logs errors, which correlate with dropped
>>> messages (usually counter mutations) in our Cassandra logs. We think
>>> that if a specific mutation stays in the queue for more than 5 seconds,
>>> Cassandra drops it.
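>>> (For reference, a minimal sketch of how to pull the settings this thread
>>> keeps coming back to out of the config; it assumes the stock package path
>>> /etc/cassandra/cassandra.yaml — adjust for your install:)
>>>
>>> # write_request_timeout_in_ms - how long a mutation may wait before being dropped
>>> # memtable_flush_queue_size   - the "maximum flush queue length" mentioned above (2.0 default: 4)
>>> # memtable_flush_writers      - number of concurrent flush threads
>>> grep -E 'write_request_timeout_in_ms|memtable_flush_queue_size|memtable_flush_writers' /etc/cassandra/cassandra.yaml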
>>> That messages are being dropped is also suggested by these lines in
>>> system.log:
>>>
>>> ERROR [ScheduledTasks:1] 2016-08-23 13:29:51,454 MessagingService.java (line 912) MUTATION messages were dropped in last 5000 ms: 317 for internal timeout and 0 for cross node timeout
>>> ERROR [ScheduledTasks:1] 2016-08-23 13:29:51,454 MessagingService.java (line 912) COUNTER_MUTATION messages were dropped in last 5000 ms: 6 for internal timeout and 0 for cross node timeout
>>> INFO [ScheduledTasks:1] 2016-08-23 13:29:51,455 StatusLogger.java (line 55) Pool Name                  Active   Pending      Completed   Blocked  All Time Blocked
>>> INFO [ScheduledTasks:1] 2016-08-23 13:29:51,455 StatusLogger.java (line 70) ReadStage                       0         0      245177190         0                 0
>>> INFO [ScheduledTasks:1] 2016-08-23 13:29:51,455 StatusLogger.java (line 70) RequestResponseStage            0         0     3530334509         0                 0
>>> INFO [ScheduledTasks:1] 2016-08-23 13:29:51,456 StatusLogger.java (line 70) ReadRepairStage                 0         0        1549567         0                 0
>>> INFO [ScheduledTasks:1] 2016-08-23 13:29:51,456 StatusLogger.java (line 70) MutationStage                  48      1380     2540965500         0                 0
>>> INFO [ScheduledTasks:1] 2016-08-23 13:29:51,456 StatusLogger.java (line 70) ReplicateOnWriteStage           0         0      189615571         0                 0
>>> INFO [ScheduledTasks:1] 2016-08-23 13:29:51,457 StatusLogger.java (line 70) GossipStage                     0         0       20586077         0                 0
>>> INFO [ScheduledTasks:1] 2016-08-23 13:29:51,457 StatusLogger.java (line 70) CacheCleanupExecutor            0         0              0         0                 0
>>> INFO [ScheduledTasks:1] 2016-08-23 13:29:51,457 StatusLogger.java (line 70) MigrationStage                  0         0            106         0                 0
>>> INFO [ScheduledTasks:1] 2016-08-23 13:29:51,457 StatusLogger.java (line 70) MemoryMeter                     0         0         303029         0                 0
>>> INFO [ScheduledTasks:1] 2016-08-23 13:29:51,458 StatusLogger.java (line 70) ValidationExecutor              0         0              0         0                 0
>>> INFO [ScheduledTasks:1] 2016-08-23 13:29:51,458 StatusLogger.java (line 70) FlushWriter                     1         5         322604         1              8227
>>> INFO [ScheduledTasks:1] 2016-08-23 13:29:51,458 StatusLogger.java (line 70) InternalResponseStage           0         0             35         0                 0
>>> INFO [ScheduledTasks:1] 2016-08-23 13:29:51,459 StatusLogger.java (line 70) AntiEntropyStage                0         0              0         0                 0
>>> INFO [ScheduledTasks:1] 2016-08-23 13:29:51,459 StatusLogger.java (line 70) MemtablePostFlusher             1         5         424104         0                 0
>>> INFO [ScheduledTasks:1] 2016-08-23 13:29:51,459 StatusLogger.java (line 70) MiscStage                       0         0              0         0                 0
>>> INFO [ScheduledTasks:1] 2016-08-23 13:29:51,460 StatusLogger.java (line 70) PendingRangeCalculator          0         0             37         0                 0
>>> INFO [ScheduledTasks:1] 2016-08-23 13:29:51,460 StatusLogger.java (line 70) commitlog_archiver              0         0              0         0                 0
>>> INFO [ScheduledTasks:1] 2016-08-23 13:29:51,461 StatusLogger.java (line 70) CompactionExecutor              4         4        5144499         0                 0
>>> INFO [ScheduledTasks:1] 2016-08-23 13:29:51,461 StatusLogger.java (line 70) HintedHandoff                   0         0           3194         0                 0
>>> INFO [ScheduledTasks:1] 2016-08-23 13:29:51,461 StatusLogger.java (line 79) CompactionManager               1         4
>>> INFO [ScheduledTasks:1] 2016-08-23 13:29:51,461 StatusLogger.java (line 81) Commitlog                     n/a         0
>>> INFO [ScheduledTasks:1] 2016-08-23 13:29:51,461 StatusLogger.java (line 93) MessagingService              n/a       0/0
>>> INFO [ScheduledTasks:1] 2016-08-23 13:29:51,462 StatusLogger.java (line 103) Cache Type                   Size                 Capacity               KeysToSave
>>> INFO [ScheduledTasks:1] 2016-08-23 13:29:51,462 StatusLogger.java (line 105) KeyCache                104828280                104857600                      all
>>> INFO [ScheduledTasks:1] 2016-08-23 13:29:51,462 StatusLogger.java (line 111) RowCache                        0                        0                      all
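>>> (To see how often this happens over time — a minimal sketch assuming the
>>> default log location /var/log/cassandra/system.log:)
>>>
>>> # Count dropped-message reports per date and hour; bursts should line up
>>> # with the application errors.
>>> grep 'messages were dropped' /var/log/cassandra/system.log | awk '{print $3, substr($4,1,2)}' | sort | uniq -c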
>>> So far we have noticed that when our application logs errors, Cassandra
>>> has FlushWriter Active=1 and Pending=4+ tasks (as shown above). Just
>>> before the tpstats-like output that appears in system.log, we seem to
>>> have consecutive flushes which take anywhere between 7 and 20 seconds,
>>> as shown below:
>>>
>>> INFO [FlushWriter:26567] 2016-08-22 21:15:30,550 Memtable.java (line 402) Completed flushing /var/lib/cassandra/data/LIVEStats/statsperdealer/LIVEStats-statsperdealer-jb-50127-Data.db (640031 bytes) for commitlog position ReplayPosition(segmentId=1465909831301, position=225)
>>> INFO [FlushWriter:26567] 2016-08-22 21:15:30,678 Memtable.java (line 402) Completed flushing /var/lib/cassandra/data/LIVEStats/statsperdealerpervrm/LIVEStats-statsperdealerpervrm-jb-33131-Data.db (893342 bytes) for commitlog position ReplayPosition(segmentId=1465909831301, position=225)
>>> INFO [FlushWriter:26567] 2016-08-22 21:15:30,706 Memtable.java (line 402) Completed flushing /var/lib/cassandra/data/LIVEStats/statsperserver/LIVEStats-statsperserver-jb-48158-Data.db (4862 bytes) for commitlog position ReplayPosition(segmentId=1465909831301, position=225)
>>> INFO [FlushWriter:26567] 2016-08-22 21:15:33,881 Memtable.java (line 402) Completed flushing /var/lib/cassandra/data/Organami/sessionstate/Organami-sessionstate-jb-55718-Data.db (76781692 bytes) for commitlog position ReplayPosition(segmentId=1465909831301, position=225)
>>> INFO [FlushWriter:26567] 2016-08-22 21:15:35,314 Memtable.java (line 402) Completed flushing /var/lib/cassandra/data/LIVEStock/stock/LIVEStock-stock-jb-36747-Data.db (9302046 bytes) for commitlog position ReplayPosition(segmentId=1465909831301, position=8363)
>>> INFO [FlushWriter:26567] 2016-08-22 21:15:35,341 Memtable.java (line 402) Completed flushing /var/lib/cassandra/data/system/compactions_in_progress/system-compactions_in_progress-jb-1018208-Data.db (161 bytes) for commitlog position ReplayPosition(segmentId=1465909831301, position=226441)
>>> INFO [FlushWriter:26567] 2016-08-22 21:15:38,719 Memtable.java (line 402) Completed flushing /var/lib/cassandra/data/DataMining/quotebyquotereference/DataMining-quotebyquotereference-jb-579845-Data.db (18754778 bytes) for commitlog position ReplayPosition(segmentId=1465909831301, position=325555)
>>> INFO [FlushWriter:26567] 2016-08-22 21:15:38,770 Memtable.java (line 402) Completed flushing /var/lib/cassandra/data/system/compactions_in_progress/system-compactions_in_progress-jb-1018209-Data.db (167 bytes) for commitlog position ReplayPosition(segmentId=1465909831301, position=429568)
>>> INFO [FlushWriter:26567] 2016-08-22 21:15:38,793 Memtable.java (line 402) Completed flushing /var/lib/cassandra/data/system/compactions_in_progress/system-compactions_in_progress-jb-1018210-Data.db (161 bytes) for commitlog position ReplayPosition(segmentId=1465909831301, position=430537)
>>> INFO [FlushWriter:26567] 2016-08-22 21:15:38,813 Memtable.java (line 402) Completed flushing /var/lib/cassandra/data/system/compactions_in_progress/system-compactions_in_progress-jb-1018211-Data.db (158 bytes) for commitlog position ReplayPosition(segmentId=1465909831301, position=433145)
>>>
>>> Even though no individual memtable seems to take very long to flush, the
>>> commitlog segment id is the same for all of the flushes shown above.
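>>> (A minimal way to confirm that flushes cluster around single commit-log
>>> segments, assuming the log format above and the default log location:)
>>>
>>> # Count completed flushes per commitlog segmentId; many flushes against
>>> # one segment point to commit-log rotation as the trigger.
>>> grep 'Completed flushing' /var/log/cassandra/system.log | grep -o 'segmentId=[0-9]*' | sort | uniq -c | sort -rn | head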
>>> We think this happens because the commit log has reached its size limit
>>> and is rotating this particular segment, which triggers flush activity
>>> for every memtable referenced in that segment.
>>>
>>> Our questions:
>>>
>>> 1. Will writes be blocked for the tables listed in the above FlushWriter
>>> output while this commitlog segment is being rotated? If so, will the
>>> blocking last for the entire duration, or just for the time taken to
>>> flush the respective memtable to disk?
>>>
>>> 2. We have also noticed that flushes can be triggered by MemoryMeter
>>> (probably when the occupancy fraction reaches the specified limit, at
>>> which point the largest memtable is flushed to disk). Would it be wise
>>> to increase the total commitlog size, so that MemoryMeter becomes the
>>> main trigger for flush activity? When flushes are triggered by memory
>>> pressure, our application never seems to log errors.
>>>
>>> 3. Is there a recommended value for the ratio (total size of commitlog
>>> segments)/(total size of memtables)? We use the defaults for both, which
>>> makes the total size of commitlog segments 1G (32 files of 32M) and the
>>> total space used for memtables 2G, since we have allocated 8G in total
>>> to the heap. (The two settings involved are sketched after this
>>> message.)
>>>
>>> 4. The total space for all memtables seems to be heap/4, which is 2G for
>>> us. Is there a maximum size for an individual memtable?
>>>
>>> 5. What is the maximum queue size for MemtablePostFlusher, and what
>>> exactly happens after a flush (which I presume relates to
>>> MemtablePostFlusher)? Is it concerning that we also have five pending
>>> tasks there?
>>>
>>> Many thanks,
>>>
>>> Vasilis
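(For reference, on questions 2 and 3 above: a minimal sketch of the two settings involved. The names are real 2.0 settings; the path and the values in the comments are assumptions based on the defaults the poster describes:)

  # cassandra.yaml, 2.0-era knobs:
  #   commitlog_total_space_in_mb - commit log cap; filling it forces memtable
  #                                 flushes (1G here: 32 segments of 32M)
  #   memtable_total_space_in_mb  - total memtable space (heap/4 = 2G on this cluster)
  # Raising commitlog_total_space_in_mb relative to the memtable space should make
  # memory pressure (MemoryMeter), not commit-log rotation, the usual flush trigger.
  grep -E 'commitlog_total_space_in_mb|memtable_total_space_in_mb' /etc/cassandra/cassandra.yaml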