Hi Benedict,

This makes sense now. Thank you very much for your input.

Regards,
Vasilis

On 25 Aug 2016 10:30 am, "Benedict Elliott Smith" <bened...@apache.org>
wrote:

> The simple answer is that you should upgrade from 2.0 to avoid this behaviour.
> You are correct that when the commit log gets full the memtables are
> flushed to make room.  2.0 has several interrelated problems here though:
>
> There is a maximum flush queue length property (I cannot recall its name),
> and once there are this many memtables flushing, no more writes can take
> place on the box, whatsoever.  You cannot simply increase this length,
> though, because that shrinks the maximum size of any single memtable (it
> is, iirc, total_memtable_space / (1 + flush_writers + max_queue_length)),
> which worsens write-amplification from compaction.
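To make that trade-off concrete, here is a rough sketch of the arithmetic, using assumed example values (the numbers and variable names are illustrative, not actual 2.0 config keys):

```python
# Rough sketch of the 2.0-era memtable sizing trade-off described above.
# All values are assumptions for illustration only.
total_memtable_space_mb = 2048   # e.g. 1/4 of an 8G heap
flush_writers = 1
max_queue_length = 4

# Per the formula above, the ceiling for any single memtable:
per_memtable_mb = total_memtable_space_mb / (1 + flush_writers + max_queue_length)
print(per_memtable_mb)   # roughly 341 MB

# Raising the queue length shrinks each memtable, so flushes produce
# more, smaller sstables, worsening compaction write-amplification:
per_memtable_mb_longer_queue = total_memtable_space_mb / (1 + flush_writers + 8)
print(per_memtable_mb_longer_queue)   # roughly 205 MB
```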
>
> This is because the memory management for memtables in 2.0 was really
> terrible, and this queue length was used to try to ensure the space
> allocated was not exceeded.
>
> Compounding this, when clearing the commit log 2.0 will flush all
> memtables with data in them regardless of whether it is useful to do so,
> meaning that having more actively written tables than your max queue
> length will necessarily cause stalls every time you run out of commit log
> space.
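That stall condition can be illustrated with a toy model (hypothetical numbers, not Cassandra code): when the commit log fills, every dirty memtable is queued for flush at once, and writes block as soon as the queue is saturated.

```python
# Toy model of the 2.0 stall described above. Not Cassandra code;
# the numbers are assumptions for illustration.
max_flush_queue_length = 4
actively_written_tables = 7   # tables with data in the full segment

# When the commit log is full, 2.0 queues *all* dirty memtables for flush.
queued_flushes = actively_written_tables

# Writes stall whenever queued flushes exceed the queue length,
# so more actively written tables than queue slots guarantees a stall.
writes_blocked = queued_flushes > max_flush_queue_length
print(writes_blocked)   # True
```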
>
> In 2.1, none of these concerns apply.
>
>
> On 24 August 2016 at 23:40, Vasileios Vlachos <vasileiosvlac...@gmail.com>
> wrote:
>
>> Hello,
>>
>> We have an 8-node cluster spread out in 2 DCs, 4 nodes in each one. We
>> run C* 2.0.17 on Ubuntu 12.04 at the moment.
>>
>> Our C# application often logs errors, which correlate with dropped
>> messages (usually counter mutations) in the Cassandra logs. We think that
>> if a specific mutation stays in the queue for more than 5 seconds,
>> Cassandra drops it. This is also suggested by these lines in system.log:
>>
>> ERROR [ScheduledTasks:1] 2016-08-23 13:29:51,454 MessagingService.java (line 912) MUTATION messages were dropped in last 5000 ms: 317 for internal timeout and 0 for cross node timeout
>> ERROR [ScheduledTasks:1] 2016-08-23 13:29:51,454 MessagingService.java (line 912) COUNTER_MUTATION messages were dropped in last 5000 ms: 6 for internal timeout and 0 for cross node timeout
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,455 StatusLogger.java (line 55) Pool Name                    Active   Pending      Completed   Blocked  All Time Blocked
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,455 StatusLogger.java (line 70) ReadStage                         0         0      245177190         0                 0
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,455 StatusLogger.java (line 70) RequestResponseStage              0         0     3530334509         0                 0
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,456 StatusLogger.java (line 70) ReadRepairStage                   0         0        1549567         0                 0
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,456 StatusLogger.java (line 70) MutationStage                    48      1380     2540965500         0                 0
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,456 StatusLogger.java (line 70) ReplicateOnWriteStage             0         0      189615571         0                 0
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,457 StatusLogger.java (line 70) GossipStage                       0         0       20586077         0                 0
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,457 StatusLogger.java (line 70) CacheCleanupExecutor              0         0              0         0                 0
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,457 StatusLogger.java (line 70) MigrationStage                    0         0            106         0                 0
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,457 StatusLogger.java (line 70) MemoryMeter                       0         0         303029         0                 0
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,458 StatusLogger.java (line 70) ValidationExecutor                0         0              0         0                 0
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,458 StatusLogger.java (line 70) FlushWriter                       1         5         322604         1              8227
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,458 StatusLogger.java (line 70) InternalResponseStage             0         0             35         0                 0
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,459 StatusLogger.java (line 70) AntiEntropyStage                  0         0              0         0                 0
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,459 StatusLogger.java (line 70) MemtablePostFlusher               1         5         424104         0                 0
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,459 StatusLogger.java (line 70) MiscStage                         0         0              0         0                 0
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,460 StatusLogger.java (line 70) PendingRangeCalculator            0         0             37         0                 0
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,460 StatusLogger.java (line 70) commitlog_archiver                0         0              0         0                 0
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,461 StatusLogger.java (line 70) CompactionExecutor                4         4        5144499         0                 0
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,461 StatusLogger.java (line 70) HintedHandoff                     0         0           3194         0                 0
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,461 StatusLogger.java (line 79) CompactionManager                 1         4
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,461 StatusLogger.java (line 81) Commitlog                       n/a         0
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,461 StatusLogger.java (line 93) MessagingService                n/a       0/0
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,462 StatusLogger.java (line 103) Cache Type                     Size                 Capacity               KeysToSave
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,462 StatusLogger.java (line 105) KeyCache                  104828280                104857600                      all
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,462 StatusLogger.java (line 111) RowCache                          0                        0                      all
>>
>>
>>
>> So far we have noticed that when our application logs errors, Cassandra
>> has FlushWriter Active=1 and 4+ Pending tasks (as shown above). Just
>> before the tpstats-like output appears in system.log we seem to have
>> consecutive flushes which take anywhere between 7 and 20 seconds, as
>> shown below:
>>
>> INFO [FlushWriter:26567] 2016-08-22 21:15:30,550 Memtable.java (line 402) Completed flushing /var/lib/cassandra/data/LIVEStats/statsperdealer/LIVEStats-statsperdealer-jb-50127-Data.db (640031 bytes) for commitlog position ReplayPosition(segmentId=1465909831301, position=225)
>> INFO [FlushWriter:26567] 2016-08-22 21:15:30,678 Memtable.java (line 402) Completed flushing /var/lib/cassandra/data/LIVEStats/statsperdealerpervrm/LIVEStats-statsperdealerpervrm-jb-33131-Data.db (893342 bytes) for commitlog position ReplayPosition(segmentId=1465909831301, position=225)
>> INFO [FlushWriter:26567] 2016-08-22 21:15:30,706 Memtable.java (line 402) Completed flushing /var/lib/cassandra/data/LIVEStats/statsperserver/LIVEStats-statsperserver-jb-48158-Data.db (4862 bytes) for commitlog position ReplayPosition(segmentId=1465909831301, position=225)
>> INFO [FlushWriter:26567] 2016-08-22 21:15:33,881 Memtable.java (line 402) Completed flushing /var/lib/cassandra/data/Organami/sessionstate/Organami-sessionstate-jb-55718-Data.db (76781692 bytes) for commitlog position ReplayPosition(segmentId=1465909831301, position=225)
>> INFO [FlushWriter:26567] 2016-08-22 21:15:35,314 Memtable.java (line 402) Completed flushing /var/lib/cassandra/data/LIVEStock/stock/LIVEStock-stock-jb-36747-Data.db (9302046 bytes) for commitlog position ReplayPosition(segmentId=1465909831301, position=8363)
>>
>> INFO [FlushWriter:26567] 2016-08-22 21:15:35,341 Memtable.java (line 402) Completed flushing /var/lib/cassandra/data/system/compactions_in_progress/system-compactions_in_progress-jb-1018208-Data.db (161 bytes) for commitlog position ReplayPosition(segmentId=1465909831301, position=226441)
>> INFO [FlushWriter:26567] 2016-08-22 21:15:38,719 Memtable.java (line 402) Completed flushing /var/lib/cassandra/data/DataMining/quotebyquotereference/DataMining-quotebyquotereference-jb-579845-Data.db (18754778 bytes) for commitlog position ReplayPosition(segmentId=1465909831301, position=325555)
>> INFO [FlushWriter:26567] 2016-08-22 21:15:38,770 Memtable.java (line 402) Completed flushing /var/lib/cassandra/data/system/compactions_in_progress/system-compactions_in_progress-jb-1018209-Data.db (167 bytes) for commitlog position ReplayPosition(segmentId=1465909831301, position=429568)
>> INFO [FlushWriter:26567] 2016-08-22 21:15:38,793 Memtable.java (line 402) Completed flushing /var/lib/cassandra/data/system/compactions_in_progress/system-compactions_in_progress-jb-1018210-Data.db (161 bytes) for commitlog position ReplayPosition(segmentId=1465909831301, position=430537)
>> INFO [FlushWriter:26567] 2016-08-22 21:15:38,813 Memtable.java (line 402) Completed flushing /var/lib/cassandra/data/system/compactions_in_progress/system-compactions_in_progress-jb-1018211-Data.db (158 bytes) for commitlog position ReplayPosition(segmentId=1465909831301, position=433145)
>>
>> Even though no individual memtable seems to take very long to flush, the
>> commitlog segment id for all of the flushes shown above is the same. We
>> think this happens because the commitlog has reached its limit and is
>> recycling this particular segment, which triggers flush activity for
>> every memtable referenced in it.
>>
>> Our questions:
>>
>> 1. Will writes be blocked for the tables listed in the above FlushWriter
>> output while this commitlog segment is being rotated? If so, will the
>> blocking last for the entire duration or just for the time taken for the
>> respective memtable to be flushed to disk?
>>
>> 2. We have also noticed that flushes can also be triggered by MemoryMeter
>> (probably when the occupancy fraction reaches the specified limit, at
>> which point the largest memtable is flushed to disk). Would it be wise to
>> increase the commitlog total size, so that MemoryMeter becomes the main
>> trigger for flush activity? When flushes are triggered due to "memory
>> pressure" our application never seems to log errors.
>>
>> 3. Is there a recommended value for the ratio (total size of commitlog
>> segments)/(total size of memtables)? We use the defaults for both, which
>> makes the total size of commitlog segments 1G (32 files of 32M each) and
>> the total memtable space 2G, since we have allocated 8G of heap in total.
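For reference, the arithmetic behind those defaults works out as follows (a quick check of the figures quoted above):

```python
# Checking the default sizes quoted above.
segment_size_mb = 32
segment_count = 32
commitlog_total_gb = segment_size_mb * segment_count / 1024
print(commitlog_total_gb)   # 1.0 G total commit log space

heap_gb = 8
memtable_total_gb = heap_gb / 4   # default memtable space: 1/4 of heap
print(memtable_total_gb)   # 2.0 G for all memtables
```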
>>
>> 4. The total space for all memtables seems to be Heap/4; this is 2G for
>> us. Is there a maximum size for an individual memtable?
>>
>> 5. What is the max queue size for the MemtablePostFlusher, and what
>> exactly happens after a flush (which I presume relates to the
>> MemtablePostFlusher)? Is it concerning that we also have five pending
>> tasks there?
>>
>> Many thanks,
>>
>>
>> Vasilis
>>
>
>
