[ 
https://issues.apache.org/jira/browse/CASSANDRA-9533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14569535#comment-14569535
 ] 

Benedict commented on CASSANDRA-9533:
-------------------------------------

bq. matches the comment *less* well 

You're being a bit selective in which parts you bold :)

 "it will wait up to" implies _it will wait_ - which it would not, at all. The 
reference to Postgres' behaviour also indicates it will actually wait that 
period (although with separate sibling requirements, and on a microsecond time 
horizon which is much more sensible). Further, our docs say "To avoid syncing 
after every write, Cassandra groups the mutations into batches and *syncs 
every* {{commitlog_batch_window_in_ms.}}" Not at least that often, but - 
implicitly - exactly that often, as we do now.
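
To make the two readings concrete, here is a rough sketch of how long a single, isolated write would wait for its sync under each interpretation. It is an illustration only: the class and method names are hypothetical, syncs are assumed to fire at exact multiples of the window, and the duration of the fsync itself is ignored.

```java
// Hypothetical model, not Cassandra code: compares the two readings of the batch window.
final class BatchWindowSemantics {

    /** Window as a maximum for buffering: a lone write triggers a sync right away. */
    static long waitWhenWindowIsMaximum(long arrivalMs, long windowMs) {
        return 0L;
    }

    /**
     * Window as the exact period between syncs. Assumes syncs fire at
     * t = 0, windowMs, 2 * windowMs, ... and that a write arriving exactly
     * on a tick is picked up by that tick's sync.
     */
    static long waitWhenWindowIsExact(long arrivalMs, long windowMs) {
        long sinceLastSync = arrivalMs % windowMs;
        return sinceLastSync == 0 ? 0L : windowMs - sinceLastSync;
    }

    public static void main(String[] args) {
        long windowMs = 50;
        for (long arrivalMs : new long[] {0, 1, 25, 49}) {
            System.out.println("arrival=" + arrivalMs + "ms"
                    + " maximum=" + waitWhenWindowIsMaximum(arrivalMs, windowMs) + "ms"
                    + " exact=" + waitWhenWindowIsExact(arrivalMs, windowMs) + "ms");
        }
    }
}
```

Under the "exact" reading an unlucky writer waits almost the whole window, which is why the window has to be kept small in that mode.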

bq.  I thought we did it that way because we don't have a queue of operations 
to peek into anymore, so it's difficult to provide the old behavior of "stop 
sleeping when the queue is empty."

No, unfortunately the best I can find on the origin of this change is some 
offline discussion between Brandon, Jeremiah and myself, which occurred 2-3 
months after commit:

{quote}
Benedict Elliott Smith   so, just figured out why that CL unit test @yukim 
found went from hero to zero in 2.1
Benedict Elliott Smith   SHANP on IRC has found the old Batch CL code doesn't 
behave in the same way
Benedict Elliott Smith   the window only serves as a maximum for buffering
Benedict Elliott Smith   so if you get one record arrive, it will immediately 
sync
Benedict Elliott Smith   in 2.1 this changed. i'm not sure if I got the wrong 
end of the stick (and it seems everyone else who's been discussing it since, 
maybe), or if this is a mistake that's been present all along
Benedict Elliott Smith   but we should probably decide which behaviour we want 
to go with in 2.1
dr driftx        I don't think it was a mistake, exactly, I've explained it 
lots of times in training that way (pre-2.1 behavior)
Benedict Elliott Smith   right
Benedict Elliott Smith   so question is: did I just misunderstand, or did 
somebody tell me to implement it this way? and ignoring the answer, do we want 
to restore the old behaviour?
Benedict Elliott Smith   "To avoid syncing after every write, Cassandra groups 
the mutations into batches and syncs every {{commitlog_batch_window_in_ms. }}"
Benedict Elliott Smith   anyway, i'm easy, and heading to bed. if nobody says 
anything i'll leave it how it is :-)
Jeremiah D Jordan        As long as the window time is the max time between 
syncs and writes block until they sync. I think your code is fine. As that is 
how we have documented it working all along
Benedict Elliott Smith   @jd my version has it as the exact time between 
syncs
{quote}

So it is not at all clear why it happened. My recollection was that we 
discussed it and decided to normalise on the doc behaviour, but I cannot find a 
reference to that, so it is possible I simply implemented it how the docs 
described it, and you reviewed it with the same lens. Either way, it was 
decided to let the change stand due to it matching the doc descriptions and the 
lack of further feedback.

We can certainly wait for the "queue" to empty, by issuing Barriers on the 
OpOrder; we already issue one and wait for any active at that moment to 
complete, which is probably good enough. If we want to wait until _none_ can be 
active, we can just (on completion of the barrier) check whether any are now 
running and, if so, issue another.
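
The repeated-barrier idea could look roughly like the sketch below. {{ToyOpOrder}} is a hypothetical, lock-based stand-in written for this illustration; it is not Cassandra's OpOrder and does not reproduce its API or its lock-free design. It also assumes a single thread issues barriers.

```java
// Hypothetical stand-in for illustration only; NOT Cassandra's OpOrder.
final class ToyOpOrder {

    /** Handle for one in-flight operation; close() marks it finished. */
    final class Group implements AutoCloseable {
        private int running; // guarded by the enclosing ToyOpOrder's monitor

        @Override
        public void close() {
            synchronized (ToyOpOrder.this) {
                running--;
                ToyOpOrder.this.notifyAll();
            }
        }
    }

    private Group current = new Group();
    private int barriersIssued;

    /** Start an operation in whichever group is current. */
    synchronized Group start() {
        current.running++;
        return current;
    }

    synchronized boolean anyRunning() { return current.running > 0; }

    synchronized int barriersIssued() { return barriersIssued; }

    /** One barrier: later operations join a fresh group; wait for the older ones. */
    synchronized void issueBarrierAndAwait() {
        Group previous = current;
        current = new Group();
        barriersIssued++;
        try {
            while (previous.running > 0) wait(); // wait() releases the monitor
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Keep issuing barriers until one completes with nothing running. */
    void awaitQuiescence() {
        do {
            issueBarrierAndAwait();
        } while (anyRunning());
    }

    /** Scripted run: op B starts while the first barrier is still waiting on op A. */
    static int demo() {
        ToyOpOrder order = new ToyOpOrder();
        Group a = order.start();
        Thread syncer = new Thread(order::awaitQuiescence);
        syncer.start();
        while (order.barriersIssued() < 1) Thread.yield();
        Group b = order.start(); // lands in the group behind the first barrier
        a.close();               // first barrier completes, but b is still running
        while (order.barriersIssued() < 2) Thread.yield();
        b.close();
        try {
            syncer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return order.barriersIssued();
    }

    public static void main(String[] args) {
        System.out.println("barriers issued: " + demo());
    }
}
```

One caveat with this approach: under a sustained stream of writes the loop in {{awaitQuiescence}} may never observe an empty group, so a real implementation would presumably want to bound the number of extra barriers or the total time spent waiting.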

> Make batch commitlog mode easier to tune
> ----------------------------------------
>
>                 Key: CASSANDRA-9533
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9533
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Benedict
>             Fix For: 3.x
>
>
> As discussed in CASSANDRA-9504, 2.1 changed commitlog_sync_batch_window_in_ms 
> from a maximum time to wait between fsync to the minimum time, so one must be 
> very careful to keep it small enough that most writers aren't kept waiting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)