Joshua McKenzie created CASSANDRA-9634:
------------------------------------------

             Summary: Set kernel timer resolution on Windows
                 Key: CASSANDRA-9634
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9634
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Joshua McKenzie
            Assignee: Joshua McKenzie
             Fix For: 2.2.x


In Windows 7, and to a similar extent Windows 8, the kernel's internal timer is 
set to an interval of 15.6ms (use 
[clockres|https://technet.microsoft.com/en-us/sysinternals/bb897568.aspx] to 
confirm the current 'tick rate' on Windows). Win8/Server2012 have a tickless 
kernel with timer coalescing ([info 
here|http://arstechnica.com/information-technology/2012/10/better-on-the-inside-under-the-hood-of-windows-8/2/]), 
and with C* the platform shows performance characteristics similar to Windows 
7, with a slight edge to Win8/Server 2012 in my testing (the testing and 
results of which are outside the scope of this ticket).
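To make the cost of a 15.6ms tick concrete, here is a minimal sketch built on a deliberately simplified model, which is an assumption and not a description of the actual Windows scheduler: a timed wait can only expire on a timer tick, so every requested wait is rounded up to the next whole tick. The class and method names (TimerQuantization, effectiveWaitMicros) are hypothetical and are not part of the attached patch. The 15.6ms default corresponds to a 64 Hz tick, i.e. 15,625 microseconds.

```java
/**
 * Hypothetical illustration, not part of the attached patch.
 * Simplified model (assumption): a timed wait can only expire on a timer
 * tick, so a requested wait is rounded up to the next whole tick.
 */
public final class TimerQuantization {
    // Default Windows 7 tick: 64 Hz, i.e. 15,625 us (~15.6 ms).
    static final long DEFAULT_TICK_MICROS = 15_625;
    // Tick after requesting a 1 ms timer resolution.
    static final long ONE_MS_TICK_MICROS = 1_000;

    /** Rounds a requested wait up to the next whole multiple of the tick. */
    static long effectiveWaitMicros(long requestedMicros, long tickMicros) {
        if (requestedMicros <= 0) {
            return 0;
        }
        // Ceiling division: number of whole ticks needed to cover the request.
        long ticks = (requestedMicros + tickMicros - 1) / tickMicros;
        return ticks * tickMicros;
    }

    public static void main(String[] args) {
        long[] requests = {500, 1_000, 2_000, 10_000};
        for (long req : requests) {
            System.out.printf("request %6d us -> %6d us at 15.6 ms tick, %6d us at 1 ms tick%n",
                    req,
                    effectiveWaitMicros(req, DEFAULT_TICK_MICROS),
                    effectiveWaitMicros(req, ONE_MS_TICK_MICROS));
        }
    }
}
```

Under this model, a 1ms wait costs a full 15,625us at the default tick but only 1,000us at a 1ms tick, which is one plausible reason timer resolution matters for latency-sensitive code paths.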

Some arguments against lowering the system's internal timer to 1ms are 
[here|https://randomascii.wordpress.com/2013/07/08/windows-timer-resolution-megawatts-wasted/]. 
These seem largely limited to "it'll drain your battery" and "it'll prevent 
your processor from being as effective in sleep states". The second is a real 
concern, as we don't want Windows users to suddenly see increased CPU-usage 
bills in virtualized environments. In the comments, one individual mentions a 
VirtualBox VM spinning at 10-20% CPU from changing that flag alone, which seems 
mathematically unlikely but is worth keeping an eye on and testing.

A Microsoft publication that largely reinforces the cautionary tale on power 
consumption can be found 
[here|http://download.microsoft.com/download/3/0/2/3027D574-C433-412A-A8B6-5E0A75D5B237/Timer-Resolution.docx].

With the cautionary tales on our radar, the impact on throughput and latency on 
the 2.2 branch on Windows is [fairly 
dramatic|https://docs.google.com/spreadsheets/d/1nqPhNwOVt0SU7b9lt9o4Tyl0Z1yDrV2oo7LbBPaFa6A/edit#gid=0]. 
A couple of caveats on these numbers: I'm not completely saturating the system, 
as the thread count is relatively low (keeping it consistent with other testing 
where it *was* saturating), and the read numbers from our 2012 test environment 
are not affected by this timer change, while I do see the effect on 3 other 
bare-metal installations. The testing environment is new and we haven't worked 
out the kinks yet; however, the write and mixed results illustrate the 
throughput and latency numbers mentioned above. For reads, the CPUs sit idle at 
1-5% utilization by stress and C*, so something else clearly needs to be 
addressed there; I included them for completeness' sake.

Some preliminary testing on OpenStack indicates kernel-space syscall saturation 
with this patch that actually *degrades* performance; however, the unpatched 
performance numbers in our OpenStack environment are low enough that I question 
their validity.

Opening this ticket with the attached branch to get it on the radar and the 
conversation going. I'm going to update this from being hard-coded to being a 
tunable in the .yaml.

Initial patch [available 
here|https://github.com/apache/cassandra/compare/trunk...josh-mckenzie:2.2_WinTimer].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
