Joshua McKenzie created CASSANDRA-9634:
------------------------------------------
Summary: Set kernel timer resolution on Windows
Key: CASSANDRA-9634
URL: https://issues.apache.org/jira/browse/CASSANDRA-9634
Project: Cassandra
Issue Type: Improvement
Reporter: Joshua McKenzie
Assignee: Joshua McKenzie
Fix For: 2.2.x
On Windows 7, and to a similar extent Windows 8, the kernel's internal timer is
set to an interval of 15.6ms. (Use
[clockres|https://technet.microsoft.com/en-us/sysinternals/bb897568.aspx] to
confirm the current 'tick rate' on Windows.) Win8/Server 2012 have a tickless
kernel with timer coalescing ([info
here|http://arstechnica.com/information-technology/2012/10/better-on-the-inside-under-the-hood-of-windows-8/2/]),
and in my testing the platform shows performance characteristics with C*
similar to Windows 7, with a slight edge to Win8/Server 2012 (the testing and
its results are outside the scope of this ticket).
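To make the 15.6ms figure concrete: the default tick is 1/64 s = 15.625 ms, and a timed wait can only be observed as expired on a tick. The sketch below is a simplified model under that assumption (it is not a description of Windows scheduler internals or of the attached patch). It rounds a requested delay up to a whole number of ticks. The class and method names are illustrative only.

```java
// Simplified, illustrative model of tick-based timer quantization (not the patch).
final class TickQuantization {
    // Default Windows tick: 1/64 s = 15.625 ms, usually quoted as 15.6 ms.
    static final double DEFAULT_TICK_MS = 1000.0 / 64.0;

    /**
     * Assumed model: a timed wait is only noticed as expired on a tick,
     * so the requested delay is rounded up to a whole number of ticks.
     */
    static double modeledWaitMs(double requestedMs, double tickMs) {
        return Math.ceil(requestedMs / tickMs) * tickMs;
    }

    public static void main(String[] args) {
        for (double requested : new double[] {1, 5, 16, 20}) {
            System.out.printf("wait(%.0f ms): ~%.3f ms at a 15.625 ms tick, ~%.3f ms at a 1 ms tick%n",
                    requested,
                    modeledWaitMs(requested, DEFAULT_TICK_MS),
                    modeledWaitMs(requested, 1.0));
        }
    }
}
```

Under this model a 1 ms wait costs a full 15.625 ms at the default tick, while at a 1 ms tick it costs 1 ms. Real waits can overshoot further depending on where in the tick period they start, so clockres and actual measurements remain the authority.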
Some arguments against lowering the system's internal timer to 1ms are
[here|https://randomascii.wordpress.com/2013/07/08/windows-timer-resolution-megawatts-wasted/].
These largely boil down to "it'll drain your battery" and "it'll prevent your
processor from being as effective in sleep states". The second is somewhat of a
concern, as we don't want Windows users to suddenly see increased CPU-usage
bills in virtualized environments. In the comments, one person reports a
VirtualBox VM spinning at 10-20% CPU from changing that setting alone. That
seems mathematically unlikely, but it is worth keeping an eye on and testing.
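One way to see why the VirtualBox report looks unlikely, assuming a periodic tick: going from the default 1/64 s tick to 1 ms raises timer interrupts from 64/s to 1000/s, i.e. 936 extra per second. The sketch computes how expensive each extra interrupt would have to be for those alone to consume a given fraction of one core. The names are illustrative, and whether that per-interrupt cost is plausible in a particular VM is exactly what testing needs to establish.

```java
// Back-of-the-envelope check (illustrative): what per-interrupt cost would the
// extra timer interrupts need for them alone to burn a given share of one core?
final class TimerOverhead {
    // Default Windows tick: 1/64 s = 15.625 ms.
    static final double DEFAULT_TICK_MS = 1000.0 / 64.0;

    /** Timer interrupts per second for a periodic tick of tickMs milliseconds. */
    static double interruptsPerSecond(double tickMs) {
        return 1000.0 / tickMs;
    }

    /**
     * Microseconds each extra interrupt would need to cost for the extra
     * interrupts (fromTickMs -> toTickMs) to consume cpuFraction of one core.
     */
    static double impliedCostPerExtraInterruptMicros(double fromTickMs, double toTickMs, double cpuFraction) {
        double extraPerSecond = interruptsPerSecond(toTickMs) - interruptsPerSecond(fromTickMs);
        return cpuFraction * 1_000_000.0 / extraPerSecond;
    }

    public static void main(String[] args) {
        double extra = interruptsPerSecond(1.0) - interruptsPerSecond(DEFAULT_TICK_MS);
        System.out.printf("extra interrupts/s: %.0f%n", extra);
        for (double fraction : new double[] {0.10, 0.20}) {
            System.out.printf("%.0f%% of a core => %.1f us per extra interrupt%n",
                    fraction * 100,
                    impliedCostPerExtraInterruptMicros(DEFAULT_TICK_MS, 1.0, fraction));
        }
    }
}
```

This works out to roughly 107 µs (10%) and 214 µs (20%) per extra interrupt, which is the number to compare against whatever per-interrupt cost is actually measured in a given virtualized environment.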
A Microsoft publication that largely reinforces the cautionary tale on power
consumption can be found
[here|http://download.microsoft.com/download/3/0/2/3027D574-C433-412A-A8B6-5E0A75D5B237/Timer-Resolution.docx].
With those cautionary tales on our radar, the impact of the timer change on
throughput and latency on the 2.2 branch on Windows is nonetheless [fairly
dramatic|https://docs.google.com/spreadsheets/d/1nqPhNwOVt0SU7b9lt9o4Tyl0Z1yDrV2oo7LbBPaFa6A/edit#gid=0].
A couple of caveats on these numbers: I'm not completely saturating the system,
as the thread count is relatively low (to stay consistent with other testing
where it *was* saturating), and the read numbers from our 2012 test environment
are not affected by this timer change, although I do see the effect on 3 other
bare-metal installations. That test environment is new and we haven't worked out
the kinks yet; however, the write and mixed workloads illustrate the throughput
and latency numbers I mentioned above. For reads, the CPUs sit nearly idle, with
stress and C* using only 1-5%, so something else clearly needs to be addressed
there; I included the read results for completeness' sake.
Some preliminary testing on OpenStack indicates kernel-space syscall saturation
with this patch that actually *degrades* performance; however, the unpatched
performance numbers in our OpenStack environment are low enough that I question
their validity.
I'm opening this ticket with the attached branch to get it on the radar and the
conversation going. I'm also going to change the timer setting from being
hard-coded to being a tunable in the .yaml.
Initial patch [available
here|https://github.com/apache/cassandra/compare/trunk...josh-mckenzie:2.2_WinTimer].
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)