FYI

---------- Forwarded message ----------
From: Kevin Burton <burtona...@gmail.com>
Date: Sat, Sep 21, 2013 at 9:30 AM
Subject: Re: JVMs on single cores but parallel JVMs.
To: mechanical-sympa...@googlegroups.com
OK... so I'll rephrase this a bit.

You're essentially saying that GC and background threads (GC, network IO, background filesystem log flushes, etc.) need to run to prevent foreground threads from stalling, and if you're only running on ONE core this will preempt your active threads and increase latency. And I guess the theory is that if you have another core free, why not just let that other core step in and help split the load, so you can have a smaller "stop the world" interval. That makes some sense and probably applies to a lot of workloads. Some points:

- In our load, we are usually at about 100% CPU on the current core and 100% on the other CPU... so if we trigger GC on one core, the secondary core isn't necessarily going to execute faster. In fact it might execute slower due to memory locality (depending on the configuration). That said, in most situations applications are over-provisioned to account for load spikes, so this setup might actually warrant deployment, as it would work in practice.

- This idea is partially a distributed-computing fallacy: GC doesn't scale to hundreds of cores. If you're on a 64-core machine, splitting out your JVMs so they are smaller, with the entire working set local to that CPU and GC segmented to that core, seems to make the most sense. You would have GC pauses, but each would be 1/Nth (where N = number of cores) of your entire application's GC hit.

- You can still use a CMS approach here where you GC in the background; it's just done on one core with another thread.

- GC isn't infinitely parallel... You aren't going to send part of your heap over the network and do a map/reduce-style GC across 1024 servers within a cluster. Data locality is important. Keeping the JVMs small and local to the core, and having lots of them, seems to make a lot of sense.

- The fewer JVMs you have, the more JDK lock contention you can have. Things like SSL are still contended (yuk), though JDK 1.7 has definitely improved the situation.

One issue, of course, is that OpenJDK doesn't share the permanent-generation classes across JVMs, so you see roughly a 128MB hit per JVM. That works out to about $2 per month per JVM for us, so not really the end of the world.

Kevin

On Saturday, September 21, 2013 8:39:33 AM UTC-7, Gil Tene wrote:

> Back to the original topic, running enough JVMs such that there is only 1 core per JVM is not a good idea unless you can accept regular multi-tens-of-msec pauses even when no GC is occurring in your JVM. I'd recommend sizing for *at least* 2-3 cores per JVM unless you find those sorts of glitches acceptable.
>
> The reasoning is:
>
> [assuming you are not isolating JVMs to dedicated cores that have nothing else running on them, which has its own obvious problems]
>
> GC:
> Even if you limit GC to using one thread, that one GC thread can be running concurrently with your actual application threads for long periods of time (e.g. during marking and sweeping in CMS, or during G1 marking). If there is only one core per JVM, then whenever any one JVM is active in GC, at least one other JVM's application threads will have entire scheduling quantums stolen from them.
>
> Before people start thinking "this will be rare", let me point out that with many JVMs, some GC is more likely to be active at any given time. E.g. if you ran 12 JVMs on a 12-vcore machine, and each JVM had a very reasonable 2% duty cycle (not necessarily pause time, but time in a GC cycle compared to time when no GC is active), then there would be some sort of quantum-stealing-from-application-threads GC activity going on roughly 25% of the wall-clock time even if GCs were perfectly interleaved (which they won't be), and if they weren't perfectly interleaved there would be multiples of those going on.
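As a sanity check of the duty-cycle arithmetic above, here is a back-of-the-envelope sketch (assuming each JVM's GC activity is an independent 2% duty cycle; the class and variable names are illustrative, not from the thread):

```java
// Back-of-the-envelope check of the 12-JVM / 2%-duty-cycle figure.
public class GcOverlap {
    public static void main(String[] args) {
        int jvms = 12;
        double dutyCycle = 0.02; // fraction of wall-clock time each JVM has some GC cycle active

        // Perfectly interleaved GCs (no overlap): the GC-active windows simply add up.
        double interleaved = jvms * dutyCycle;

        // Independent JVMs: chance that at least one JVM has GC active at a random instant.
        double atLeastOne = 1.0 - Math.pow(1.0 - dutyCycle, jvms);

        System.out.printf("perfectly interleaved: %.1f%% of wall-clock time%n", interleaved * 100); // 24.0%
        System.out.printf("independent JVMs:      %.1f%% of wall-clock time%n", atLeastOne * 100);  // 21.5%
    }
}
```

Either way the answer lands in the ~20-25% ballpark, and when GC cycles do overlap rather than interleave, multiple JVMs are stealing quanta at once.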
> Under full load, such a setup will translate into a 98%-99%'ile that is at least as large as an OS scheduling quantum, and under lower loads those quantum-level hiccups will only move slightly higher in the percentiles (e.g. even at only 5%-10% load your 99.9% will still be around 10msec).
>
> Other JVM stuff:
> The JVM has other, non-GC work that it uses JVM threads for. E.g. JIT compilation will cause storms of compiler activity that run concurrently with the app. While GC does tend to dominate over time, limiting GC threads to 1 does not cap the amount of concurrent, non-application-thread work that the JVM does.
>
> Application threads:
> Unless your Java application is purely single-threaded, there will be bursts of time where one JVM has multiple runnable application threads active. Whenever those occur with one-core-per-JVM sizing, application threads across JVMs will be contending for cores, and scheduling-quantum-sized delays will be incurred.
>
> Bottom line: if you never want to see quantum-sized delays in your apps, you need as many cores in the system as the total possible concurrently runnable threads across all JVMs (app threads plus GC threads). If you are willing to occasionally experience quantum-level delays, you can relax that a bit, but be aware that the higher the load on your system is, the more your cross-JVM effects will start mounting, and that even at relatively low average loads (5-10%) you will start seeing very frequent glitches in the tens of msec.
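The bottom-line sizing rule can be written down as a toy calculation (a sketch only; the per-JVM thread counts below are assumptions for illustration, not numbers from the thread):

```java
// Toy version of the "bottom line" sizing rule: to never see quantum-sized
// delays, cores must cover every thread that could be runnable at once.
public class CoreSizing {
    public static void main(String[] args) {
        int jvms = 12;
        int appThreadsPerJvm = 2; // peak concurrently runnable app threads (assumed)
        int gcThreadsPerJvm = 1;  // e.g. GC limited to a single thread

        int coresNeeded = jvms * (appThreadsPerJvm + gcThreadsPerJvm);
        int coresAvailable = Runtime.getRuntime().availableProcessors();

        System.out.println("cores needed to avoid quantum-sized delays: " + coresNeeded);
        System.out.println("cores available on this machine:            " + coresAvailable);
    }
}
```

With these assumed numbers a 12-JVM box would need 36 cores to fully rule out quantum-sized delays, which is why relaxing the rule (and accepting occasional glitches) is the usual compromise.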