For the record, I'm +100 on G1. Take it with whatever sized grain of
salt you think appropriate for a relative newcomer to the list, but
I've spent the last 7-8 years working on high-throughput, low-latency
systems and their interaction with GC. In my personal experience, G1
outperforms CMS in all cases, and with significantly less work (zero
work, in many cases). The only things
I've seen perform better *with a similar heap footprint* are GenShen
(currently experimental) and Rust (beyond the scope of this topic).

Derek

On Tue, Nov 15, 2022 at 4:51 PM Jon Haddad <rustyrazorbl...@apache.org> wrote:
>
> I'm curious what it would take for folks to be OK with merging this into 4.1? 
>  How much additional time would you want to feel comfortable?
>
> I should probably have been a little more vigorous in my +1 of Mick's PR.  
> For a little background - I worked on several hundred clusters while at TLP, 
> mostly dealing with stability and performance issues.  A lot of them stemmed 
> partially or wholly from the GC settings we ship in the project. ParNew with
> CMS and a small new gen results in a lot of premature promotion, leading to
> pause times in the hundreds of ms, which pushes p99 latency through the roof.
>
> I'm a big +1 in favor of G1 because it's not just better for most people but 
> it's better for _every_ new Cassandra user.  The first experience that people 
> have with the project is important, and our current GC settings are quite bad 
> - so bad they lead to problems with stability in production.  The G1 settings 
> are mostly hands off, result in shorter pause times and are a big improvement 
> over the status quo.
>
> Most folks don't do GC tuning, they use what we supply, and what we currently 
> supply leads to a poor initial experience with the database.  I think we owe 
> the community our best effort even if it means pushing the release back
> a little bit.
>
> Just for some additional context, we're (Netflix) running 25K nodes on G1 
> across a variety of hardware in AWS with wildly varying workloads, and I 
> haven't seen G1 be the root cause of a problem even once.  The settings that 
> Mick is proposing are almost identical to what we use (we use half of heap up 
> to 30GB).
>
> I'd really appreciate it if we took a second to consider the community effect 
> of another release that ships settings that cause significant pain for our 
> users.
>
> Jon
>
> On 2022/11/10 21:49:36 Mick Semb Wever wrote:
> > >
> > > In the case of GC, reasonably extensive performance testing should be the
> > > expectation. Potentially revisiting some of the G1 params for the 4.1
> > > reality - quite a lot has changed since those optional defaults were
> > > picked.
> > >
> >
> >
> > I've put our battle-tested g1 opts (from consultants at TLP and DataStax)
> > in the patch for CASSANDRA-18027
> >
> > In reality it is not much of a change; G1 does make it simple.
> > Picking the correct ParallelGCThreads and ConcGCThreads and the floor to
> > the new heap (-XX:NewSize) is still required, though we could do a much
> > better job of providing dynamic defaults for them.
> >
> > Alex Dejanovski's blog is a starting point:
> > https://thelastpickle.com/blog/2020/06/29/cassandra_4-0_garbage_collectors_performance_benchmarks.html
> > where this gc opt set was used (though it doesn't prove why those options
> > are chosen)
> >
> > The bar for objection to sneaking these into 4.1 was intended to be low,
> > and I stand by those that raise concerns.
> >
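
To make the sizing discussion in the quoted thread concrete, here is a
minimal sketch of how the heap rule and the thread flags could be put
together. It assumes "half of heap up to 30GB" means half of physical RAM,
capped at 30GB, which is my reading of Jon's note and not something the
thread spells out. The function names are hypothetical, the thread counts
are supplied by the caller because the thread gives no values for them, and
this is not the option set from CASSANDRA-18027.

```python
# Sketch only: the helper names are hypothetical and the "half of RAM,
# capped at 30GB" reading is an assumption about the quoted sizing rule.
from typing import List

MAX_HEAP_GB = 30  # cap mentioned in the thread


def g1_heap_gb(ram_gb: int) -> int:
    """Return half of physical RAM in whole GB, capped at MAX_HEAP_GB."""
    if ram_gb < 2:
        raise ValueError("need at least 2 GB of RAM to size a heap")
    return min(ram_gb // 2, MAX_HEAP_GB)


def g1_jvm_flags(ram_gb: int, parallel_gc_threads: int,
                 conc_gc_threads: int) -> List[str]:
    """Build a list of JVM options for G1 with a fixed-size heap.

    The thread counts are passed in by the caller; no default heuristic
    is assumed here.
    """
    heap = g1_heap_gb(ram_gb)
    return [
        "-XX:+UseG1GC",
        f"-Xms{heap}G",  # initial heap size
        f"-Xmx{heap}G",  # maximum heap size, same as initial
        f"-XX:ParallelGCThreads={parallel_gc_threads}",
        f"-XX:ConcGCThreads={conc_gc_threads}",
    ]


if __name__ == "__main__":
    # Illustrative inputs: a 64 GB host with 8 GC threads of each kind.
    for flag in g1_jvm_flags(64, 8, 8):
        print(flag)
```

The new-gen floor (-XX:NewSize) is left out on purpose, since the thread
does not say how it should be derived.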



-- 
+---------------------------------------------------------------+
| Derek Chen-Becker                                             |
| GPG Key available at https://keybase.io/dchenbecker and       |
| https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
| Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
+---------------------------------------------------------------+
