It seems like this is a choice most users might not know how to make?
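
To make the profile idea quoted below a bit more concrete, here is a minimal sketch of how a launcher could resolve a selected profile to one of the proposed files and refuse to start when nothing is selected. The profile keys, the `conf` directory and both function names are assumptions for illustration only; nothing like this exists in Cassandra today, and only the file names come from the list in the quoted message.

```python
from pathlib import Path

# Hypothetical profile keys mapped to the file names proposed in the quoted message.
GC_PROFILES = {
    "cms-write": "jvm11-CMS-write.options",
    "cms-mixed": "jvm11-CMS-mixed.options",
    "cms-read": "jvm11-CMS-read.options",
    "g1": "jvm11-G1.options",
    "zgc": "jvm11-ZGC.options",
    "shenandoah": "jvm11-Shen.options",
}


def resolve_gc_profile(profile, conf_dir="conf"):
    """Map a profile name to its options file; never fall back to a default."""
    choices = ", ".join(sorted(GC_PROFILES))
    if not profile:
        raise ValueError("no GC profile selected; choose one of: " + choices)
    key = profile.strip().lower()
    if key not in GC_PROFILES:
        raise ValueError(f"unknown GC profile {profile!r}; choose one of: " + choices)
    return Path(conf_dir) / GC_PROFILES[key]


def parse_jvm_options(text):
    """Return the option lines, skipping blank lines and '#' comments."""
    options = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            options.append(stripped)
    return options
```

The error message lists the documented profiles, which is one way to help users who would not otherwise know what to pick.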

On Thu, Nov 17, 2022 at 7:06 AM Josh McKenzie <jmcken...@apache.org> wrote:
>
> Have we ever discussed including multiple profiles that are simple to swap 
> between and documented for their tested / intended use cases?
>
> Then the burden of having a “sane” default for the wild variance of workloads 
> people use it for would be somewhat mitigated. Sure, there’s always going to 
> be folks that run the default and never think to change it but the UX could 
> be as simple as a one line config change to swap between GC profiles and we 
> could add and deprecate / remove over time.
>
> Concretely, having config files such as:
>
> jvm11-CMS-write.options
> jvm11-CMS-mixed.options
> jvm11-CMS-read.options
> jvm11-G1.options
> jvm11-ZGC.options
> jvm11-Shen.options
>
>
> Arguably we could take it a step further and not actually allow a C* node to 
> startup without pointing to one of the config files from your primary config, 
> and provide a clean mechanism to integrate that selection on headless 
> installs.
>
> Notably, this could be a terrible idea. But it does seem like we keep butting 
> up against the complexity and mixed pressures of having the One True Way to 
> GC via the default config and the lift to change that.
>
> On Wed, Nov 16, 2022, at 9:49 PM, Derek Chen-Becker wrote:
>
> I'm fine with not including G1 in 4.1, but would we consider inclusion
> for 4.1.X down the road once validation has been done?
>
> Derek
>
>
> On Wed, Nov 16, 2022 at 4:39 PM David Capwell <dcapw...@apple.com> wrote:
> >
> > Getting poked in Slack to be more explicit in this thread…
> >
> > Switching to G1 on trunk, +1
> > Switching to G1 on 4.1, -1.  4.1 is about to be released, and this isn’t a 
> > bug fix but a perf improvement ticket, so it should go through validation 
> > that the perf improvements are actually seen.  There is not enough time 
> > left for that added performance work, so I strongly feel it should be 
> > pushed to 4.2/5.0, where it has plenty of time to be validated.  The 
> > ticket even asks to avoid validating the claims, saying 'Hoping we can skip 
> > due diligence on this ticket because the data is "in the past" already'.  
> > Others have attempted both Shenandoah and ZGC and found mixed results, so 
> > nothing leads me to believe that won’t be true here either.
> >
> > > On Nov 16, 2022, at 9:15 AM, J. D. Jordan <jeremiah.jor...@gmail.com> 
> > > wrote:
> > >
> > > Heap -
> > > +1 for G1 in trunk
> > > +0 for G1 in 4.1 - I think it’s worthwhile and fairly well tested but I 
> > > understand pushback against changing this so late in the game.
> > >
> > > Memtable -
> > > -1 for off heap in 4.1. I think this needs more testing and isn’t 
> > > something to change at the last minute.
> > > +1 for running performance/fuzz tests against the alternate memtable 
> > > choices in trunk and switching if they don’t show regressions.
> > >
> > >> On Nov 16, 2022, at 10:48 AM, Josh McKenzie <jmcken...@apache.org> wrote:
> > >>
> > >> 
> > >> To clarify: -0 here on G1 as default for 4.1 as well; I'd like us to 
> > >> prioritize digging into G1's behavior on small heaps vs. CMS w/our 
> > >> default tuning sooner rather than later. With that info I'd likely be a 
> > >> strong +1 on the shift.
> > >>
> > >> -1 on switching to offheap_objects for 4.1 RC; again, think this is just 
> > >> a small step away from being a +1 w/some more rigor around seeing the 
> > >> current state of the technology's intersections.
> > >>
> > >> On Wed, Nov 16, 2022, at 7:47 AM, Aleksey Yeshchenko wrote:
> > >>> All right. I’ll clarify then.
> > >>>
> > >>> -0 on switching the default to G1 *this late* just before RC1.
> > >>> -1 on switching the default offheap_objects *for 4.1 RC1*, but all for 
> > >>> it in principle, for 4.2, after we run some more tests and resolve the 
> > >>> concerns raised by Jeff.
> > >>>
> > >>> Let’s please try to avoid this kind of super late defaults switch going 
> > >>> forward?
> > >>>
> > >>> —
> > >>> AY
> > >>>
> > >>> > On 16 Nov 2022, at 03:27, Derek Chen-Becker <de...@chen-becker.org> 
> > >>> > wrote:
> > >>> >
> > >>> > For the record, I'm +100 on G1. Take it with whatever sized grain of
> > >>> > salt you think appropriate for a relative newcomer to the list, but
> > >>> > I've spent my last 7-8 years dealing with the intersection of
> > >>> > high-throughput, low latency systems and their interaction with GC and
> > >>> > in my personal experience G1 outperforms CMS in all cases and with
> > >>> > significantly less work (zero work, in many cases). The only things
> > >>> > I've seen perform better *with a similar heap footprint* are GenShen
> > >>> > (currently experimental) and Rust (beyond the scope of this topic).
> > >>> >
> > >>> > Derek
> > >>> >
> > >>> > On Tue, Nov 15, 2022 at 4:51 PM Jon Haddad 
> > >>> > <rustyrazorbl...@apache.org> wrote:
> > >>> >>
> > >>> >> I'm curious what it would take for folks to be OK with merging this 
> > >>> >> into 4.1?  How much additional time would you want to feel 
> > >>> >> comfortable?
> > >>> >>
> > >>> >> I should probably have been a little more vigorous in my +1 of 
> > >>> >> Mick's PR.  For a little background - I worked on several hundred 
> > >>> >> clusters while at TLP, mostly dealing with stability and performance 
> > >>> >> issues.  A lot of them stemmed partially or wholly from the GC 
> > >>> >> settings we ship in the project. ParNew with CMS and a small new gen 
> > >>> >> results in a lot of premature promotion, leading to pause times in 
> > >>> >> the hundreds of ms, which pushes p99 latency through the roof.
> > >>> >>
> > >>> >> I'm a big +1 in favor of G1 because it's not just better for most 
> > >>> >> people but it's better for _every_ new Cassandra user.  The first 
> > >>> >> experience that people have with the project is important, and our 
> > >>> >> current GC settings are quite bad - so bad they lead to problems 
> > >>> >> with stability in production.  The G1 settings are mostly hands off, 
> > >>> >> result in shorter pause times and are a big improvement over the 
> > >>> >> status quo.
> > >>> >>
> > >>> >> Most folks don't do GC tuning, they use what we supply, and what we 
> > >>> >> currently supply leads to a poor initial experience with the 
> > >>> >> database.  I think we owe the community our best effort even if it 
> > >>> >> means pushing the release back a little bit.
> > >>> >>
> > >>> >> Just for some additional context, we're (Netflix) running 25K nodes 
> > >>> >> on G1 across a variety of hardware in AWS with wildly varying 
> > >>> >> workloads, and I haven't seen G1 be the root cause of a problem even 
> > >>> >> once.  The settings that Mick is proposing are almost identical to 
> > >>> >> what we use (we use half of heap up to 30GB).
> > >>> >>
> > >>> >> I'd really appreciate it if we took a second to consider the 
> > >>> >> community effect of another release that ships settings that cause 
> > >>> >> significant pain for our users.
> > >>> >>
> > >>> >> Jon
> > >>> >>
> > >>> >> On 2022/11/10 21:49:36 Mick Semb Wever wrote:
> > >>> >>>>
> > >>> >>>> In case of GC, reasonably extensive performance testing should be the
> > >>> >>>> expectation. Potentially revisiting some of the G1 params for the 4.1
> > >>> >>>> reality - quite a lot has changed since those optional defaults were
> > >>> >>>> picked.
> > >>> >>>>
> > >>> >>>
> > >>> >>>
> > >>> >>> I've put our battle-tested G1 opts (from consultants at TLP and DataStax)
> > >>> >>> in the patch for CASSANDRA-18027
> > >>> >>>
> > >>> >>> In reality it is really not much of a change; G1 does make it simple.
> > >>> >>> Picking the correct ParallelGCThreads and ConcGCThreads and the floor to
> > >>> >>> the new heap (-XX:NewSize) is still required, though we could do a much
> > >>> >>> better job of dynamic defaults for them.
> > >>> >>>
> > >>> >>> Alex Dejanovski's blog is a starting point:
> > >>> >>> https://thelastpickle.com/blog/2020/06/29/cassandra_4-0_garbage_collectors_performance_benchmarks.html
> > >>> >>> where this GC opt set was used (though it doesn't prove why those options
> > >>> >>> are chosen)
> > >>> >>>
> > >>> >>> The bar for objection to sneaking these into 4.1 was intended to be low,
> > >>> >>> and I stand by those that raise concerns.
> > >>> >>>
> > >>> >
> > >>> >
> > >>> >
> > >>> > --
> > >>> > +---------------------------------------------------------------+
> > >>> > | Derek Chen-Becker                                             |
> > >>> > | GPG Key available at https://keybase.io/dchenbecker and       |
> > >>> > | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
> > >>> > | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
> > >>> > +---------------------------------------------------------------+
> > >>>
> > >>>
> > >>
> >
>
>
>
>
