Re: [DISCUSS] CASSSIDECAR-254 - Enabling sidecar to collect async profiles

Jon Haddad Mon, 23 Jun 2025 13:07:35 -0700

easy-cass-lab has a few shell functions that I use often; they're defined
as c-flame* [1].


The arguments I've found most useful have been -X for excluding Parked
threads, -I for narrowing scope to particular callstacks (compaction), -e
for switching between allocation / cpu / wall profiling, -o for different
formats, but when I look at the profiler [2] options, I find I've used
almost all of them at one point or another...
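
As a rough illustration of how those options combine (this is not the
easy-cass-lab code), here is a small sketch that assembles an asprof
command line. The helper name, defaults, output path, pid, and the two
patterns are all assumptions made up for the example; only the flags
themselves (-e, -d, -o, -f, -I, -X) come from the profiler.

```python
import shlex
from typing import List, Optional


def build_asprof_args(
    pid: int,
    event: str = "cpu",
    duration_s: int = 30,
    output_format: str = "flamegraph",
    out_file: str = "/tmp/profile.html",
    include: Optional[str] = None,
    exclude: Optional[str] = None,
) -> List[str]:
    """Assemble an asprof argument list (hypothetical helper, illustration only)."""
    args = [
        "asprof",
        "-e", event,            # allocation / cpu / wall profiling
        "-d", str(duration_s),  # how long to profile, in seconds
        "-o", output_format,    # output format
        "-f", out_file,         # where to write the result
    ]
    if include:
        args += ["-I", include]  # keep only stacks matching this pattern
    if exclude:
        args += ["-X", exclude]  # drop stacks matching this pattern
    args.append(str(pid))
    return args


# Wall-clock profile narrowed to compaction, skipping parked threads.
# The pid and both patterns are placeholders chosen for this example.
cmd = build_asprof_args(
    pid=12345,
    event="wall",
    include="*Compaction*",
    exclude="*Unsafe.park*",
)
print(shlex.join(cmd))
```

Building a list rather than a single string keeps the patterns from
being expanded by a shell when the command is eventually executed.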

I haven't looked at the full history of the async profiler project, so I
might be mistaken, but I can't remember a time where a change was made that
wasn't backwards compatible.  I have a hard time thinking of how or why
they'd opt to do that.  For example, would they ever remove -X, which
allows you to remove frames that match a regex?  Seems unlikely.  Every
option in there that I've used has been critical in some form or another in
doing a performance analysis.  I am deeply skeptical that we'd ever
actually encounter this problem.

I do appreciate the forward thinking here but I want to just caution
against putting a solution in place to solve a theoretical problem that
might never exist, and have that solution introduce problems of its
own.  For a concrete example:

- Let's say we had started including the profiler with C* 5.0.  At the
time of release, v4 of asprof wasn't available.
- v4 is released with an awesome new feature, continuous profiling.
- Now I want to upgrade asprof and drop in a new jar with C* 5.0.
- Without the escape hatch, I now have to go back to maintaining my own
tools.

TL;DR: I think it's important that we support users being able to upgrade
asprof independently from C*.
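
To make the escape hatch concrete, here is a minimal sketch of the two
modes discussed in this thread: a small "simple" subset that gets
translated to profiler flags, and a raw pass-through guarded by an
opt-in switch. Every name here (profiler_args, raw_enabled, the
defaults) is hypothetical; this is not the CASSSIDECAR-254 design or
any existing nodetool or Sidecar API.

```python
import shlex
from typing import List, Optional


def profiler_args(
    event: str = "cpu",
    duration_s: int = 60,
    raw: Optional[str] = None,
    raw_enabled: bool = False,
) -> List[str]:
    """Return profiler arguments for either mode (hypothetical sketch)."""
    if raw is not None:
        if not raw_enabled:
            # The pass-through has to be switched on explicitly.
            raise PermissionError("raw profiler arguments are disabled")
        # Raw mode: split the string and pass it through without parsing
        # individual options, so flags added by a newer asprof keep
        # working with no change on this side.
        return shlex.split(raw)
    # Simple mode: a small, stable subset translated to asprof flags.
    return ["-e", event, "-d", str(duration_s)]


print(profiler_args(event="alloc"))
print(profiler_args(raw="-e wall -X '*Unsafe.park*'", raw_enabled=True))
```

Because the raw branch never inspects individual options, dropping in a
newer asprof does not require touching this layer, which is the
property the example above depends on.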

To the subject of disabling it by default, I guess I'm -0 on that right
now, but that's not an opinion I hold strongly, and if you think there's a
good case for it, I'm not going to spend any time trying to convince you
otherwise :)

Jon

[1]
https://github.com/rustyrazorblade/easy-cass-lab/blob/5d4874bbdbaadcf6e33651e19d8332c8c9383961/src/main/resources/com/rustyrazorblade/easycasslab/configuration/env.sh#L46


[2]
https://github.com/async-profiler/async-profiler/blob/master/docs/ProfilerOptions.md


On Mon, Jun 23, 2025 at 8:11 AM Doug Rohrer <droh...@apple.com> wrote:

> A few thoughts here:
>
> 1) Run-time configuration (or even compile-time inclusion/exclusion?) that
> allows you to enable/disable the “raw” mode in both Cassandra and Sidecar
> would be a reasonable middle-ground here. I’m not crazy about exposing it
> by default though, so I’d at a minimum have it default to disabled.
> 2) Different APIs exposed via `nodetool` (where you generally already have
> node-local access) and Sidecar, with different levels of complexity.
> 3) The more input we get from “users who actually use the profiler today,”
> the better the “safe” API can be, so maybe you won’t (generally) need to
> deploy with the raw endpoint enabled. To Jon specifically, do you have in
> easy-cass-lab or anywhere else examples of how you’re using the profiler
> today that we could use to help guide the API design? I know you’ve got
> plenty to do so if there’s something we can dig into without requiring you
> to do it yourself I’d be happy to try to dig out requirements from there.
>
>
> Doug
>
>
> On Jun 22, 2025, at 8:10 PM, Josh McKenzie <jmcken...@apache.org> wrote:
>
> If sidecar wishes to expose exec and take the fact that this API could
> break on it, I am +0 to that.  I mostly am trying to highlight the risk
>
> Trying to disambiguate here.
>
> "This API": are we referring to our friendly simple exposed subset? Or are
> we referring to "you passed --raw and whatever is parsing that could drift"?
>
> The former we have control over. The latter not so much.
>
> I'm +0 to taking on (and breaking) the latter; we either allow power users
> to pass arg strings directly and stay frozen if the API in the profiler
> changes, or we just rev the profile dep as needed and let power users eat
> the re-architecting costs. In my head: they're power users. They can update
> their profiler... profiles... locally; not so big a burden.
>
> On Sun, Jun 22, 2025, at 3:40 PM, David Capwell wrote:
>
>  it sounds like you’re saying users who actually use the profiler today
> are SOL and need to roll their own solution.
>
>
> No, I am saying it’s good to have sidecar expose this and expose common
> patterns that people actually use.
>
> If sidecar wishes to expose exec and take the fact that this API could
> break on it, I am +0 to that.  I mostly am trying to highlight the risk
>
> On Jun 20, 2025, at 2:54 PM, Jon Haddad <j...@rustyrazorblade.com> wrote:
>
> Well, the discussion is about sidecar doing it. It sounds like you’re
> saying users who actually use the profiler today are SOL and need to roll
> their own solution.
>
>
> On Fri, Jun 20, 2025 at 10:24 AM David Capwell <dcapw...@apple.com> wrote:
>
> However, for folks like me that know the command line options and
> regularly do things that you might not have planned out, I'd appreciate an
> escape hatch where I can pass my raw commands
>
>
> For more “advanced” users, normal profile.sh would still be able to
> profile; it just requires more steps.
>
> I think supporting both an abstraction-layer bound "simple mode" and a
> "--raw for experts" is the way to go.
>
>
> How do we say “this API has 0 compatibility for C* and can break w/e”?
>
> On Jun 20, 2025, at 5:22 AM, Josh McKenzie <jmcken...@apache.org> wrote:
>
> I think supporting both an abstraction-layer bound "simple mode" and a
> "--raw for experts" is the way to go.
>
> On Thu, Jun 19, 2025, at 1:23 PM, Jon Haddad wrote:
>
> I understand the motivation to decouple the command line configuration
> from what we expose to end users, and to an extent, agree with the
> reasoning.  However, for folks like me that know the command line options
> and regularly do things that you might not have planned out, I'd appreciate
> an escape hatch where I can pass my raw commands.  Whatever you end up
> implementing, there are almost certainly commands that experienced
> async-profiler folks will want to use that weren't planned for.
>
> I am also not particularly interested in learning another syntax only to
> have it transformed into the thing I want to use.  I expect that would be a
> fairly simple flag (nodetool profile --raw xyz) that would skip the parse
> logic, so hopefully it's not too much trouble to add.  Reverse engineering
> the async profiler syntax into the thing we decide to use will, at least
> for me, be a source of frustration.
>
> Thanks,
> Jon
>
>
>
> On Wed, Jun 18, 2025 at 4:01 PM Abe Ratnofsky <a...@aber.io> wrote:
>
> Another vote in favor of including async-profiler as a library in C*. The
> new heatmap format in async-profiler 4.0[1] is excellent and makes
> long-running profiles miles more useful than a plain flamegraph, but it
> requires a post-processing step after a JFR is collected, which would
> require a dependency on jfr-converter.jar[2]. Exposing the JFR files
> directly would be reasonable but slightly less useful, and the
> post-processed heatmap HTML files are much smaller and self-contained. A
> recent example on my machine shows HTML at 1/20th the size of the raw JFR
> dump, which is meaningful especially for uploading to Jira.
>
> Note that JDK25 will have experimental support for better CPU
> profiling[3], but async-profiler is still more mature and featureful,
> especially for other profiling types (wall, alloc).
>
> [1]:
> https://github.com/async-profiler/async-profiler/blob/master/docs/Heatmap.md
> [2]:
> https://github.com/async-profiler/async-profiler?tab=readme-ov-file#stable-release-40
> [3]:
> https://mostlynerdless.de/blog/2025/06/11/java-25s-new-cpu-time-profiler-1/
>
>
>
>
>
