>Previous experiences (good or bad)
I have been using async-profiler in my project for quite some time to
profile CPU usage. I have also wrapped it with an HTTP interface, so anyone
can open a browser and view the CPU flame graph in real time, which
simplifies the process further.
It is integrated as a library, and I prefer that approach to forking
processes.

Jaydeep

On Sat, Jun 14, 2025 at 8:14 AM Josh McKenzie <jmcken...@apache.org> wrote:

> I have seen cases where specific async-profiler/JVM/Cassandra version
> combos (JDK11/4.1-derived source tree) will immediately crash the JVM on
> profile - especially successive profile invocations on the same process
>
> This would be a great candidate for testing to ensure that, at least for
> provided profiles, this doesn't happen.
>
> On Fri, Jun 13, 2025, at 10:41 PM, C. Scott Andreas wrote:
>
> Supportive of inclusion as well. General preference for invoking as a
> library rather than forking processes.
>
> Jon, thanks for the tips on off-CPU profiling - added to my personal cheat
> sheet.
>
> I have seen cases where specific async-profiler/JVM/Cassandra version
> combos (JDK11/4.1-derived source tree) will immediately crash the JVM on
> profile - especially successive profile invocations on the same process -
> but have not observed this on JDK21 or trunk-derived source trees. If we
> have user reports of that happening, we’ll need to figure out how to
> reproduce and get to the bottom of it.
>
> – Scott
>
> > On Jun 13, 2025, at 5:24 PM, Francisco Guerrero <fran...@apache.org> wrote:
> >
> > Thanks for starting this discussion, Doug. I didn't realize that
> async-profiler allows you to
> > bring it as a dependency. It looks pretty neat from what I could tell. I
> also think bringing
> > this to Cassandra as a dependency is a reasonable approach. We need to
> come up with
> > a solid way to expose this via JMX / vtable.
> >
> > Best,
> > - Francisco
> >
> >> On 2025/06/13 21:08:28 Doug Rohrer wrote:
> >> The nice thing from what I can tell about using the Java API per [6]
> below is that you can literally just get an instance of the profiler and
> pass it some commands in the `execute` method… just need to be careful how
> much of that surface area we expose. Jon (and others obviously) I’d love to
> get your take on how we could make a useful interface to the
> async-profiler, maybe exposed via JMX, that doesn’t require someone to read
> the entirety of the async-profiler docs and provides some useful profiles
> without the rough edges (things like managing temp files so users don’t
> have to know the layout of the filesystem C* is running on, for example,
> since at least in the Sidecar we’d be executing this on behalf of a remote
> user, with all of the constraints that implies).
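
A minimal sketch of how such a narrowed interface could look, assuming the server accepts only a fixed set of event types and chooses the output file itself. `ProfileCommands`, `Event`, `startCommand`, `stopCommand`, and `newOutputFile` are hypothetical names, not part of async-profiler or Cassandra. The strings follow the async-profiler agent argument style (`start,event=...`, `stop,file=...`) and would be passed to `AsyncProfiler.getInstance().execute(...)`; that call is left out so the sketch compiles without the library on the class path.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

// Hypothetical helper: builds the command strings that would be handed to
// AsyncProfiler.getInstance().execute(...). The profiler itself is not called
// here, so this compiles without the async-profiler jar.
final class ProfileCommands {
    // Only these event types are exposed to callers (an assumption of this sketch).
    enum Event { CPU, ALLOC, WALL, LOCK }

    static String startCommand(Event event) {
        return "start,event=" + event.name().toLowerCase(Locale.ROOT);
    }

    // The server picks the output location, so remote callers never supply a path.
    static Path newOutputFile() throws IOException {
        return Files.createTempFile("profile-", ".html");
    }

    static String stopCommand(Path outputFile) {
        String path = outputFile.toString();
        // A comma would be parsed as an argument separator, so refuse it.
        if (path.indexOf(',') >= 0) {
            throw new IllegalArgumentException("output path must not contain ','");
        }
        return "stop,file=" + path;
    }

    public static void main(String[] args) throws IOException {
        Path out = newOutputFile();
        System.out.println(startCommand(Event.CPU));
        System.out.println(stopCommand(out));
    }
}
```

Keeping the event list as an enum means a JMX or vtable caller can only request profiles the server has decided to support, and never sees raw `execute` syntax or filesystem paths.
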
> >>
> >> We can always be more protective in the Sidecar than we are server-side
> as well, but it seems like helping operators not do bad things is a good
> thing.
> >>
> >> Obviously we’d want the ability Cassandra-side to disable this
> functionality altogether, however we implement it.
> >>
> >> Doug
> >>
> >>>> On Jun 13, 2025, at 2:38 PM, Jon Haddad <j...@rustyrazorblade.com> wrote:
> >>>
> >>> I'd be very happy to see async-profiler included with C*.  I've made
> extensive use of it in my performance evaluations [1][2], and even posted a
> video about it [3] for general Java perf analysis (among others).  It's
> part of easy-cass-lab and is easily the most informative tool I've found
> for getting to the bottom of anything performance related.
> >>>
> >>> There's probably a good case to be made for including it with the C*
> artifact as well as having it be something you can drop in. I lean towards
> including it all the time, but I haven't run it this way myself yet, so
> there might be some downside I'm unaware of.
> >>>
> >>> When you call the asprof executable, it attaches the async-profiler to
> the running jvm using jattach [4].  We could do this as well, if we wanted
> to avoid including it with the release, but I don't know how much we really
> benefit from that.  I've run into issues with it when it's unable to
> detach correctly; you're then unable to reattach it until after the server
> is restarted.  On the flip side, I don't know if you're able to set up all
> the same options for arbitrary profiling when it's loaded as an agent and
> turned on/off dynamically.  I think we can, based on the integration page
> [6], but I haven't tried it yet.  It would be a bummer if we only had a
> single mode of profiling available.
> >>>
> >>> The default mode, CPU profiling, is fantastic, but I've also made
> extensive use of allocation profiling [5] to identify perf issues as well
> so having that available is a must, imo. Wall clock / off cpu profiling is
> great for identifying when IO is the root cause, which isn't clearly
> revealed by on-cpu profiling due to the way threads are scheduled.  When I
> look at a system I typically do CPU / Wall / Alloc / Off-CPU to be
> thorough, and the last thing you want to do is have to restart between each
> one.  You can also specify specific Java methods, include or exclude frames
> matching specific regex, and a whole slew of other options.  The latest
> version even supports continuous profiling with heatmaps although I haven't
> tried it yet.
> >>>
> >>> So hopefully the option we go with allows all of that, otherwise the
> limits would impose more of a headache to me as I'd need to remove it and
> continue to bring my own.
> >>>
> >>> Under the hood, the async-profiler uses Linux perf events +
> asynchronous polling of the Java stack to match them up and generate its
> reports.  As a result, it requires certain permissions to run and get all
> the details I like.  Specifically these kernel parameters:
> >>>
> >>> sudo sysctl kernel.perf_event_paranoid=1
> >>> sudo sysctl kernel.kptr_restrict=0
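
A small preflight check along these lines might help surface a misconfigured host before a profile is attempted. This is only a sketch: `PerfPreflight` and its methods are hypothetical names, and it assumes the two sysctls are readable under `/proc/sys/kernel`, treating any `perf_event_paranoid` value at or below 1 as acceptable since lower values are more permissive.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical preflight check for the two kernel parameters listed above.
final class PerfPreflight {
    // Reads one integer-valued sysctl file, e.g. <dir>/perf_event_paranoid.
    static int readSetting(Path sysctlDir, String name) throws IOException {
        return Integer.parseInt(Files.readString(sysctlDir.resolve(name)).trim());
    }

    // True when the settings are at least as permissive as
    // kernel.perf_event_paranoid=1 and kernel.kptr_restrict=0.
    static boolean kernelSettingsAllowProfiling(Path sysctlDir) throws IOException {
        int paranoid = readSetting(sysctlDir, "perf_event_paranoid");
        int kptrRestrict = readSetting(sysctlDir, "kptr_restrict");
        return paranoid <= 1 && kptrRestrict == 0;
    }

    public static void main(String[] args) throws IOException {
        // On Linux, sysctl kernel.* values are exposed under /proc/sys/kernel.
        Path dir = Path.of("/proc/sys/kernel");
        System.out.println("perf settings OK: " + kernelSettingsAllowProfiling(dir));
    }
}
```

Passing the directory in as a parameter keeps the check testable against a temporary directory instead of the live `/proc` tree.
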
> >>>
> >>> You also need to enable some capabilities for off-cpu profiling:
> >>>
> >>> sudo find /usr/lib/jvm/ -type f -name 'java' -exec setcap "cap_perfmon,cap_sys_ptrace,cap_syslog=ep" {} \;
> >>>
> >>> Then you can do off-cpu with this wild cryptic version (shout out to
> Andrei Pangin for helping me with this [7]):
> >>>
> >>> asprof -e kprobe:schedule -i 2 --cstack dwarf -X '*Unsafe.park*' "${@:2}" $PID
> >>>
> >>> There are also some subtle issues when it's run in a container, since by
> default you don't have access to the perf_event_open syscall.  Just
> something to keep in mind.  This is one of my main grievances with
> container deployments.
> >>>
> >>> Indeed Patrick, I am very happy to see this discussion!  Thanks Doug
> for starting the thread.
> >>>
> >>> Jon
> >>>
> >>> [1] https://issues.apache.org/jira/browse/CASSANDRA-15452
> >>> [2] https://issues.apache.org/jira/browse/CASSANDRA-19477
> >>> [3] https://www.youtube.com/watch?v=yNZtnzjyJRI&t=212s&pp=ygUOYXN5bmMgcHJvZmlsZXI%3D
> >>> [4] https://github.com/async-profiler/async-profiler/blob/2b556680dc8f5d02c3f26ac119d835dc2381e604/src/jattach/jattach_hotspot.c#L38
> >>> [5] https://issues.apache.org/jira/browse/CASSANDRA-20428
> >>> [6] https://github.com/async-profiler/async-profiler/blob/master/docs/IntegratingAsyncProfiler.md
> >>> [7] https://github.com/async-profiler/async-profiler/issues/907
> >>>
> >>>
> >>> On Fri, Jun 13, 2025 at 10:18 AM Patrick McFadin <pmcfa...@gmail.com> wrote:
> >>>> The fact o3 used "Bus-factor" as a dimension is just amazing.
> >>>>
> >>>> After reading more about the project, the possibilities are pretty
> interesting. I suspect we'll see this in a Haddad talk soon.
> >>>>
> >>>> On Fri, Jun 13, 2025 at 1:57 AM Josh McKenzie <jmcken...@apache.org> wrote:
> >>>>> I was curious if o3 (model from OpenAI) would be able to do a deep
> dive health check on a repo to assist in considering taking it as a
> dependency. The results can be found here:
> https://chatgpt.com/share/684be703-1d4c-8002-b831-f997f829f4b4
> >>>>>
> >>>>> Apparently it can, and can do it quite well. This was a useful time
> saver (and honestly did a better job than I usually can in > 10x the time)
> >>>>>
> >>>>> I'm +1 to taking this as a dependency on the lib in core C*. The
> rest of the ecosystem can consume it (more easily if we move to a
> cassandra-shared regime shared library build as well), and it opens up some
> interesting opportunities for us in both how we test core C* proper and
> what we expose in tooling.
> >>>>>
> >>>>> On Thu, Jun 12, 2025, at 7:36 PM, Paulo Motta wrote:
> >>>>>> I'd prefer to avoid calling an external process and use the library
> if possible. Not sure about including it in the project by default, but
> also not against.
> >>>>>>
> >>>>>> If there's contention about including it, I wonder if it would make
> sense to explore Java's optional module extension [1] to make this
> available optionally? I can see this being useful for other extensions if
> we haven't explored that option.
> >>>>>>
> >>>>>> Then we could have another project cassandra-sidecar-extensions (or
> similar) that would be linked by sidecar/advanced operators to enable
> extended featureset in the main process.
> >>>>>>
> >>>>>>
> >>>>>> [1] -
> >>>>>> https://openjdk.org/projects/jigsaw/doc/topics/optional.html
> >>>>>>
> >>>>>> On Thu, 12 Jun 2025 at 17:57 Doug Rohrer <droh...@apple.com> wrote:
> >>>>>> Hey folks!
> >>>>>>
> >>>>>> We're looking into enabling the sidecar to collect async profiles
> from Cassandra and, digging through the async-profiler code and usage, it
> seems like there may be a few different ways to do it. I’m curious if other
> folks have already done this beyond just “run asprof with the pid of the
> Cassandra process”, as I’m a bit hesitant to depend on executing an
> external process from the Sidecar to gather the actual profile if we can
> avoid it.
> >>>>>>
> >>>>>> There seem to be some opportunities to integrate the profiler into
> another project (see
> https://github.com/async-profiler/async-profiler/blob/master/docs/IntegratingAsyncProfiler.md#using-java-api)
> but it seems this would end up having to be part of Cassandra, and somehow
> callable via the sidecar (JMX? Some virtual table interface where you
> insert a row to start a profile with the profiler options, and it kicks off
> the profile, dumping the results into the table when it’s done?).
> >>>>>>
> >>>>>> The benefit in putting this functionality into Cassandra would be
> that other consumers (in-jvm dtests, python dtests, other monitoring
> systems where Sidecar isn’t available, easy-cass-lab) would be able to
> leverage the same interface rather than having to re-invent the wheel each
> time.
> >>>>>>
> >>>>>> Drawback is it’s another library, and one with native library
> dependencies, added to the class path and loaded at runtime.
> >>>>>>
> >>>>>> Thoughts? Previous experiences (good or bad)?
> >>>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>> Doug
> >>>>>
> >>
> >>
>
>
>
