+1 to including it, conceptually.  It's easily the best tool for diagnosing
perf issues that I've used. I've got a few questions / thoughts about
implementation details & user ergonomics.

- Capturing kernel call stacks on modern kernels requires some sysctl params
to be set (kernel.perf_event_paranoid, kernel.kptr_restrict). Are we going to
be able to check that the requirements are met and give the user feedback?
- Profiling in containers is a little weird [1].  Same type of issue as my
first point.
- Getting allocation profiles requires debug symbols.  More ergonomics.
- Can I still attach using the asprof tool?  Will there be an issue if I
attach a newer version of the profiler?
- The profiler moves a lot faster than we do.  Are we going to bump the
async profiler in bug fix C* releases or are we freezing the version?
- Are we relocating the jars, or does Corretto?
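To make the first point concrete, here is a minimal sketch of the kind of preflight check I mean. It assumes it runs inside the Cassandra JVM on Linux. The class and method names are hypothetical and are not part of Cassandra or async-profiler. The two sysctls and their values come from the async-profiler README, which says a non-root process needs kernel.perf_event_paranoid=1 and kernel.kptr_restrict=0 to capture kernel stacks. It does not detect container restrictions such as a seccomp profile that blocks perf_event_open (see [1]).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical preflight check; not part of Cassandra or async-profiler.
public final class ProfilerPreflight {

    // Reads a numeric sysctl from procfs; returns null if it cannot be read or parsed
    // (for example when not running on Linux).
    static Integer readSysctl(Path path) {
        try {
            return Integer.parseInt(Files.readString(path).trim());
        } catch (IOException | NumberFormatException e) {
            return null;
        }
    }

    // Turns the two sysctl values into user-facing warnings; an empty list means
    // neither setting blocks kernel stack capture for a non-root process.
    static List<String> check(Integer perfEventParanoid, Integer kptrRestrict) {
        List<String> warnings = new ArrayList<>();
        if (perfEventParanoid == null) {
            warnings.add("could not read kernel.perf_event_paranoid");
        } else if (perfEventParanoid > 1) {
            warnings.add("kernel.perf_event_paranoid=" + perfEventParanoid
                    + ", kernel stacks need <= 1 (sysctl kernel.perf_event_paranoid=1)");
        }
        if (kptrRestrict == null) {
            warnings.add("could not read kernel.kptr_restrict");
        } else if (kptrRestrict != 0) {
            warnings.add("kernel.kptr_restrict=" + kptrRestrict
                    + ", kernel symbols need 0 (sysctl kernel.kptr_restrict=0)");
        }
        return warnings;
    }

    public static void main(String[] args) {
        Integer paranoid = readSysctl(Path.of("/proc/sys/kernel/perf_event_paranoid"));
        Integer kptr = readSysctl(Path.of("/proc/sys/kernel/kptr_restrict"));
        List<String> warnings = check(paranoid, kptr);
        if (warnings.isEmpty()) {
            System.out.println("kernel settings allow kernel stack capture");
        } else {
            warnings.forEach(w -> System.out.println("WARN: " + w));
        }
    }
}
```

On Java 11+ this can be run directly with `java ProfilerPreflight.java`. In Cassandra the same warnings could be returned by the nodetool start command instead of being printed; that is just a suggestion, not something the PR does today.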

Thanks!
Jon

[1]
https://github.com/async-profiler/async-profiler/blob/master/docs/ProfilingInContainer.md

On Thu, Dec 11, 2025 at 1:12 PM Josh McKenzie <[email protected]> wrote:

> If we expose whatever API the 3rd party has and it drifts or breaks in
> the future, we could introduce a shim at that time that keeps the prior
> ergonomics, with sane defaults or graceful handling of removals.
>
> Think "manager" is referring to the sidecar here.
>
> On Thu, Dec 11, 2025, at 2:03 PM, Štefan Miklošovič wrote:
>
> Can you help me to understand what you mean by that? I have a feeling
> I am missing something here or we are not on the same page.
>
> When it comes to the API, we are not touching anything already there. We
> expose this through a brand-new
> org.apache.cassandra.profiler.AsyncProfilerMBean.
>
> So we are not really breaking anything here?
>
> I am also not completely sure what you meant by "manager". Which
> manager? Is that some terminology from your work or something we have
> here? Genuinely asking what you mean by that; I am a bit lost here.
>
> If you mean that "we start calling AsyncProfiler and then in later
> versions these guys decide to change how it is called", I
> do not think that is really an issue here, is it? A user does not deal
> with that directly at all, only via the MBean, and there will
> presumably always be a way to start and stop profiling; that is
> basically at the very core of what that library does, no?
>
> On Thu, Dec 11, 2025 at 7:03 PM David Capwell <[email protected]> wrote:
> >
> >  If disabled, which is default,
> >
> >
> > I def won’t block on this, I just want us to think about these possible
> problems before we touch a public API; I’ll leave it to
> author(s)/reviewer(s).
> >
> > One thing that has been brought up in a different context is whether we can
> make breaking changes to public-facing APIs if the thing is disabled by
> default (debug tables being the example); I personally don’t have clarity here
> for the project, so it’s hard to say.
> >
> > TL;DR I am +0
> >
> > On Dec 11, 2025, at 3:30 AM, Štefan Miklošovič <[email protected]>
> wrote:
> >
> > Oh wow! Thanks Dmitry for all these references. I think the fact that
> > Corretto includes it in the JDK is a testament to its quality.
> >
> > David, I hope this pretty much answers your concerns?
> >
> > On Thu, Dec 11, 2025 at 12:27 PM Dmitry Konstantinov <[email protected]>
> wrote:
> >
> >
> > +1 from my side
> >
> > 1) it is a well-known, mature profiling tool; other Apache projects have
> embedded it, for example:
> > - https://issues.apache.org/jira/browse/HADOOP-18055
> > - https://issues.apache.org/jira/browse/HBASE-29045
> > - https://issues.apache.org/jira/browse/FLINK-33325
> > 2) Apache-2.0 license
> > 3) the dependency is small (less than 1 MB) and has no
> transitive dependencies on other 3rd parties
> > 4) the main contributors are now at Amazon, and it is even included in the
> Corretto JDK now (
> https://aws.amazon.com/about-aws/whats-new/2025/10/amazon-corretto-october-2025-quarterly-updates/
> )
> > 5) the logic is disabled by default, so no impact if you do not use it.
> >
> >
> > On Wed, 10 Dec 2025 at 18:08, Štefan Miklošovič <[email protected]>
> wrote:
> >
> >
> > This capability is disabled by default. It is driven by a system
> > property that you have to set to true in order to get an
> > instance of AsyncProfiler, which does the actual profiling. If it is
> > disabled, which is the default, any nodetool calls that need
> > AsyncProfiler (start, stop, status) return a message saying that
> > profiling is not enabled.
> >
> > Not sure if this answers your concerns, but unless you knowingly turn
> > it on, nothing happens.
> >
> > On Wed, Dec 10, 2025 at 6:28 PM David Capwell <[email protected]>
> wrote:
> >
> >
> > I have no issues adding it.  I think my only real comment would be the
> same as with manager: whatever we expose in the public API (in this case
> Nodetool) we have to support, so if a 3rd party lib breaks compatibility,
> that puts us in a bind if we didn’t think about it up front.
> >
> > Having async-profiler exposed to make profiling easier is a good
> thing.  Manager has (or is in the process of adding) API auth so we can
> lock down async-profiler to those “allowed”, but do we have something similar in
> Nodetool?  We had an issue in the past where async-profiler would trigger a
> JVM crash (a JVM bug), so we had to limit calls to it until it was fixed.
> >
> > On Dec 10, 2025, at 9:00 AM, Štefan Miklošovič <[email protected]>
> wrote:
> >
> > It is worth mentioning that we also contemplated including
> > jfr-convert, so a user could also convert raw JFR files to e.g. HTML
> > with heatmaps, but we concluded that it is not necessary. Sure, it
> > would be convenient, but ultimately it is not needed. Converting such a
> > file via nodetool, on the server side, is just not a good idea; it is not
> > the server's job to convert anything.
> >
> > In the majority of cases, people using the profiler just want to get an
> > HTML file with a CPU / allocation profile. It can also gather raw JFR
> > files and fetch them as is; the conversion itself can simply happen on
> > the client's side instead.
> >
> > I am +1 for introducing the core async profiler library only.
> >
> > On Wed, Dec 10, 2025 at 5:46 PM Bernardo Botella
> > <[email protected]> wrote:
> >
> >
> > Hi everyone!
> >
> > I’d like to propose adding the async-profiler library to the Cassandra
> project. This will enable us to add a new nodetool command to do profiling
> tasks on the process running Cassandra. This information can be useful for
> debugging a wide range of potential issues and for performance optimization.
> CASSANDRA-20854 captures the effort and the details of the proposal, and
> this PR proposes its implementation.
> >
> > I want to note that this feature was already discussed in this thread,
> and this one only wants to make sure that no one has any concerns about
> adding the library as a dependency.
> >
> > What is async-profiler?
> > async-profiler is a low overhead sampling profiler for Java that does
> not suffer from the Safepoint bias problem. It features HotSpot-specific
> API to collect stack traces and to track memory allocations. The profiler
> works with OpenJDK and other Java runtimes based on the HotSpot JVM.
> >
> > Unlike traditional Java profilers, async-profiler monitors non-Java
> threads (e.g., GC and JIT compiler threads) and shows native and kernel
> frames in stack traces.
> >
> > What can be profiled:
> >
> > - CPU time
> > - Allocations in Java Heap
> > - Native memory allocations and leaks
> > - Contended locks
> > - Hardware and software performance counters like cache misses, page
> >   faults, context switches
> > and more.
> >
> >
> > We propose to add async-profiler 4.2 as a dependency to Cassandra.
> >
> > Any concerns?
> > Bernardo
> >
> >
> >
> >
> >
> > --
> > Dmitry Konstantinov
> >
> >
>
>
>
