+1 to including it, conceptually. It's easily the best tool for diagnosing perf issues that I've used. I've got a few questions / thoughts about implementation details & user ergonomics.
- Capturing call stacks in modern kernels requires some params to be set.
  Are we going to be able to check the requirements are met and give the
  user feedback?
- Profiling in containers is a little weird [1]. Same type of issue as my
  first point.
- Getting allocation profiles requires debug symbols. More ergonomics.
- Can I still attach using the asprof tool? Will there be an issue if I
  attach a newer version of the profiler?
- The profiler moves a lot faster than we do. Are we going to bump the
  async profiler in bug fix C* releases or are we freezing the version?
- Are we relocating the jars, or does Corretto?

Thanks!
Jon

[1] https://github.com/async-profiler/async-profiler/blob/master/docs/ProfilingInContainer.md

On Thu, Dec 11, 2025 at 1:12 PM Josh McKenzie wrote:

> If we expose whatever API the 3rd party has and they drift or break it in
> the future, we could introduce a shim that would keep prior ergonomics at
> that time w/sane defaults or graceful handling of removals.
>
> Think "manager" is referring to the sidecar here.
>
> On Thu, Dec 11, 2025, at 2:03 PM, Štefan Miklošovič wrote:
>
> Can you help me to understand what you mean by that? I have a feeling
> I am missing something here or we are not on the same page.
>
> When it comes to API, we are not touching anything already there. We
> expose this through brand new
> org.apache.cassandra.profiler.AsyncProfilerMBean.
>
> So we are not really breaking anything here?
>
> I am also not completely sure what you meant by "manager", what
> manager? Is that some terminology from your work or something we have
> here? Genuinely asking what you mean by that, I am lost a bit here.
>
> If you mean that "we start to call AsyncProfiler and then in later
> versions these guys decide that they will change how it is called" I
> do not think that is really an issue here, is it?
> A user does not deal with that directly anyway at all, only via MBean
> and there will presumably always be a way to start and stop profiling,
> that is basically at the very core of what that library is doing, no?
>
> On Thu, Dec 11, 2025 at 7:03 PM David Capwell wrote:
> >
> > If disabled, which is default,
> >
> > I def won’t block on this, I just want us to think about these possible
> > problems before we touch a public API; I’ll leave it to
> > author(s)/reviewer(s).
> >
> > One thing that has been brought up in a different context is if we can
> > make breaking changes to public facing APIs if the thing is disabled by
> > default (debug tables is the example); I personally don’t have clarity
> > here for the project so hard to say.
> >
> > TL;DR I am +0
> >
> > On Dec 11, 2025, at 3:30 AM, Štefan Miklošovič wrote:
> >
> > Oh wow! Thanks Dmitry for all these references. I think the fact that
> > Corretto includes it in the JDK is a testament to its quality.
> >
> > David, I hope this answers your concerns pretty much?
> >
> > On Thu, Dec 11, 2025 at 12:27 PM Dmitry Konstantinov wrote:
> >
> > +1 from my side
> >
> > 1) it is a well-known, mature profiling tool; there are other cases
> >    where Apache projects embedded it, for example:
> >    - https://issues.apache.org/jira/browse/HADOOP-18055
> >    - https://issues.apache.org/jira/browse/HBASE-29045
> >    - https://issues.apache.org/jira/browse/FLINK-33325
> > 2) Apache-2.0 license
> > 3) the dependency has a small size (less than 1 MB) and does not have
> >    transitive dependencies on other 3rd parties
> > 4) the main contributors are now at Amazon, it is even included in
> >    Corretto JDK now
> >    (https://aws.amazon.com/about-aws/whats-new/2025/10/amazon-corretto-october-2025-quarterly-updates/)
> > 5) the logic is disabled by default, so no impact if you do not use it.
> >
> > On Wed, 10 Dec 2025 at 18:08, Štefan Miklošovič wrote:
> >
> > This capability is disabled by default, it is driven by a system
> > property you have to set to true in order to be able to get an
> > instance of AsyncProfiler which does the actual profiling. If
> > disabled, which is default, then any calls via nodetool which need
> > AsyncProfiler (start, stop, status) would return a message that
> > profiling is not enabled.
> >
> > Not sure if this answers your concerns but without knowingly turning
> > it on nothing happens.
> >
> > On Wed, Dec 10, 2025 at 6:28 PM David Capwell wrote:
> >
> > I have no issues adding it. I think my only real comment would be the
> > same as with manager; w/e we expose to the public api (in this case
> > Nodetool) we have to support, so if a 3rd party lib breaks
> > compatibility that puts us in a bind if we didn’t think about that up
> > front.
> >
> > Having async-profiler exposed makes it easier to profile, which is a
> > good thing. Manager has (or is in the process of adding) API auth so we
> > can lock down async-profiler to those “allowed” but do we have similar
> > in Nodetool? We had an issue in the past where async-profiler would
> > trigger a JVM crash (JVM bug), so we had to limit calls to it until it
> > was fixed.
> >
> > On Dec 10, 2025, at 9:00 AM, Štefan Miklošovič wrote:
> >
> > Worth mentioning that we were also contemplating the inclusion of
> > jfr-convert so a user can also convert raw JFR files to e.g. HTML
> > with heatmaps but we evaluated that it is not necessary. Sure, it
> > would be comfortable, but ultimately not needed. Conversion of such a
> > file via nodetool, on server side, is just not a good idea, it is not
> > a job of a server to convert anything.
> >
> > In the majority of cases, people using the profiler just want to get
> > an HTML file with a CPU / allocation profile. It can even gather JFR
> > files and fetch them as is; it is just that the conversion can happen
> > on the client's side instead.
> >
> > I am +1 for introducing the core async profiler library only.
> >
> > On Wed, Dec 10, 2025 at 5:46 PM Bernardo Botella wrote:
> >
> > Hi everyone!
> >
> > I’d like to propose adding the async-profiler library to the Cassandra
> > project. This will enable us to add a new nodetool command to do
> > profiling tasks on the process running Cassandra. This information can
> > be useful to debug a wide range of potential issues and performance
> > optimizations. CASSANDRA-20854 captures the effort and the details of
> > the proposal, and this PR proposes its implementation.
> >
> > I want to note that this feature was already discussed in this thread,
> > and this one only wants to make sure that no one has any concerns about
> > adding the library as a dependency.
> >
> > What is async-profiler?
> >
> > async-profiler is a low overhead sampling profiler for Java that does
> > not suffer from the Safepoint bias problem. It features HotSpot-specific
> > API to collect stack traces and to track memory allocations. The
> > profiler works with OpenJDK and other Java runtimes based on the
> > HotSpot JVM.
> >
> > Unlike traditional Java profilers, async-profiler monitors non-Java
> > threads (e.g., GC and JIT compiler threads) and shows native and kernel
> > frames in stack traces.
> >
> > What can be profiled:
> >
> > - CPU time
> > - Allocations in Java Heap
> > - Native memory allocations and leaks
> > - Contended locks
> > - Hardware and software performance counters like cache misses, page
> >   faults, context switches
> > - and more.
> >
> > We propose to add async-profiler 4.2 as a dependency to Cassandra.
> >
> > Any concerns?
> > Bernardo
> >
> > --
> > Dmitry Konstantinov
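
To make the first question at the top of this message concrete (checking the kernel requirements and giving the user feedback), here is a minimal sketch of what such a check might look like. Everything in it is an assumption for illustration: `KernelProfilingCheck` is a hypothetical helper, not part of Cassandra or async-profiler, and the thresholds (`kernel.perf_event_paranoid` at most 1, `kernel.kptr_restrict` equal to 0) are the values I understand the async-profiler docs to recommend for capturing kernel call stacks as a non-root user. It needs Java 11+ for `Files.readString`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; the name and thresholds are assumptions, not an existing API. */
public final class KernelProfilingCheck {

    private KernelProfilingCheck() {}

    /** Reads one integer sysctl value from a file, or returns null if it cannot be read or parsed. */
    static Integer readInt(Path file) {
        try {
            return Integer.parseInt(Files.readString(file).trim());
        } catch (IOException | NumberFormatException e) {
            return null;
        }
    }

    /**
     * Returns human-readable problems found under the given sysctl directory
     * (normally /proc/sys/kernel). An empty list means both settings look fine.
     */
    public static List<String> check(Path sysctlDir) {
        List<String> problems = new ArrayList<>();

        // Assumed requirement: perf_event_paranoid <= 1 for non-root kernel stacks.
        Integer paranoid = readInt(sysctlDir.resolve("perf_event_paranoid"));
        if (paranoid == null) {
            problems.add("cannot read kernel.perf_event_paranoid");
        } else if (paranoid > 1) {
            problems.add("kernel.perf_event_paranoid=" + paranoid + ", expected <= 1");
        }

        // Assumed requirement: kptr_restrict == 0 so kernel symbols can be resolved.
        Integer kptr = readInt(sysctlDir.resolve("kptr_restrict"));
        if (kptr == null) {
            problems.add("cannot read kernel.kptr_restrict");
        } else if (kptr != 0) {
            problems.add("kernel.kptr_restrict=" + kptr + ", expected 0");
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String> problems = check(Path.of("/proc/sys/kernel"));
        if (problems.isEmpty()) {
            System.out.println("Kernel settings look suitable for capturing kernel call stacks.");
        } else {
            problems.forEach(p -> System.out.println("WARNING: " + p));
        }
    }
}
```

A check like this could run before a profiling session starts and have its warnings returned to the caller. It only looks at the two sysctl values; it does not cover the container restrictions described in [1] or the debug-symbol requirement for allocation profiles.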
