For what it’s worth, we deployed async-profiler into the regionserver
container image and it all worked as expected. Note that it’s not a sidecar
container, though; it’s in the same image as the region server.

If you can get the async profiler into your container image, installed
where the RS can find it (as described in the online book; double-check you
have a version of AP that’s compatible with your version of the profiler
servlet), you should be able to use the profiling http endpoint on the RS.
It’ll run async-profiler with the arguments you specify (read the servlet
code, all args are exposed). You can then download the flamegraph via HTTP
as well …

Well, most of the time. I have run into issues where the file wasn’t served
correctly and I had to download it from the region server file system
(annoying to do from a container). There’s probably a closed Jira where I
scratch my head in public.
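
To make the endpoint part concrete, here is a minimal sketch of building the
request URL. It rests on assumptions you should check against your version:
the path /prof, the info port 16030, and the parameter names "duration" and
"event" are what I recall from the online book, and build_profile_url is a
made-up helper, not part of HBase. Any other servlet argument can be passed
as an extra keyword.

```python
from urllib.parse import urlencode


def build_profile_url(host, port=16030, duration=30, event="cpu", **extra):
    """Build a URL for the region server's profiling HTTP endpoint.

    Assumptions (verify against the servlet code of your HBase version):
    the servlet is mounted at /prof on the RS info port, and it accepts
    "duration" (seconds) and "event" query parameters. Extra keyword
    arguments are passed through unchanged as further query parameters.
    """
    params = {"duration": duration, "event": event}
    params.update(extra)
    return f"http://{host}:{port}/prof?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host name; replace with your region server's address.
    print(build_profile_url("regionserver.example.internal", duration=60))
```

Opening the resulting URL in a browser or with an HTTP client should start
the profiler; where the result ends up being served depends on the servlet
version.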

On Wed, 6 Dec 2023 at 08:15, Lars Francke <lars.fran...@gmail.com> wrote:

> > > Also, are you sure you couldn't use async-profiler? We use this all the
> > > time in our very latency-sensitive production. It has no noticeable
> > > overhead in my experience and doesn't need any special dependencies.
> >
> > I have to admit, I have never used async-profiler. Shame on me.
> > That is a fabulous hint and I'll read up on it immediately.
>
> I have now read up on it and tried it locally. I stumbled over
> https://issues.apache.org/jira/browse/HBASE-25685 and the fact that
> 2.4 fails weirdly using Java 21, only to find out (I should have read
> the whole docs earlier) that it's hard to run async-profiler in a
> container.
> For us, this is all running on Kubernetes, so we'll test that today.
>
> Testing it locally, it looked very promising.
>
> > >
> > > On Tue, Dec 5, 2023 at 3:46 PM Lars Francke <lars.fran...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I am debugging an issue where we see some Get requests taking 2-5s.
> > > > We do see "responseTooSlow" etc. and this is in an environment where
> > > > I cannot run a Profiler but I _can_ run modified code.
> > > >
> > > > So what I did was I added a stupid "MethodTimer"[1] which records how
> > > > long certain operations take at various points in the code (e.g. [2]).
> > > > I've been doing this a few rounds and have now arrived at the
> > > > StoreScanner.
> > > >
> > > > I'm wondering if anyone has better ideas on how to diagnose this?
> > > > I am an HBase committer but I haven't been able to keep up with the
> > > > changes in the last 5-6 years, so I'm not too familiar with the inner
> > > > workings anymore and would appreciate a hint.
> > > >
> > > > I suspect it is slowness related to storage access.
> > > > I was not able to find any logs or tweaks to log "slow storage"
> > > > access, does such a thing exist?
> > > > And something else that'd help me: Can anyone point me (if it exists)
> > > > at the (vicinity of the) code that actually reads from HDFS at the
> > > > end? There are so many layers.
> > > >
> > > > Thank you!
> > > >
> > > > Cheers,
> > > > Lars
> > > >
> > > >
> > > > [1] <https://github.com/stackabletech/docker-images/blob/8349f29f8aded8a01a8d1dbf7a90776ede1764ca/hbase/stackable/patches/2.4.12/005-STACKABLE-profiling-2.4.12.patch#L150C5-L150C5>
> > > > [2] <https://github.com/stackabletech/docker-images/blob/8349f29f8aded8a01a8d1dbf7a90776ede1764ca/hbase/stackable/patches/2.4.12/005-STACKABLE-profiling-2.4.12.patch#L289-L297>
> > > >
>
