Re: Debugging slowness in Gets

Nick Dimiduk Wed, 06 Dec 2023 03:25:30 -0800

Hi Lars,

I took a look through my k8s PR's against hbase-operator-tools and see no
mention at all of async-profiler;  I have nothing to point you towards. I
also no longer have access to those systems, so I'm afraid I'm not much
further help.


I do remember there was some coordination required to enable capture of
kernel call stacks [0], but I don't recall the details of deploying this in
the kubernetes environment. Further down the doc [1], there is some mention
about running within a container, fiddling with the security profile.

[0]: https://github.com/async-profiler/async-profiler#basic-usage
[1]:
https://github.com/async-profiler/async-profiler#profiling-java-in-a-container

On Wed, Dec 6, 2023 at 10:03 AM Lars Francke <lars.fran...@gmail.com> wrote:

> Thanks Nick,
> I'll take a look at that.
>
> I did add async-profiler to our image[1] but haven't had a chance to
> test it yet.
> Do you remember if you had to run the container with extra privileges?
>
> I just opened this issue before I saw your mail as it turns out that
> not all args are exposed after all :)
> https://issues.apache.org/jira/browse/HBASE-28242 (ideas welcome in this
> ticket)
>
> I'll continue this journey and will report back with any issues and
> will see if I can improve anything
>
> [1] <
> https://github.com/stackabletech/docker-images/commit/ba957b37c8d4c679b918f3815e28d974b0bd008d
> >
>
> On Wed, Dec 6, 2023 at 9:06 AM Nick Dimiduk <ndimi...@apache.org> wrote:
> >
> > For what it’s worth, we deployed async-profiler into the regionserver
> > container image and it all worked as expected. But it’s not a sidecar
> > container, it’s on the same image as the region server.
> >
> > If you can get the async profiler into your container image, installed
> > where the RS can find it (as described in the online book; double-check
> you
> > have a version of AP that’s compatible with your version of the profiler
> > servlet), you should be able to use the profiling http endpoint on the
> RS.
> > It’ll run async-profiler with the arguments you specify (read the servlet
> > code, all args are exposed). You can then download the flamegraph via
> HTTP
> > as well …
> >
> > Well, most of the time. I have run into issues where the file wasn’t
> served
> > correctly and I had to download it from the region server file system
> > (annoying to do from a container). There’s probably a closed Jira where I
> > scratch my head in public.
> >
> > On Wed, 6 Dec 2023 at 08:15, Lars Francke <lars.fran...@gmail.com>
> wrote:
> >
> > > > > Also, are you sure you couldn't use async-profiler? We use this
> all the
> > > > > time in our very latency-sensitive production. It has no noticeable
> > > > > overhead in my experience and doesn't need any special
> dependencies.
> > > >
> > > > I have to admit, I have never used async-profiler. Shame on me.
> > > > That is a fabulous hint and I'll read up on it immediately.
> > >
> > > I now did read up on it, tried it locally, stumbled over
> > > https://issues.apache.org/jira/browse/HBASE-25685 and the fact that
> > > 2.4 fails weirdly using Java 21 only to find out (I should have read
> > > the whole docs earlier) that it's hard to run async-profiler in a
> > > container.
> > > For us, this is all running on Kubernetes, so we'll test that today.
> > >
> > > Testing i tlocally it looked very promising.
> > >
> > >
> > >
> > >
> > >
> > > > >
> > > > > On Tue, Dec 5, 2023 at 3:46 PM Lars Francke <
> lars.fran...@gmail.com>
> > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I am debugging an issue where we see some Get requests taking
> 2-5s.
> > > > > > We do see "responseTooSlow" etc. and this is in an environment
> where
> > > I
> > > > > > cannot run a Profiler but I  _can_ run modified code.
> > > > > >
> > > > > > So what I did was I added a stupid "MethodTimer"[1] which
> records how
> > > > > > long certain operations take at various points in the code (e.g.
> > > [2]).
> > > > > > I've been doing this a few rounds and have now arrived at the
> > > StoreScanner.
> > > > > >
> > > > > > I'm wondering if anyone has better ideas on how to diagnose this?
> > > > > > I am a HBase committer but I haven't been able to keep up with
> the
> > > > > > changes in the last 5-6 years so I'm not too familiar with the
> inner
> > > > > > workings anymore and would appreciate a hint.
> > > > > >
> > > > > > I suspect it is slowness related to storage access.
> > > > > > I was not able to find any logs or tweaks to log "slow storage"
> > > > > > access, does such a thing exist?
> > > > > > And something else that'd help me: Can anyone point me (if it
> exists)
> > > > > > at the (vicinity of the) code that actually reads from HDFS at
> the
> > > > > > end? There are so many layers.
> > > > > >
> > > > > > Thank you!
> > > > > >
> > > > > > Cheers,
> > > > > > Lars
> > > > > >
> > > > > >
> > > > > > [1] <
> > > > > >
> > >
> https://github.com/stackabletech/docker-images/blob/8349f29f8aded8a01a8d1dbf7a90776ede1764ca/hbase/stackable/patches/2.4.12/005-STACKABLE-profiling-2.4.12.patch#L150C5-L150C5
> > > > > > >
> > > > > > [2] <
> > > > > >
> > >
> https://github.com/stackabletech/docker-images/blob/8349f29f8aded8a01a8d1dbf7a90776ede1764ca/hbase/stackable/patches/2.4.12/005-STACKABLE-profiling-2.4.12.patch#L289-L297
> > > > > > >
> > > > > >
> > >
>

Re: Debugging slowness in Gets

Reply via email to