Marek, do you have an idea of where the current bottleneck is?

I just profiled it with sysprof while zooming in on the desktop in Weston
and moving the mouse around wildly, so that the buffer changes completely
on every frame. I got around 5 FPS, which isn't *that* much, but is still
an order of magnitude better than without your patches.

sysprof says there is 100% CPU usage, but unlike the previous 0.5-FPS
recording, it's not in a single function, but spread out over several
functions:

35% weston_recorder_frame_notify
11% __memcpy_ssse3
4.5% clear_page_c
4.3% output_run

I'm not completely sure I'm reading the sysprof output right, though.
weston_recorder_frame_notify, for example, has 35% CPU usage, but none of
its child functions has any significant CPU usage. I presume the CPU usage
in that function comes from calling glReadPixels, although that's not
apparent from sysprof:

weston_recorder_frame_notify                                     39.15%  39.15%
  - - kernel - -                                                  0.00%   0.01%
    ret_from_intr                                                 0.00%   0.01%
      __irqentry_text_start                                       0.00%   0.01%
        irq_exit                                                  0.00%   0.01%
          do_softirq                                              0.00%   0.01%
            call_softirq                                          0.00%   0.01%
              __do_softirq                                        0.00%   0.01%
                blk_done_softirq                                  0.00%   0.01%
                  scsi_softirq_done                               0.00%   0.01%
                    scsi_finish_command                           0.00%   0.01%
                      scsi_io_completion                          0.00%   0.01%
                        blk_end_request                           0.00%   0.01%
                          blk_end_bidi_request                    0.00%   0.01%
                            blk_update_bidi_request               0.00%   0.01%
                              blk_update_request                  0.00%   0.01%
                                req_bio_endio.isra.46             0.00%   0.01%
                                  bio_endio                       0.00%   0.01%
                                    end_swap_bio_write            0.00%   0.01%
                                      end_page_writeback          0.00%   0.01%
                                        rotate_reclaimable_page   0.01%   0.01%
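
One way to read the listing above is the following minimal sketch. It assumes the two percentages per row are self time and cumulative time, which is a guess about the sysprof columns. The helper name self_time is made up for illustration. If that reading is right, almost all of the time is attributed to the function body itself, or to callees the profiler could not resolve, rather than to any listed child:

```python
def self_time(cumulative, children):
    """Self time is cumulative time minus the time attributed to callees."""
    return cumulative - sum(children)

# Percentages copied from the listing above (displayed values are rounded).
frame_notify_cumulative = 39.15
frame_notify_children = [0.01]  # the "- - kernel - -" subtree

own = self_time(frame_notify_cumulative, frame_notify_children)
print(f"{own:.2f}% not accounted for by any listed child")
```

This cannot tell whether the time goes to glReadPixels or to work done directly in weston_recorder_frame_notify; it only shows that the listed children do not explain it.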

Another possible bottleneck is simply disk access, although it doesn't seem
to be relevant on my system (since I have 100% CPU usage). The 36-second
recording I made was 1.3 GB in size, so that's around 36 MB/s.
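
As a rough check of that figure, here is a small sketch. It assumes decimal units (1 GB = 10^9 bytes) and takes the "around 5 fps" from above as the frame rate, so the per-frame number is only an estimate:

```python
size_bytes = 1.3e9      # 1.3 GB recording, assuming 1 GB = 10**9 bytes
duration_s = 36         # length of the recording in seconds
approx_fps = 5          # rough frame rate observed while recording

rate_mb_s = size_bytes / duration_s / 1e6   # average write rate in MB/s
per_frame_mb = rate_mb_s / approx_fps       # average data written per frame

print(f"{rate_mb_s:.1f} MB/s, about {per_frame_mb:.1f} MB per frame")
```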

Kind regards,

Rune Kjær Svendsen
Østerbrogade 111, 3. - 302
2100 København Ø
Tel.: 2835 0726


On Mon, Mar 18, 2013 at 1:20 AM, Marek Olšák <mar...@gmail.com> wrote:

> Slowness is not usually a bug.
>
> I guess it can be optimized even more. It depends on where the
> bottleneck is now.
>
> Marek
>
> On Sun, Mar 17, 2013 at 10:14 PM, Rune Kjær Svendsen
> <runesv...@gmail.com> wrote:
> > Thank you very much! This is much better. It's gone from 0.5-ish FPS when
> > zooming in to around 10 FPS, depending on screen content.
> >
> > So I figure this isn't a bug? I assumed it was a bug, but is the case
> > simply that an efficient glReadPixels path for radeon/gallium doesn't
> > exist?
> >
> > The patch set sure helps in that regard, although it'd be really nice to
> > get 30 FPS consistently, if at all possible.
> >
> > Thanks again.
> >
> > /Rune
> >
> >
> > On Sun, Mar 17, 2013 at 6:46 PM, Andreas Boll <andreas.boll....@gmail.com>
> > wrote:
> >>
> >> 2013/3/17 Rune Kjær Svendsen <runesv...@gmail.com>:
> >> > Hello list
> >> >
> >> > I'm having problems recording the desktop content using the Weston
> >> > compositor's built-in recording function. When I start a recording and
> >> > do something that changes a lot of screen content (like zooming in on
> >> > the desktop, for example), I get around 0.5 FPS. Using sysprof, I can
> >> > see that ~98% of CPU is used in the function unpack_XRGB8888(). krh has
> >> > told me this is caused by glReadPixels going through a slowpath. I have
> >> > a Radeon HD 5770 GPU and I'm using mesa git (I've tried the mesa version
> >> > in the Ubuntu 12.10 repos, and the xorg-edgers PPA, same result).
> >> >
> >> > Does anyone know what the issue could be, or how to debug the problem
> >> > further?
> >> >
> >>
> >> This patch series [1] should help. You might want to try it.
> >>
> >> [1] http://lists.freedesktop.org/archives/mesa-dev/2013-March/036214.html
> >>
> >> > Doing some debugging, it seems the call to ctx->Driver.ReadPixels() in
> >> > _mesa_ReadnPixelsARB leads to _mesa_readpixels() being called in
> >> > readpix.c.
> >> >
> >> > I'm attaching some output of gdb that will hopefully be useful.
> >> >
> >> > I'm also attaching the debug terminal output of running Weston with the
> >> > DRM backend.
> >> >
> >> > Let me know if I can provide other useful information.
> >> >
> >> > _______________________________________________
> >> > mesa-dev mailing list
> >> > mesa-dev@lists.freedesktop.org
> >> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> >> >