On Tue, May 01, 2007 at 11:51:50AM +0300, ext Siarhei Siamashka wrote:
> On Monday 30 April 2007 17:49, Daniel Stone wrote:
> > Indeed.  Unfortunately this is slightly misleading in that it only shows
> > the raw write speed.  RFBI, for example, can't deal with the sorts of
> > speeds that your hyper-optimised version is pumping out.  So it's mainly
> > just about cutting the latency in the critical path down low enough that
> > it makes no difference.
> 
> The 'framebuffer' is just ordinary system memory, so converting the color
> format and copying data to the framebuffer will run at the same speed as
> simulated in this test. RFBI performance is only critical for the
> asynchronous DMA transfer to the LCD controller, which does not introduce
> any overhead and is performed while the ARM core is doing other work
> (decoding the next frame). RFBI performance matters only if the data
> transfer to the LCD is still not complete by the time the next frame has
> been decoded and is ready to be displayed. When playing video, the ARM core
> and the LCD controller are almost always working at the same time,
> performing different tasks in parallel. I think I have already explained
> these details in [1].
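
The overlap argument in the quoted paragraph can be sketched as a toy model. This is only an illustration under stated assumptions: the function name `frame_period_ms` and all timings are hypothetical, and the model ignores buffering details. When the CPU work and the DMA transfer overlap, the slower of the two sets the frame period, so the transfer only matters once it takes longer than decoding plus conversion.

```python
def frame_period_ms(cpu_ms: float, dma_ms: float, overlapped: bool = True) -> float:
    """Steady-state time per displayed frame, in milliseconds (toy model)."""
    if overlapped:
        # CPU work and the DMA transfer run in parallel, so the slower of
        # the two sets the pace.
        return max(cpu_ms, dma_ms)
    # Fully serialized: the CPU waits for each transfer to finish.
    return cpu_ms + dma_ms


# Hypothetical timings, chosen only for illustration.
CPU_MS = 40.0
for dma_ms in (15.0, 55.0):
    period = frame_period_ms(CPU_MS, dma_ms)
    limiter = "transfer" if dma_ms > CPU_MS else "CPU"
    print(f"dma={dma_ms} ms -> {period} ms per frame, limited by {limiter}")
```

The serialized branch is included only for comparison; it shows why any wait placed in the critical path adds directly to the frame time.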

Right.  My point is that the numbers you're showing -- while very good,
don't get me wrong -- won't necessarily have a huge direct impact on
video playback.  Particularly if you want to avoid tearing.

> So now the results of the tests are consistent - when doing video output, most
> of ARM core cycles are spent in this 'omapCopyPlanarDataYUV420' function.

Well, either that, or just waiting for RFBI transfers to complete.

> Optimizing it using 'yv12_to_yuv420_line_armv6' will definitely have a huge
> effect: video output overhead when using Xv will be at least halved, freeing
> more CPU resources for video decoding.

Yes, this is one good aspect.

> > I don't have any tips, per se.  Once I get it all integrated it'll be in
> > git, but for now, the only public source is the packages.
> 
> OK, thanks. It may take some time though. I'm still using old scratchbox
> with mistral SDK here (did not have enough free time to upgrade yet). Until I
> clean up my scratchbox mess, I can only provide some patch without testing, if
> anybody courageous can try to build it :)

I'm still using Scratchbox 0.9.8.5 for day-to-day stuff ...

> Well, anyway, everything worked perfectly and I could play 640x480 video 
> on N800 with the following statistics:
> 
> VIDEO:  [DIVX]  640x480  12bpp  23.976 fps  886.7 kbps (108.2 kbyte/s)
> ...
> BENCHMARKs: VC:  87,757s VO:   8,712s A:   1,314s Sys:   3,835s =  101,618s
> BENCHMARK%: VC: 86,3592% VO:  8,5736% A:  1,2932% Sys:  3,7740% = 100,0000%
> BENCHMARKn: disp: 2044 (20,11 fps)  drop: 355 (14%)  total: 2399 (23,61 fps)
> 
> As you can see, mplayer took 8.712 seconds to display 2044 VGA resolution
> frames. If we do the necessary calculations, that's 72 million pixels per
> second, quite close to the limit of what 'yv12_to_yuv420_line_armv6' can do,
> so this function is the only major contributor to video output time. Video
> output took much less time than decoding, which shows that video output
> overhead can be reduced to a minimum (tearsync was not used in this test,
> though).
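
The "necessary calculations" mentioned in the quoted text can be reproduced from the benchmark lines above. A minimal sketch, using only the figures quoted there (2044 displayed frames, 640x480, VO time 8.712 s); the helper name `pixel_rate` is made up for this example.

```python
def pixel_rate(frames: int, width: int, height: int, seconds: float) -> float:
    """Pixels pushed through video output per second."""
    return frames * width * height / seconds


# Figures taken from the BENCHMARKs/BENCHMARKn lines in the quoted text.
FRAMES_DISPLAYED = 2044
WIDTH, HEIGHT = 640, 480
VO_SECONDS = 8.712

rate = pixel_rate(FRAMES_DISPLAYED, WIDTH, HEIGHT, VO_SECONDS)
per_frame_ms = VO_SECONDS / FRAMES_DISPLAYED * 1000.0

print(f"{rate / 1e6:.1f} million pixels per second")
print(f"{per_frame_ms:.2f} ms of video output time per displayed frame")
```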

I'd be curious to see the results from this with tearsync _enabled_,
i.e., after your OMAPFB_UPDATE_WINDOW call, issue an OMAPFB_SYNC_GFX
ioctl before you start writing to memory again.  This is basically the
limiter for us at this stage.
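
The call order described above can be sketched as follows. This is a rough illustration only, not the driver's or mplayer's actual code: `show_frames`, `write_frame` and `fake_ioctl` are hypothetical names, the ioctl is injected as a callable, and the request codes are plain parameters because their numeric values are not given in this thread. OMAPFB_SYNC_GFX is assumed here to block until the transfer has completed.

```python
from typing import Callable, Iterable


def show_frames(frames: Iterable[object],
                write_frame: Callable[[object], None],
                ioctl: Callable[[int, int, object], object],
                fd: int,
                update_request: int,
                sync_request: int,
                window: object) -> None:
    """Write each frame, request an update, then sync before the next write."""
    for frame in frames:
        write_frame(frame)                 # convert/copy into framebuffer memory
        ioctl(fd, update_request, window)  # OMAPFB_UPDATE_WINDOW: start the transfer
        ioctl(fd, sync_request, 0)         # OMAPFB_SYNC_GFX: wait before writing again


# Dry run with a recording stand-in instead of a real framebuffer device.
log = []


def fake_ioctl(fd: int, request: int, arg: object) -> int:
    log.append(f"ioctl {request}")
    return 0


show_frames(["frame0", "frame1"], lambda f: log.append(f"write {f}"),
            fake_ioctl, fd=-1, update_request=1, sync_request=2, window=b"")
print("\n".join(log))
```

On a device, `fcntl.ioctl` could be passed as the `ioctl` argument, with the request codes and the window argument taken from the kernel's omapfb header.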

> When tearsync comes into action, everything gets a bit more complicated. I'm
> still investigating its impact on video playback performance.

'Not good'. :)

Thanks again for your work.

Cheers,
Daniel

_______________________________________________
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers
