On 10 March 2008, Tim Chick wrote:
[...]
> > >  I put in some fairly OK rotation support into the mplayer pxa driver,
> > >  but this is software - there is no hardware support.
> >
> > How do you know that it is OK? What fraction of memcpy performance do
> > you get in your benchmarks? My YV12->YUY2 scaler is generally within
> > 70% of memcpy performance on Nokia 770 for example. So I can say it is
> > pretty fast ;-)
> >
> > Theoretically the scaler can be modified to also support rotation in
> > process, but I need to think about it a bit more.
> >
> > Anyway, it's not like I'm going to *sell* anything to you, take it or
> > leave it :-)
>
> I don't think the zaurus hardware supports YUY2 - only YV12, but I would
> have to check.

YV12 is easier to support than YUY2 and a lot easier than the omapfb YUV420
format with its quite weird layout (each pair of bytes swapped). The scaler
in the maemo build of mplayer handles these cases just fine. Supporting
YV12->YV12 is a trivial addition :-) Rotation should also not be very
difficult.
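
To show what I mean by rotation of planar data, here is a naive sketch in C. It
is not the code from the maemo scaler and it is not tuned for the cache at all.
It assumes tightly packed YV12 (Y plane, then V, then U), even width and height,
and a 90 degree clockwise rotation; the function names are made up for this
example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate one 8-bit plane by 90 degrees clockwise.
 * src is w x h pixels with src_stride bytes per row,
 * dst is h x w pixels with dst_stride bytes per row. */
static void rotate_plane_cw(const uint8_t *src, int src_stride,
                            uint8_t *dst, int dst_stride, int w, int h)
{
    assert(src != NULL && dst != NULL && w > 0 && h > 0);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            /* source pixel (x, y) lands in row x, column h - 1 - y */
            dst[(size_t)x * dst_stride + (h - 1 - y)] =
                src[(size_t)y * src_stride + x];
        }
    }
}

/* Rotate a tightly packed YV12 frame (assumed layout: Y, V, U planes). */
static void rotate_yv12_cw(const uint8_t *src, uint8_t *dst, int w, int h)
{
    assert(w % 2 == 0 && h % 2 == 0);
    int cw = w / 2, ch = h / 2;            /* chroma plane dimensions */
    size_t ysize = (size_t)w * h;
    size_t csize = (size_t)cw * ch;

    rotate_plane_cw(src, w, dst, h, w, h);                        /* Y */
    rotate_plane_cw(src + ysize, cw, dst + ysize, ch, cw, ch);    /* V */
    rotate_plane_cw(src + ysize + csize, cw,
                    dst + ysize + csize, ch, cw, ch);             /* U */
}
```

Each plane is handled independently, which is the reason planar formats are
easy here. A real implementation would want to process small tiles instead of
striding through the whole destination column by column.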

Well, thank you for the information. Now I at least know that zaurus supports
planar YV12 format, and it needs rotation. Is all of this right?

Planar YV12 is a lot better than YUY2 as it is the native format for video
codecs and it uses only 12 bits per pixel, while YUY2 uses 16 bits per pixel.
More tightly packed data means better performance.
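
To put numbers on the 12 vs. 16 bits per pixel, a small sketch in C (the helper
names are invented for this example, and it assumes tightly packed frames with
no row padding):

```c
#include <assert.h>
#include <stddef.h>

/* YV12: full resolution Y plane plus two chroma planes subsampled 2x2,
 * which works out to 12 bits per pixel. */
static size_t yv12_frame_bytes(int w, int h)
{
    assert(w > 0 && h > 0 && w % 2 == 0 && h % 2 == 0);
    return (size_t)w * h + 2 * ((size_t)(w / 2) * (h / 2));
}

/* YUY2: packed Y0 U Y1 V, 4 bytes for every 2 pixels (16 bits per pixel). */
static size_t yuy2_frame_bytes(int w, int h)
{
    assert(w > 0 && h > 0 && w % 2 == 0);
    return (size_t)w * h * 2;
}
```

For a 320x240 frame this gives 115200 bytes for YV12 and 153600 bytes for YUY2,
so every frame written in YUY2 moves a third more data over the bus.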

> > > I *ONLY* bothered with this in 240x320 mode, as the zaurus DOES NOT
> > > have the cpu to play 480x640 movies at a good frame rate - so there is
> > > no point.
> >
> > What about playing lower resolution videos such as 640x360 for
> > example? Having arbitrary resolution support may be not a bad idea as
> > long as the resolution of video remains reasonably small to have
> > enough resources for decoding and displaying it.
>
> Nope, not enough for 640x360 - again I think this has a lot to do with the
> extra bandwidth on the SDRAM bus of using a higher res display.

Well, I guess support for 512x288, 352×288 (CIF), 176×144 (QCIF) or any other
reasonably low resolution would also be nice to have. Also, you can always
downscale video to a lower resolution, for example 512x384 to 320x240.

Actually video is currently downscaled to fit a 400x240 box on Nokia 770 by
default to improve performance, because pixel doubling can then be used to
display it fullscreen (higher resolution videos are already quite heavy on cpu,
so every little bit helps).
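
The "fit into a box" calculation could look roughly like the C sketch below.
This is only an illustration, not the code from the maemo build: the function
name is made up, and rounding the result down to even dimensions (to keep the
2x2 chroma subsampling simple) is my assumption.

```c
#include <assert.h>

/* Compute an output size that fits into box_w x box_h while keeping the
 * aspect ratio. Video that already fits is left alone (no upscaling). */
static void fit_box(int w, int h, int box_w, int box_h,
                    int *out_w, int *out_h)
{
    assert(w > 0 && h > 0 && box_w > 1 && box_h > 1);
    assert(out_w != 0 && out_h != 0);

    if (w <= box_w && h <= box_h) {
        *out_w = w;
        *out_h = h;
        return;
    }
    if ((long long)w * box_h > (long long)h * box_w) {
        /* video is wider than the box: width is the limit */
        *out_w = box_w;
        *out_h = (int)((long long)h * box_w / w);
    } else {
        /* video is taller than (or same shape as) the box */
        *out_h = box_h;
        *out_w = (int)((long long)w * box_h / h);
    }
    /* round down to even sizes, but never below 2 */
    *out_w &= ~1;
    *out_h &= ~1;
    if (*out_w < 2) *out_w = 2;
    if (*out_h < 2) *out_h = 2;
}
```

With this, 512x384 into a 320x240 box gives 320x240, and 640x360 into a
400x240 box gives 400x224 (225 rounded down to even).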

> > >  Now for the 640x480 video. There are 3 problems:
> > >  1 - the video overlay uses uncached memory.
> >
> > What is the problem with uncached memory? Do you have any serious need
> > to read from framebuffer? Actually, if it gets cached, that will only
> > introduce cache pollution with unneeded data when writing data to it
> > (that is if xscale uses write-allocate cache, ARM9 and ARM11 do not
> > seem to have this feature). On x86, there is a special instruction for
> > writing to memory bypassing cache to solve cache pollution problem and
> > improve performance for large transfers.
>
> I *want* to put the data in the write cache, otherwise the CPU spends much
> time stalled while the video controller DMAs out memory. I'm also not sure
> if the write buffer is enabled on this memory, hitting performance even
> more.

This all needs to be benchmarked. Enabling write-allocate cache for the output
buffer is bad for performance. On the first write miss, it will actually
*read* the whole cache line into the cache, and then merge it with your newly
written data. This cache line will have to be flushed back to memory
eventually. So the load on the memory bus will only get higher. In order to
avoid all of the above mentioned problems, the MOVNTQ instruction was
introduced on x86 and it really does help.
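
A back-of-the-envelope model of that extra bus load, as a C sketch. The
assumptions are mine: a 32-byte cache line, every line of the output buffer
missing once and being written back once, and the reads done by the display
controller ignored. Only a benchmark can tell what really happens.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_BYTES 32   /* assumed cache line size */

/* Plain (uncached or write-combined) stores: the data crosses the bus once. */
static size_t bus_bytes_uncached(size_t frame_bytes)
{
    return frame_bytes;
}

/* Write-allocate: each touched line is first read (line fill on the write
 * miss) and later written back, so it crosses the bus twice. */
static size_t bus_bytes_write_allocate(size_t frame_bytes)
{
    assert(frame_bytes > 0);
    size_t lines = (frame_bytes + CACHE_LINE_BYTES - 1) / CACHE_LINE_BYTES;
    return 2 * lines * CACHE_LINE_BYTES;
}
```

For a 320x240 YV12 frame of 115200 bytes, this model gives 115200 bytes of bus
traffic for plain stores and 230400 bytes with write-allocate.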

What about just doing bulk writes using STM or STRD instructions?

> The cache will help more for the rotation code, but I think it will help
> for the normal copy, just to prevent  the stalling due to DMA.
>
> > So in my opinion uncached memory should have no negative effect on the
> > performance. Have you compared the performance of your rotation code
> > for output buffer in normal memory vs. framebuffer?
>
> No, I keep meaning to get round to doing this - it could be I am completely
> wrong, but the benchmarks I do in mplayer always point me in this
> direction, and the improvement in the old cacko rom always came from
> increasing the bus speed, not the cpu speed.

Yes, mplayer is really heavily memory performance dependent.

> > >  The frame buffer driver needs to be updated to allow the use of cached
> > > memory with double buffering and a mechanism to flush the cache.
> >
> > Don't understand the double buffering part, could you please explain
> > it with a bit more details?
>
> Well, you need double buffering for smooth tear free playback anyway.
>
> My idea is to have 2 cached  areas of memory - one is displaying current
> frame, the other is having next frame written to it. mplayer then completes
> the frame, which will cause a cache flush on that region, and the two
> buffers to be swapped by the video driver.

Is it possible to swap buffers by bvdd video driver without moving data
around? What kind of tearsync does it support?
