Michael,
  I need a refresher: what are you trying to do again? From this
information it seems that you are just trying to get the bt848 data
to the Overlay without using up too much cpu. If that is the case
then you should be using the video4linux interface. Xawtv does this,
and I watch TV on my i815 without using up any cpu. There is no
memcpy in the path between the bt848 and the Overlay when using
v4l.
  I think this developed out of a video capture discussion, which
requires that an application get the data out of the bt848, write it
to disk, then get the data into the overlay. That indeed uses a
memcpy, since you are using Xv directly. If you want to get rid of
that memcpy and replace it with some DMA equivalent, as has been
discussed on this list, something like this has to happen:

Reserve a set of GTT (agpgart) pages for general magic mapping.
When XvShmPutImage gets into the X server you have to get each
  page from the Shm region and map it into those magic GTT pages
  such that they appear linear. They can be left as cacheable.
Turn off ring buffer arbitration.
Flush the pipeline.
Blit from the magic region to the Overlay.
Flush again.
Turn arbitration back on.
Unmap the Shm region from the magic GTT pages.
Flip the overlay.

The arb-off, flush, blit, flush and arb-on steps are not immediate
actions; they are commands which need to be placed in the ring buffer,
and they will happen asynchronously. This causes an additional problem in
that you can't issue the overlay flip until the copy is finished
(no problem, use the overlay flip instruction instead of the
register). You still have one more issue to resolve. When you enter
the PutImage function you need to set up the overlay parameters,
and you can't touch those registers unless the last flip has
finished. I think the current bit checking will still work for that.
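
A minimal sketch of the queuing order described above, using a simulated
ring buffer. Everything here is an assumption for illustration: the
command tags, the ring layout and the function names are hypothetical
and are not the real i810 opcodes or driver code. It only shows that the
flip is queued as a command behind the copy, so it cannot overtake the
blit. The GTT map/unmap steps are outside the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical command tags; these are NOT real i810 ring opcodes. */
enum ring_cmd {
    CMD_ARB_OFF, CMD_FLUSH, CMD_BLIT, CMD_ARB_ON, CMD_OVERLAY_FLIP
};

#define RING_SLOTS 16   /* assumed size; one slot always stays empty */

struct ring {
    enum ring_cmd slot[RING_SLOTS];
    size_t head;   /* next slot the (simulated) hardware would execute */
    size_t tail;   /* next slot the driver writes */
};

/* Number of queued commands that have not been executed yet. */
static size_t ring_pending(const struct ring *r)
{
    return (r->tail + RING_SLOTS - r->head) % RING_SLOTS;
}

/* Queue one command; returns 0 on success, -1 if the ring is full. */
static int ring_emit(struct ring *r, enum ring_cmd c)
{
    size_t next = (r->tail + 1) % RING_SLOTS;

    if (next == r->head)
        return -1;
    r->slot[r->tail] = c;
    r->tail = next;
    return 0;
}

/*
 * Queue the whole copy sequence followed by the flip command.
 * Either all six commands are queued or none are, so the ring never
 * holds a half-written sequence. Returns 0 on success, -1 if there is
 * not enough room.
 */
static int queue_copy_and_flip(struct ring *r)
{
    static const enum ring_cmd seq[] = {
        CMD_ARB_OFF, CMD_FLUSH, CMD_BLIT, CMD_FLUSH, CMD_ARB_ON,
        CMD_OVERLAY_FLIP
    };
    size_t i, n = sizeof seq / sizeof seq[0];

    assert(r != NULL);
    if (RING_SLOTS - 1 - ring_pending(r) < n)
        return -1;
    for (i = 0; i < n; i++)
        ring_emit(r, seq[i]);   /* cannot fail: room was checked above */
    return 0;
}
```

A caller would start from a zeroed struct ring and call
queue_copy_and_flip() once per frame, after checking that the previous
flip has finished.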

OK, now that I wrote it down, it doesn't look that bad. The hard part
is in the agpgart code. I don't even see a drm dependency. You
just need an agpgart function that takes a user address and size
and maps the pages into the gart and returns the gart address.
Does such an animal exist?
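
To make the "user address and size" part concrete, here is a small
sketch of the arithmetic such a function would need before mapping
anything: how many GTT pages a user buffer spans, and where the buffer
starts inside the first page. The names are hypothetical and the 4 KiB
page size is an assumption (it is the x86 page size); this is not an
existing agpgart call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GTT_PAGE_SIZE 4096u   /* assumption: 4 KiB pages */

/*
 * Number of pages touched by the byte range [user_addr, user_addr + size).
 * A buffer that does not start on a page boundary can need one page more
 * than size / GTT_PAGE_SIZE suggests.
 */
static size_t pages_spanned(uintptr_t user_addr, size_t size)
{
    uintptr_t first, last;

    assert(size > 0);
    first = user_addr / GTT_PAGE_SIZE;
    last = (user_addr + size - 1) / GTT_PAGE_SIZE;
    return (size_t)(last - first) + 1;
}

/*
 * Byte offset of the buffer start inside its first page. The address
 * handed back to the caller would be the start of the reserved GTT
 * range plus this offset.
 */
static size_t offset_in_first_page(uintptr_t user_addr)
{
    return (size_t)(user_addr % GTT_PAGE_SIZE);
}
```

For the 710x576 12-bit frame mentioned later in the thread (613440
bytes), a page-aligned buffer spans 150 pages, so the reserved "magic"
range would need at least that many entries.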

 -Matt



-----Original Message-----
From: Michael Zayats [mailto:[EMAIL PROTECTED]]
Sent: Saturday, October 13, 2001 2:17 PM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; Sottek, Matthew J
Subject: Re: [Xpert]XVideo (memcoy) consuiming to much CPU (i810)


well back to our cows...

I get frames from the bt848 at 25 fps, F_CIF (710x576 YUV420, i.e. 12 bits
per pixel): 0% cpu usage if I just discard them.

Since I get them in an mmap'ed driver area and not in shared memory, I use a
single memcpy to copy them into a previously allocated shared memory
segment: 25% CPU time.

Now XvShmPutImage: 50% CPU, pretty predictable since it also does memcpys
of the same buffer.

No compression happens in the middle.

BTW this agrees very well with the observation that 250 loops of
memcpy(...); usleep(30000) take exactly 10 seconds, i.e. 40 milliseconds
per loop, of which 30 are the sleep, meaning that the memcpy takes 10
milliseconds.
Multiplying by 25 frames per second = 250ms per second -> 25%

Using DMA might save about 25%...
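
The arithmetic above, written out as a small sketch. The function names
are made up for illustration, and the calculation assumes that
usleep(30000) sleeps for exactly 30 ms. In practice usleep may sleep
longer than requested, so the 10 ms figure is better read as an upper
bound on the memcpy time.

```c
#include <assert.h>

/*
 * Time spent copying per loop iteration, in milliseconds, given the total
 * run time, the number of iterations and the sleep per iteration.
 * Assumes the sleep lasts exactly sleep_ms.
 */
static double copy_ms(double total_ms, int loops, double sleep_ms)
{
    assert(loops > 0);
    return total_ms / loops - sleep_ms;
}

/* Fraction of one CPU used when a copy of per_frame_ms runs fps times a second. */
static double cpu_fraction(double per_frame_ms, int fps)
{
    return per_frame_ms * fps / 1000.0;
}

/* Size of one frame in bytes for a given resolution and bits per pixel. */
static long frame_bytes(long width, long height, int bits_per_pixel)
{
    return width * height * bits_per_pixel / 8;
}
```

With the numbers from this mail, copy_ms(10000, 250, 30) gives 10 ms,
cpu_fraction(10, 25) gives 0.25, and frame_bytes(710, 576, 12) gives
613440 bytes per frame.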

Another 2 questions:
1) Maybe I should just use some optimized version of memcpy? Does anyone
know of MMX or SSE use in glibc? I have very well-defined hardware to run
on...

2) Offtopic: does somebody know how to access shared memory from kernel
space? (Maybe I will fix the bttv driver to write directly to shared
memory; this will save me another 25%...)

any help?


----- Original Message -----
From: Sottek, Matthew J <[EMAIL PROTECTED]>
To: 'Michael Zayats' <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Tuesday, October 09, 2001 5:39 PM
Subject: RE: [Xpert]XVideo consuiming to much CPU


> Michael,
>   If you are only able to get 25fps then there is something wrong
> in your application. I know of Xv based mpeg decoders that can
> do full DVD sized frames at 30fps without issue, and the vast
> majority of the cpu is taken up with the mpeg decode not the
> transfer. I myself have done Xv tests that can peg the framerate
> at 99fps when the vertical retrace is 100 (this was with a smaller
> 320x200 mpeg1 stream). This was with a modest PIII cpu.
>   The bottom line is this: doing a blit from system memory to the
> framebuffer, or some other DMA transfer, could offload a little bit
> of cpu usage, but it isn't going to make anything "faster". The
> overlay can only flip buffers on vertical retrace, and even a
> slow cpu should be able to keep up. Using the blit/DMA you will
> then either have to wait for the transfer to complete or have
> something else poll to find out when the transfer completes and
> then flip the overlay. That makes a mess of a pretty simple
> problem, all to save a little cpu.
>   Keep in mind that the memcpy isn't really that bad on i810
> since it is sharing memory bandwidth with the system instead
> of actually being behind a pci bus.
>
>  -Matt
>
> -----Original Message-----
> From: Michael Zayats [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, October 09, 2001 3:05 AM
> To: Mark Vojkovich
> Cc: [EMAIL PROTECTED]
> Subject: Re: [Xpert]XVideo consuiming to much CPU
>
>
> >
> >    The i810 driver will not display video faster than the vertical
> > retrace.  If you send frames faster than that, it will busy wait
> > until the next retrace.  What you are seeing is the expected behavior
> > on i810.
>
> I send 25fps and, as Peter already mentioned (and I checked it), it's
> because of the use of memcpy instead of DMA in the XVideo i810 driver.
>
> >
> >
> > Mark.
> >
> > _______________________________________________
> > Xpert mailing list
> > [EMAIL PROTECTED]
> > http://XFree86.Org/mailman/listinfo/xpert
> >
>

_______________________________________________
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel
