Having recently acquired a suitably powerful graphics card, I thought
it would be fun to have MAME do the scale2x resize effect through a
fragment shader. It started off as an exercise to learn how to write
GPU programs, but I quickly realised implementing the scale2x
algorithm off the CPU could be a desirable feature in an emulator such
as MAME:

- Allows the CPU to spend most of its time actually emulating hardware
- Takes advantage of advanced graphics card functions which, for the
most part, go unused when scaling algorithms are performed on the CPU.

Fragment shaders could also be used to implement other effects, such
as RGB effects (scanlines, pixel triads, etc.) and possibly the HQ/LQ
resize algorithms. RGB effects should be trivial to code very
efficiently. The HQ algorithm has some potential stumbling blocks,
because of the constraints of fragment programs and my inexperience
with writing them.
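
As an illustration of how simple such an effect can be, below is a
rough sketch of a scanline fragment program, written as a C string
ready to be handed to glProgramStringARB. This is my own guess at an
implementation, not code from xmame or from scale2x.fp:

    /* Hypothetical scanline effect in ARB_fragment_program assembly,
     * embedded as a C string: sample the game texture, then halve the
     * brightness of every other output row. */
    static const char scanline_fp_src[] =
        "!!ARBfp1.0\n"
        "TEMP col, t;\n"
        "TEX col, fragment.texcoord[0], texture[0], 2D;\n"
        "MUL t, fragment.position.y, 0.5;\n" /* window-space row / 2 */
        "FRC t, t;\n"                        /* 0.25 even rows, 0.75 odd */
        "SLT t, t, 0.5;\n"                   /* 1.0 even rows, 0.0 odd */
        "MAD t, t, 0.5, 0.5;\n"              /* scale: 1.0 even, 0.5 odd */
        "MUL result.color, col, t;\n"
        "END\n";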

By applying multiple effects, or the same effect many times, the full
set of effects available in xmame could be implemented on the GPU with
only a handful of fragment programs. For example, scale4x with a
scanline effect could be implemented by applying the scale2x FP twice,
and then the scanline FP.
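
Concretely, such a chain would be nothing more than a list of program
handles to run in order (the handle names here are hypothetical):

    /* Hypothetical pass chain for scale4x plus scanlines: the scale2x
     * program runs twice, then the scanline program, one pass each.
     * The handles would come from loading the programs at init time. */
    GLuint passes[] = { scale2x_fp, scale2x_fp, scanline_fp };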

My proof-of-concept implementation of the scale2x algorithm works on
the xmame-0.88 source, on top of the existing OpenGL code.
Modification to the actual source code was minimal: all that was
required was for the fragment program to be loaded with the rest of
the OpenGL initialisation, plus a single call to
glProgramLocalParameter4fARB at render time to pass parameters to the
program. I have uploaded the fragment program here:

http://users.ox.ac.uk/~newc2303/scale2x.fp

To work correctly, regular OpenGL bilinear filtering must be disabled
(the sketch below shows one way to set this up).
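
For anyone curious, the setup amounts to something like the following.
This is a minimal sketch rather than the actual patch; the helper
names and the choice of parameter (the texel size) are my own, and
error checking is omitted:

    #define GL_GLEXT_PROTOTYPES
    #include <string.h>
    #include <GL/gl.h>
    #include <GL/glext.h>

    /* Load an ARB fragment program at OpenGL initialisation time.
     * src would hold the contents of scale2x.fp. */
    static GLuint load_fragment_program(const char *src)
    {
        GLuint prog;
        glGenProgramsARB(1, &prog);
        glBindProgramARB(GL_FRAGMENT_PROGRAM_ARB, prog);
        glProgramStringARB(GL_FRAGMENT_PROGRAM_ARB,
                           GL_PROGRAM_FORMAT_ASCII_ARB,
                           (GLsizei)strlen(src), src);
        return prog;
    }

    /* At render time: enable the program, pass it parameters, and make
     * sure the source texture is sampled with GL_NEAREST, since
     * bilinear filtering would blur the pixels scale2x depends on. */
    static void setup_scale2x_pass(GLuint prog, float tex_w, float tex_h)
    {
        glEnable(GL_FRAGMENT_PROGRAM_ARB);
        glBindProgramARB(GL_FRAGMENT_PROGRAM_ARB, prog);
        glProgramLocalParameter4fARB(GL_FRAGMENT_PROGRAM_ARB, 0,
                                     1.0f / tex_w, 1.0f / tex_h,
                                     0.0f, 0.0f);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    }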

Functionally, the effect exactly mimics the CPU implementation of the
algorithm. As far as speed is concerned, I get 90fps with the fragment
program compared with 60fps for the CPU implementation, over a run
through half a level of mslug, on my 1.9GHz P4 with a GeForce 6800 GT.
(I would be interested to know if there are any more precise
benchmarking methods that people use.)

To allow multiple shaders to be applied at the same time, the OpenGL
driver could be modified to render one pass per shader. The first pass
would be rendered as normal. Each remaining pass would then take the
previous pass's output as a texture and render it, with its shader
applied, onto a quad filling the screen.
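
As a sketch of how that loop might look, using glCopyTexSubImage2D to
grab each pass (pbuffer render-to-texture would be an alternative) --
the function and its arguments are hypothetical, and an identity
projection is assumed so the quad spans the viewport:

    /* The first pass has already been rendered as normal; each
     * remaining pass copies the framebuffer into a texture and applies
     * its fragment program to a screen-filling quad. */
    static void apply_passes(const GLuint *progs, int n_passes,
                             GLuint pass_tex, int w, int h)
    {
        int i;
        glBindTexture(GL_TEXTURE_2D, pass_tex);
        for (i = 1; i < n_passes; i++) {
            /* grab the previous pass's output from the framebuffer */
            glCopyTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 0, 0, w, h);

            /* render it through the next effect on a full-screen quad */
            glBindProgramARB(GL_FRAGMENT_PROGRAM_ARB, progs[i]);
            glBegin(GL_QUADS);
            glTexCoord2f(0.0f, 0.0f); glVertex2f(-1.0f, -1.0f);
            glTexCoord2f(1.0f, 0.0f); glVertex2f( 1.0f, -1.0f);
            glTexCoord2f(1.0f, 1.0f); glVertex2f( 1.0f,  1.0f);
            glTexCoord2f(0.0f, 1.0f); glVertex2f(-1.0f,  1.0f);
            glEnd();
        }
    }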

The scale2x algorithm on its own may not be particularly useful as a
GPU implementation; people with hardware capable of running fragment
programs most likely have CPUs capable of running a MAME emulation and
a scale2x resize together. However, I suspect there are a lot of
people whose CPUs cannot run the scale4x or HQ algorithms in real
time, but whose graphics cards would allow those algorithms to run on
the GPU (I fall into this category myself).

I would be interested in hearing what other people, both developers
and end users, think about this. Would it be worth my time continuing,
and attempting to implement the HQ algorithm and other effects? Should
I set about implementing my changes to the OpenGL driver so that they
merge neatly with the rest of the code? Using the fragment programs in
GL mode, and the current effects code in any other mode, seems
sensible, but might be difficult for me to do, as I have little
experience with the xmame source. It could also be that my proposals
would require xmame to be restructured a little: is OpenGL mode itself
currently implemented as a filter? If so, that approach would probably
be incompatible with my changes.

Looking forward to hearing what people think,

Matthew Earl
