Having recently acquired a suitably powerful graphics card, I thought it would be fun to have MAME do the scale2x resize effect through a fragment shader. It started off as an exercise in learning how to write GPU programs, but I quickly realised that moving the scale2x algorithm off the CPU could be a desirable feature in an emulator such as MAME:
- Allows the CPU to spend most of its time actually emulating hardware.
- Takes advantage of advanced graphics card functions which are for the most part unused when scaling algorithms are performed on the CPU.

Fragment shaders could also be used to implement other effects, such as RGB effects (scanline, pixel triad, etc.) and possibly the HQ/LQ resize algorithms. RGB effects should be trivial to code very efficiently. There are potential stumbling blocks with implementing the HQ algorithm, because of the constraints of fragment programs and my inexperience with writing them.

By applying multiple effects, or the same effect several times, the full set of effects available in xmame could be implemented on the GPU with only a handful of fragment programs. For example, scale4x with a scanline effect could be implemented by applying the scale2x fragment program twice, then the scanline fragment program.

My proof-of-concept implementation of the scale2x algorithm works on the xmame-0.88 source, on top of the existing OpenGL code. Modification to the actual source code was minimal: all that was required was that the fragment program be loaded with the rest of the OpenGL initialisation, and that a single call to glProgramLocalParameter4fARB be made at render time to pass parameters to the program. I have uploaded the fragment program here:

http://users.ox.ac.uk/~newc2303/scale2x.fp

To work correctly, regular OpenGL bilinear filtering must be disabled. Functionally, the effect exactly mimics the CPU implementation of the algorithm.

As far as speed is concerned, I get 90fps compared with 60fps on a run through half a level of mslug, on my 1.9GHz P4 with a GeForce 6800 GT. (I would be interested to know if there are any more precise benchmarking methods that people use.)

To allow multiple shaders to be applied at the same time, the OpenGL driver could be modified to render one pass per shader. The first pass would be rendered as normal.
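For reference, the per-pixel selection rule that the fragment program evaluates is the standard scale2x (AdvMAME2x) rule. The C function below is my own sketch of that rule, not code from the patch; the function and parameter names are mine:

```c
#include <stdint.h>

/* scale2x selection rule for one source pixel E with neighbours
 *
 *        B
 *      D E F    ->    E0 E1
 *        H            E2 E3
 *
 * Each source pixel expands to a 2x2 output block. In the fragment
 * program the same comparisons are made per output fragment, with
 * B, D, E, F and H fetched from the source texture. */
void scale2x_pixel(uint32_t b, uint32_t d, uint32_t e,
                   uint32_t f, uint32_t h, uint32_t out[4])
{
    if (b != h && d != f) {
        out[0] = (d == b) ? d : e;   /* E0, top-left     */
        out[1] = (b == f) ? f : e;   /* E1, top-right    */
        out[2] = (d == h) ? d : e;   /* E2, bottom-left  */
        out[3] = (h == f) ? f : e;   /* E3, bottom-right */
    } else {
        out[0] = out[1] = out[2] = out[3] = e;
    }
}
```

On the CPU, pixels at the image border need their neighbour lookups clamped; in the GPU version, clamped texture addressing handles that case without extra code.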
The remaining passes would then take the previous pass as a texture, and render it applied to a quad filling the screen.

The scale2x algorithm on its own may not be particularly useful as a GPU implementation; people with hardware capable of running fragment programs most likely have CPUs capable of running a MAME emulation and a scale2x resize at the same time. However, I suspect there are a lot of people whose computers cannot run the scale4x or HQ algorithms on the CPU in real time, but whose graphics cards could run them on the GPU (I fall into this category myself).

I would be interested in hearing what other people, both developers and end users, think about this:

- Would it be worth my time continuing, and attempting to implement the HQ algorithm and other effects?
- Should I set about implementing my changes to the OpenGL driver so that they merge neatly with the rest of the code?

Implementing this so that the fragment programs are used in GL mode, and the current effects code in any other mode, seems sensible, but might be difficult for me to do as I have little experience with the xmame source. It could also be that my proposal would require xmame to be restructured a little: is the OpenGL mode itself currently implemented as a filter? If so, it would probably be incompatible with my changes.

Looking forward to hearing what people think,

Matthew Earl

_______________________________________________
Xmame mailing list
[EMAIL PROTECTED]
http://toybox.twisted.org.uk/mailman/listinfo/xmame