<[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
>
> Hello again,
> This time about sub-pixel aliasing.
>
> Andrew Tween writes:
> > Hi Bryce,
> > I think it is a good idea to release the solid 3.8 version.
> >
> > Having said that, I am looking forward to the 3.9 release because I really want
> > to try using Exupery on my sub-pixel font filtering algorithm to see if it can
> > speed it up. Currently this is in 3.9, and I don't want to port it all back to
> > an earlier image/vm, especially since you are moving forward to 3.9.
>
> Exupery runs fine on 3.9, the tests just needed to be fixed.
>
> The best way to find out how it performs for your example would be to
> load Exupery into your 3.9 image and try it.

The subpixel rendering needs a modified VM (for BitBlt stuff), and Exupery also needs a modified VM. Currently these are built from different versions of VMMaker, SVN sources, etc. So I am keen for them to be synchronised, and I am sure it will all come together eventually. In the meantime, I guess I could create a standalone benchmark, which would be interesting in its own right.

>
> > This is probably a topic for another thread, but could you tell from looking at
> > the attached method if it is a good candidate for speed-up? It has nested loops,
> > does lots of at: and integerAt:put: (prim 166), and SmallInteger bitShift:,
> > bitAnd:, *, +, //, and some Float calcs.
>
> I'm not sure how well it would run. The code is definitely a promising
> candidate to compile; however, Exupery doesn't yet compile Floats, large
> integers, or primitive 166. I don't think the interpreter does any
> special optimisations for them either, so chances are those operations
> will run at the same speed. Exupery will be able to optimise the
> SmallInteger calculations and looping overhead.

Is the primitive compilation something that I, or others, could help with? What is involved in adding a primitive to Exupery?

>
> The method could definitely be optimised much more. Adding
> integerAt:put: and ByteArray>>at: primitives would help. So would
> basic floating point optimisations. Going further, adding support for
> machine word (32 bit integer) and byte objects should allow us to
> compile to near C speeds.
>
> The optimisations for machine words, byte objects, and floating point
> are all very similar. The game is to remove all the intermediate
> objects so the calculations are done directly in registers without any
> conversion and deconversion overhead.
>
> luminance := (0.299*balR)+(0.587*balG)+(0.114*balB).
> balR := balR + ((luminance - balR)*correctionFactor).
> balG := balG + ((luminance - balG)*correctionFactor).
> balB := balB + ((luminance - balB)*correctionFactor).
> balR := balR truncated.
> balR < 0 ifTrue:[balR := 0] ifFalse:[balR > 255 ifTrue:[balR := 255]].
> balG := balG truncated.
> balG < 0 ifTrue:[balG := 0] ifFalse:[balG > 255 ifTrue:[balG := 255]].
> balB := balB truncated.
> balB < 0 ifTrue:[balB := 0] ifFalse:[balB > 255 ifTrue:[balB := 255]].
> a := balR + balG + balB > 0 ifTrue:[16rFF] ifFalse:[0].
> colorVal := balB + (balG bitShift: 8) + (balR bitShift: 16) + (a bitShift: 24).
> answer bits integerAt: (y*answer width)+(x//3+1) put: colorVal.
>
> Is a nice example to show what dynamically inlined primitives could
> do. The major overhead with floats is allocating memory (1). In this
> example, using the current optimisation engine, it should be possible
> to create only 4 floats rather than the 19 needed by the interpreter. One
> more allocation will be needed to form colorVal if it overflows into a
> LargeInteger. SSA should allow all the floating point intermediate
> values to be removed by allowing program analysis over more than one
> statement.
>
> balR := balR truncated.
> balR < 0 ifTrue:[balR := 0] ifFalse:[balR > 255 ifTrue:[balR := 255]].
>
> Should probably be handled via a primitive that truncates a floating
> point value down to an unsigned 8 bit value. For this example such a
> primitive may be overkill, however converting floating point values
> to. But with Exupery 3.0 and SSA it would be really nice to be able to
> optimise to vectors. With vector optimisation we will have a level
> playing field with C: they will need at least as much compiler
> machinery as we will, and they will probably write their compilers in C,
> requiring much more work than writing in Smalltalk.
>
> In summary, I think there may be some speed improvement now. Adding
> the array access primitives will help. Floating point is likely to be
> the next biggest win. Without SSA I doubt that other optimisations
> will provide enough gain to be worthwhile.
> With SSA and a few extra
> object types it should be possible to fully optimise it.

Thanks for your comments. I had intended to rewrite the method in C and add it to the plugin, but the advantages of being able to easily play with it in Smalltalk outweigh the speed-up of porting to C, at least while I am still experimenting.

Cheers,
Andy

>
> Bryce
>
> (1) After upgrading the VM I'm going to implement fast compiled
> primitives for #new and [EMAIL PROTECTED] This is driven by the largeExplorers
> benchmark. #@ is inlined into the main interpret loop in the
> interpreter, but Exupery executes it as a normal primitive. This means
> that compiling largeExplorers can lead to an 8% speed loss.

_______________________________________________
Exupery mailing list
[email protected]
http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/exupery
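For comparison with the C-plugin route Andy mentions, and with Bryce's goal of keeping every intermediate "directly in registers", here is a minimal C sketch of the per-pixel arithmetic quoted in this thread. It is an illustration under stated assumptions, not Andy's plugin code: the names `clamp_to_u8` and `correct_and_pack` are hypothetical, the channel values are assumed to arrive as `double`s, and the surrounding loops and the `integerAt:put:` store at `(y*answer width)+(x//3+1)` are not reproduced. `clamp_to_u8` plays the role of the "truncate a float down to an unsigned 8 bit value" primitive Bryce suggests.

```c
#include <stdint.h>

/* Hypothetical helper: truncate toward zero (like Smalltalk #truncated)
 * and clamp to 0..255. Clamping before the cast gives the same result as
 * truncate-then-clamp, and !(v > 0.0) also maps NaN to 0, avoiding an
 * undefined float-to-integer conversion. */
static uint8_t clamp_to_u8(double v)
{
    if (!(v > 0.0))
        return 0;
    if (v >= 255.0)
        return 255;
    return (uint8_t)v; /* v is in (0, 255): the cast truncates toward zero */
}

/* Hypothetical sketch of the quoted Smalltalk statements. All intermediates
 * are locals, so no per-operation objects are allocated and a C compiler
 * can keep them in registers. Returns the packed 32-bit word that the
 * Smalltalk code stores with integerAt:put:. */
static uint32_t correct_and_pack(double balR, double balG, double balB,
                                 double correctionFactor)
{
    double luminance = 0.299 * balR + 0.587 * balG + 0.114 * balB;

    balR += (luminance - balR) * correctionFactor;
    balG += (luminance - balG) * correctionFactor;
    balB += (luminance - balB) * correctionFactor;

    uint32_t r = clamp_to_u8(balR);
    uint32_t g = clamp_to_u8(balG);
    uint32_t b = clamp_to_u8(balB);
    uint32_t a = (r + g + b > 0) ? 0xFFu : 0u;

    /* The byte fields do not overlap, so OR equals the Smalltalk sum.
     * A uint32_t holds 16rFF000000 directly; in Squeak that value is
     * outside the SmallInteger range and becomes a LargePositiveInteger. */
    return b | (g << 8) | (r << 16) | (a << 24);
}
```

With `correctionFactor` at 0.0, `correct_and_pack(128.0, 128.0, 128.0, 0.0)` returns `0xFF808080`, and `correct_and_pack(300.0, -20.0, 0.0, 0.0)` returns `0xFFFF0000` because red is clamped to 255 and green to 0. The single 32-bit return value is the "machine word" case Bryce describes: the packed colour never needs a LargeInteger.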
