On which platform are you running your benchmarks? With which compiler did you compile Neko?

I'm testing on OS X, everything is compiled with GCC 4. I'm comparing with Lua because you've been pretty dismissive of its performance on many occasions. I also ran some of your neko programs in the bench directory and most of the time neko is 3 times slower than Lua (except for binary-trees where neko is almost as fast as java).

I don't remember being dismissive of Lua's performance, although it's true that on nekovm.org/faq it's listed together with PHP/Python in the "pretty slow runtime" category. That might be a bit unfair, and Lua might deserve its own category ;)

Intrigued by this 3x difference, I ran some tests with Neko/Win32 from CVS and the Lua/Win32 binary (5.0.2). Both were built with MSVC, so we are also comparing with the same C compiler:

- fibonacci (recursion with integer arithmetic) ran at pretty much the same speed on both Neko and Lua.

- nbodies (floating-point arithmetic) was indeed 3x faster on Lua. I might look at further optimizing for such usage, although I think it's pretty rare to do heavy floating-point computation in a VM (usually one would move such tasks to the C side).

- fannkuch is IMHO impossible to benchmark, with a running time under 10 ms.

- binary-trees was 3.5 times faster in Neko than in Lua. This benchmark measures integer arithmetic, function call overhead, and allocation of small objects. It's IMHO the most "generic" benchmark among these 4.

- as for the "sum-file" benchmark, I didn't try to run it, but I think it's mainly measuring the C implementation of the readline() primitive. If you use some C code similar to the one Lua is using, I think you should get pretty much the same results.
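
For a run as short as the fannkuch one, one possible workaround is to repeat the workload until the total time is well above the timer resolution, then divide. Here is a minimal C sketch of that idea. The names fib and time_per_call are hypothetical helpers made up for illustration (they are not part of Neko or Lua), and clock() measures CPU time, which may or may not be what you want:

```c
#include <time.h>

/* Hypothetical stand-in for a short benchmark body: recursive Fibonacci. */
static int fib(int n) {
    return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

/* Call fn(arg) repeatedly until at least min_seconds of CPU time have
   elapsed, then return the average CPU time per call in seconds.
   The number of calls made is stored in *calls_out. */
static double time_per_call(int (*fn)(int), int arg,
                            double min_seconds, long *calls_out) {
    volatile int sink = 0;   /* keeps the compiler from dropping the calls */
    long calls = 0;
    double elapsed = 0.0;
    clock_t start = clock();

    do {
        sink = fn(arg);
        calls++;
        elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
    } while (elapsed < min_seconds);

    (void)sink;
    *calls_out = calls;
    return elapsed / (double)calls;
}
```

Calling time_per_call(fib, 15, 0.5, &calls) would then report an average over however many calls fit in half a second, which is more stable than timing a single run of a few milliseconds.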

Now, on OS X you might get additional performance since I haven't optimized the registers for GCC there. In neko/vm/interp.c you have the following declaration:

#if defined(__GNUC__) && defined(__i386__)
#       define ACC_BACKUP       int_val __acc = acc;
#       define ACC_RESTORE      acc = __acc;
#       define ACC_REG asm("%eax")
#       define PC_REG asm("%esi")
#       define SP_REG asm("%edi")
#else
... // no register optimizations


You might want to add a section defining PPC registers. For example:

...
#elif defined(__GNUC__) && defined(__ppc__)
#       define ACC_BACKUP
#       define ACC_RESTORE
#       define ACC_REG asm("28")
#       define PC_REG asm("26")
#       define SP_REG asm("27")
#else
...

I'm not sure, however, that this will work correctly since I don't have the hardware to test on. The constraints are:

- PC_REG and SP_REG should be registers that are preserved between calls
- ACC_REG can be either preserved or modified across calls. In the second case, however, you need to define ACC_BACKUP and ACC_RESTORE as is done on x86 (because %eax is not preserved).
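
To make the two rules above concrete, here is a small C sketch of how such macros can be used around a call. This is a toy loop written for illustration, not the actual code from interp.c: the opcodes, trace_hook and run are hypothetical, acc_saved replaces the __acc name, the empty fallback branch is my assumption for platforms without a register mapping, and __asm__ is used instead of asm only so that it also compiles in strict ISO mode.

```c
/* x86 register names are the ones from the declaration above;
   every other platform lets the compiler choose (assumed fallback). */
#if defined(__GNUC__) && defined(__i386__)
#  define ACC_BACKUP   int acc_saved = acc;
#  define ACC_RESTORE  acc = acc_saved;
#  define ACC_REG __asm__("%eax")
#  define PC_REG  __asm__("%esi")
#  define SP_REG  __asm__("%edi")
#else
#  define ACC_BACKUP
#  define ACC_RESTORE
#  define ACC_REG
#  define PC_REG
#  define SP_REG
#endif

/* Hypothetical toy opcodes, not Neko's instruction set. */
enum { OP_LOAD, OP_PUSH, OP_ADD, OP_TRACE, OP_HALT };

static int trace_count = 0;

/* Any external call: it may clobber registers that are not preserved. */
static void trace_hook(void) { trace_count++; }

static int run(const int *code) {
    int stack[16];
    register int acc ACC_REG = 0;          /* accumulator */
    register const int *pc PC_REG = code;  /* must survive calls */
    register int *sp SP_REG = stack;       /* must survive calls */

    for (;;) {
        switch (*pc++) {
        case OP_LOAD: acc = *pc++;  break; /* acc = immediate operand */
        case OP_PUSH: *sp++ = acc;  break; /* push acc on the stack   */
        case OP_ADD:  acc += *--sp; break; /* acc += popped value     */
        case OP_TRACE: {
            ACC_BACKUP                     /* save acc if not preserved */
            trace_hook();
            ACC_RESTORE                    /* put it back after the call */
            break;
        }
        case OP_HALT: return acc;
        default:      return -1;           /* unknown opcode */
        }
    }
}
```

With a program such as { OP_LOAD, 2, OP_PUSH, OP_LOAD, 3, OP_TRACE, OP_ADD, OP_HALT }, the accumulator has to still hold 3 after the call to trace_hook for the final addition to be right, which is exactly what the backup/restore pair is for when the accumulator register is not preserved.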

Nicolas


--
Neko : One VM to run them all
(http://nekovm.org)
