On which platform are you running your benchmarks? With which
compiler did you compile Neko?
I'm testing on OS X, everything is compiled with GCC 4. I'm comparing
with Lua because you've been pretty dismissive of its performance on
many occasions. I also ran some of your neko programs in the bench
directory and most of the time neko is 3 times slower than Lua (except
for binary-trees where neko is almost as fast as java).
I don't remember being dismissive of Lua's performance, although it's
true that on nekovm.org/faq it's listed together with PHP/Python in the
"pretty slow runtime" category. That might be a bit unfair and Lua might
have its own category ;)
Intrigued by this 3 times slower difference, I ran some tests on
Neko/Win32 CVS and the Lua/Win32 binary (5.0.2). Both were built with MSVC,
so the comparison also uses the same C compiler:
- fibonacci (recursion with integer arithmetic) ran pretty much at the
same speed on both Neko and Lua.
- nbodies (floating point arithmetic) was indeed 3x faster on Lua. I might
have a look at further optimizing for such usage, although I think it's
pretty rare to do heavy floating point computation in a VM (usually one
would move such tasks to the C side).
- fannkuch is IMHO impossible to benchmark, with < 10ms running time
- binary-trees was 3.5 times faster in Neko than in Lua. This
benchmark measures integer arithmetic, function call overhead, and allocation
of small objects. It's IMHO the most "generic" benchmark among these 4.
- as for the "sum-file" benchmark, I didn't try to run it, but I think
it's mainly measuring the C implementation of the readline() primitive.
If you use some C code similar to the one Lua is using, I think you
should get pretty much the same results.
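To illustrate why sum-file mostly measures the C side: a minimal sketch of a line-summing loop built on fgets(), assuming one short integer per line. The helper name sum_lines is hypothetical and is not taken from the Neko or Lua sources; it only shows that nearly all the work happens in the C library.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: sum one integer per line from an open stream.
   Assumes every line fits in the buffer (fewer than 127 characters). */
long sum_lines(FILE *in) {
    char buf[128];
    long total = 0;
    assert(in != NULL);
    /* fgets() does the buffered I/O; the per-line work is one strtol(). */
    while (fgets(buf, sizeof buf, in) != NULL)
        total += strtol(buf, NULL, 10);
    return total;
}
```

Whatever the VM does around such a primitive is small compared to the time spent in the read loop itself.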
Now, on OSX you might get additional performance since I haven't
optimized the registers for GCC. In neko/vm/interp.c you have the
following declarations:
#if defined(__GNUC__) && defined(__i386__)
# define ACC_BACKUP int_val __acc = acc;
# define ACC_RESTORE acc = __acc;
# define ACC_REG asm("%eax")
# define PC_REG asm("%esi")
# define SP_REG asm("%edi")
#else
... // no register optimizations
You might want to add a part for defining PPC registers. For example :
...
#elif defined(__GNUC__) && defined(__ppc__)
# define ACC_BACKUP
# define ACC_RESTORE
# define ACC_REG asm("28")
# define PC_REG asm("26")
# define SP_REG asm("27")
#else
...
I'm not sure however if that will work correctly since I don't have the
hardware to test on :
- PC_REG and SP_REG should be registers that are preserved between calls
- ACC_REG can be either preserved or modified. In the second case
however you need to define ACC_BACKUP and ACC_RESTORE as is done
on x86 (because %eax is not preserved).
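A rough sketch of how the backup/restore pair is meant to work around a call, assuming GCC's explicit register variable syntax. The names helper, run and saved_acc are hypothetical (interp.c uses __acc), int_val is a stand-in typedef, and on any target other than GCC/i386 the macros expand to nothing, so there it only shows the shape of the code:

```c
#include <assert.h>

typedef long int_val;  /* stand-in for Neko's int_val (assumption) */

#if defined(__GNUC__) && defined(__i386__)
#  define ACC_REG     asm("%eax")             /* caller-saved register */
#  define ACC_BACKUP  int_val saved_acc = acc;
#  define ACC_RESTORE acc = saved_acc;
#else
#  define ACC_REG                              /* no register pinning */
#  define ACC_BACKUP
#  define ACC_RESTORE
#endif

/* Hypothetical callee: on x86 its return value comes back in %eax. */
int_val helper(int_val x) { return x * 2; }

int_val run(int_val start) {
    register int_val acc ACC_REG = start;
    int_val r;
    ACC_BACKUP            /* save acc before the call can clobber it */
    r = helper(3);
    ACC_RESTORE           /* put the saved value back into acc */
    return acc + r;
}
```

With a register that is preserved across calls, both macros can stay empty, which is what the PPC part above assumes for register 28.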
Nicolas
--
Neko : One VM to run them all
(http://nekovm.org)