>>>>> "Bernd" == Bernd Paysan <bernd.pay...@gmx.de> writes:
> On Saturday, 22 March 2014, 07:24:55, David Kuehling wrote:
>> I'm using a recent gforth revision from git (6ec9915f6277de) and
>> noticed that running gforth --dynamic produces pretty extreme
>> performance degradation [..]

> How does this affect other microbenchmarks, e.g. onebench.fs?  And:
> SEE-CODE <word> shows the dynamically generated code; could you
> provide that for the microbenchmark above?

Ahh, SEE-CODE does a nice job.  The disassembly for the full
code sequences of my recursive micro-benchmark for gforth-fast with and
without --dynamic is listed below.

Looks like there is a problem with the CALL code sequence generated for
calls into colon definitions:

gforth-fast --dynamic
: test1 ;
: test2 test1 ;
see-code test2
$2BB725B0 call
$2BB725B4 <test1>
( $2BFC9FA8 ) 3 16 0 addu,
( $2BFC9FAC ) 16 0 16 lw,
( $2BFC9FB0 ) 2 18 0 addu,
( $2BFC9FB4 ) 3 3 4 addiu,
( $2BFC9FB8 ) 18 18 -4 addiu,
( $2BFC9FBC ) 16 16 4 addiu,
( $2BFC9FC0 ) 3 -4 2 sw,
( $2BFC9FC4 ) 2 -4 16 lw,
( $2BFC9FC8 ) $7C03E83B , ( illegal inst )
( $2BFC9FCC ) 4 -32680 28 lw,
( $2BFC9FD0 ) 30 3 0 addu,
( $2BFC9FD4 ) 4 4 30 addu,
( $2BFC9FD8 ) 3 2 0 addu,
( $2BFC9FDC ) 4 256 29 sw,
( $2BFC9FE0 ) 3 jr,
( $2BFC9FE4 ) 1 1 0 or,
$2BB725B8 ;s  ok

Compare this against the disassembly of CALL:

see call:
Code call
( $403C34 ) 3 16 0 addu,
( $403C38 ) 16 0 16 lw,
( $403C3C ) 2 18 0 addu,
( $403C40 ) 3 3 4 addiu,
( $403C44 ) 18 18 -4 addiu,
( $403C48 ) 16 16 4 addiu,
( $403C4C ) 3 -4 2 sw,
( $403C50 ) 2 -4 16 lw,
( $403C54 ) 3 2 0 addu,
( $403C58 ) 3 jr,
( $403C5C ) 1 1 0 or,
end-code

Instead of NEXT, the code in test2 holds some nonsense, starting with
the invalid instruction $7C03E83B.  Don't know why that instruction
doesn't SIGILL, but maybe it's a non-standard/undocumented instruction
on Loongson2f.  The binutils also don't know anything about that opcode:

echo -e "\x3b\xe8\x03\x7c" > /tmp/inst
objdump -D -EL -b binary -m mips:loongson_2f /tmp/inst
[..]
   0:   7c03e83b        0x7c03e83b

I double-checked that the objdump command above disassembles valid
opcodes properly.  It does, including FPU opcodes.

This starts making sense.  When benchmarking, the performance
degradation was worst for code that contains a lot of non-primitives.
That's why the RECURSE example is so telling: it's dominated by the
recursive non-primitive call.  Onebench.fs confirms that theory:

gforth-fast gforth/onebench.fs
 sieve bubble matrix    fib    fft
 1.388  1.828  1.640  2.124  1.836

gforth-fast --dynamic gforth/onebench.fs
 sieve bubble matrix    fib    fft
 1.880  2.228  2.660 10.776  5.792

The recursive 'fib' benchmark suffers worst (these results were
obtained under load, so they may not be very representative for
Loongson2f).

cheers,

David

PS: For reference, output of SEE-CODE for the recursion example from my
last mail:

--8<--
gforth-fast
: b 1- DUP 0> IF RECURSE THEN ;
see-code b
$2BD76520 1-
$2BD76524 dup
$2BD76528 0>
$2BD7652C ?branch
$2BD76530 <735536444>
$2BD76534 call
$2BD76538 <b>
$2BD7653C ;s  ok
--8<--
gforth-fast --dynamic
: b 1- DUP 0> IF RECURSE THEN ;
see-code b
$2B176520 1-
( $2B5CDE84 ) 16 16 4 addiu,
( $2B5CDE88 ) 21 21 -1 addiu,
$2B176524 noop
( $2B5CDE8C ) 21 0 17 sw,
( $2B5CDE90 ) 17 17 -4 addiu,
( $2B5CDE94 ) 21 4 17 lw,
( $2B5CDE98 ) 16 16 4 addiu,
$2B176528 0>
( $2B5CDE9C ) 21 0 21 slt,
( $2B5CDEA0 ) 16 16 4 addiu,
( $2B5CDEA4 ) 21 0 21 subu,
$2B17652C ?branch
$2B176530 <722953532>
( $2B5CDEA8 ) 3 17 0 addu,
( $2B5CDEAC ) 2 0 16 lw,
( $2B5CDEB0 ) 21 0 28 bne,
( $2B5CDEB4 ) 17 17 4 addiu,
( $2B5CDEB8 ) 16 2 4 addiu,
( $2B5CDEBC ) 2 -4 16 lw,
( $2B5CDEC0 ) 21 4 3 lw,
( $2B5CDEC4 ) 3 2 0 addu,
( $2B5CDEC8 ) 3 jr,
( $2B5CDECC ) 1 1 0 or,
( $2B5CDED0 ) 21 4 3 lw,
( $2B5CDED4 ) 16 16 8 addiu,
$2B176534 call
$2B176538 <b>
( $2B5CDED8 ) 3 16 0 addu,
( $2B5CDEDC ) 16 0 16 lw,
( $2B5CDEE0 ) 2 18 0 addu,
( $2B5CDEE4 ) 3 3 4 addiu,
( $2B5CDEE8 ) 18 18 -4 addiu,
( $2B5CDEEC ) 16 16 4 addiu,
( $2B5CDEF0 ) 3 -4 2 sw,
( $2B5CDEF4 ) 2 -4 16 lw,
( $2B5CDEF8 ) $7C03E83B , ( illegal inst )
( $2B5CDEFC ) 4 -32680 28 lw,
( $2B5CDF00 ) 30 3 0 addu,
( $2B5CDF04 ) 4 4 30 addu,
( $2B5CDF08 ) 3 2 0 addu,
( $2B5CDF0C ) 4 256 29 sw,
( $2B5CDF10 ) 3 jr,
( $2B5CDF14 ) 1 1 0 or,
$2B17653C ;s  ok
--8<--

-- 
GnuPG public key: http://dvdkhlng.users.sourceforge.net/dk2.gpg
Fingerprint: B63B 6AF2 4EEB F033 46F7  7F1D 935E 6F08 E457 205F
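
A note on the $7C03E83B word that neither SEE-CODE nor objdump could name: it can be split by hand into the standard MIPS R-type bit fields. The sketch below is only an illustration; the helper name decode_mips_word is made up here, and the field layout assumed is the generic 6/5/5/5/5/6 split, not anything Loongson-specific. Read against the MIPS32 Release 2 opcode tables, opcode 0x1F (SPECIAL3) with funct 0x3B would be RDHWR, i.e. something like `rdhwr $3, $29`. That reading is an assumption, not something confirmed in this thread; if it holds, it would be worth checking whether the kernel traps and emulates the instruction on Loongson2f, which would fit both the missing SIGILL and the slowdown.

```python
def decode_mips_word(word):
    """Split a 32-bit MIPS instruction word into R-type bit fields.

    Hypothetical helper for illustration; assumes the generic
    opcode(6) rs(5) rt(5) rd(5) sa(5) funct(6) layout.
    """
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31..26
        "rs":     (word >> 21) & 0x1F,  # bits 25..21
        "rt":     (word >> 16) & 0x1F,  # bits 20..16
        "rd":     (word >> 11) & 0x1F,  # bits 15..11
        "sa":     (word >> 6) & 0x1F,   # bits 10..6
        "funct":  word & 0x3F,          # bits 5..0
    }


# The word flagged as "illegal inst" in the SEE-CODE output.
fields = decode_mips_word(0x7C03E83B)
for name, value in fields.items():
    print(f"{name:6} = {value:2d} (0x{value:02X})")
```

The fields come out as opcode 0x1F, rs 0, rt 3, rd 29, sa 0, funct 0x3B.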