Lauri Kasanen wrote:
On Sun, Mar 30, 2014, at 18:26, Ralf Friedl wrote:
What's even worse is that adding any output to push(), even a puts("hi")
that does not print the argument or any of the stack vars, fixes it. So
something magic is going on inside the GCC optimization, I'm afraid this
is above my pay grade.
Could you send the file miscutils/dc.o that is created with and without
this puts("hi") in push()?
Attached.
Are you using some special compiler options, especially regarding
parameter passing in registers and stack alignment?
The relevant part of fail-dc.o is this:
00000000 <push>:
0: dd 07 fldl (%edi)
The function expects the value to push at the address pointed to by
%edi. But the functions that call push pass the value at the top of the
CPU stack (not to be confused with the stack dc implements).
The relevant part of success-dc.o is this:
00000000 <push>:
0: 57 push %edi
1: 8d 7c 24 08 lea 0x8(%esp),%edi
5: 83 e4 f0 and $0xfffffff0,%esp
8: ff 77 fc pushl -0x4(%edi)
b: 55 push %ebp
c: 89 e5 mov %esp,%ebp
e: 57 push %edi
f: 83 ec 14 sub $0x14,%esp
12: dd 07 fldl (%edi)
These lines set up an aligned stack.
0: save %edi
1: put address of top of stack at the time the function was called in
%edi. This is the address of the parameter.
5: align stack to 0x10 boundary
8: push return address of the function
b, c: normal frame setup
e: save %edi for later use
f: make space for a double and align to 0x10
12: load parameter, %edi still points to the address of the parameter.
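To make the arithmetic of this prologue concrete, here is a minimal C sketch that replays the stack pointer updates on plain integers. The struct name, the function name and the sample entry value of %esp are made up for illustration. It models only the address arithmetic of a 32-bit stack, not the memory contents.

```c
#include <stdint.h>

/* Hypothetical record of the register values we care about. */
struct frame {
    uint32_t edi;          /* address of the first parameter */
    uint32_t esp_aligned;  /* %esp right after the "and" at offset 5 */
    uint32_t esp_final;    /* %esp after the "sub" at offset f */
};

/* Replays the prologue of push() from success-dc.o.
 * esp_at_entry is %esp on function entry, i.e. it points at the
 * return address; the first parameter sits 4 bytes above it. */
static struct frame replay_prologue(uint32_t esp_at_entry)
{
    struct frame f;
    uint32_t esp = esp_at_entry;

    esp -= 4;               /* 0: push %edi */
    f.edi = esp + 8;        /* 1: lea 0x8(%esp),%edi  => entry %esp + 4 */
    esp &= 0xfffffff0u;     /* 5: and $0xfffffff0,%esp */
    f.esp_aligned = esp;
    esp -= 4;               /* 8: pushl -0x4(%edi), copy of return address */
    esp -= 4;               /* b: push %ebp */
                            /* c: mov %esp,%ebp does not move %esp */
    esp -= 4;               /* e: push %edi */
    esp -= 0x14;            /* f: sub $0x14,%esp */
    f.esp_final = esp;
    return f;
}
```

With an assumed entry value of 0xbffff01c, %edi ends up as 0xbffff020 (entry + 4, the parameter), and the three pushes plus the sub of 0x14 add up to 0x20, so %esp is on a 0x10 boundary again at the fldl.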
The instruction at 12 loads the double from the address in %edi after %edi has
been set to point to the parameter area. The instruction at 0 in the
failing case is exactly the same, except that %edi has not been set up
beforehand. So I would consider this a compiler bug.
I wrote that the instruction at f makes space for an aligned double.
This in itself is strange, because the double that is loaded from %edi
is first saved on the CPU stack, then loaded from the CPU stack again
and saved in the dc stack, which is unnecessary. Also, the double is
always loaded onto the FPU stack and then removed if bb_error_msg_and_die
is called, instead of being loaded only after it is clear that it will be
used. So there is also room for further optimization in the compiler.
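For reference, a rough C sketch of the shape such a push() has at the source level. This is an assumption, not a copy of miscutils/dc.c: the names stack, pointer, STACK_SIZE and push_checked are illustrative, and the overflow case returns -1 here where the real code calls bb_error_msg_and_die. The point is only that the bounds check comes before any use of the argument, so nothing in the source forces the early load.

```c
#define STACK_SIZE 4   /* hypothetical size, kept small for illustration */

static double stack[STACK_SIZE];
static unsigned pointer;   /* index of the next free slot */

/* Returns 0 on success, -1 on overflow.
 * Assumption: the real function does not return on overflow but
 * reports the error and exits. */
static int push_checked(double a)
{
    if (pointer >= STACK_SIZE)
        return -1;          /* 'a' is never needed on this path */
    stack[pointer++] = a;   /* only here does 'a' have to be loaded */
    return 0;
}
```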
This stack alignment makes your code bigger, and the additional
instructions also have to be executed, which also takes time. I'm not
sure whether the aligned stack saves enough time to offset this.