Lauri Kasanen wrote:
> On Sun, Mar 30, 2014, at 18:26, Ralf Friedl wrote:
>>> What's even worse is that adding any output to push(), even a puts("hi")
>>> that does not print the argument or any of the stack vars, fixes it. So
>>> something magic is going on inside the GCC optimization, I'm afraid this
>>> is above my pay grade.
>> Could you send the file miscutils/dc.o that is created with and without
>> this puts("hi") in push()?
> Attached.
Are you using some special compiler options, especially regarding parameter passing in registers and stack alignment?

The relevant part of fail-dc.o is this:
00000000 <push>:
   0:   dd 07                   fldl   (%edi)
The function expects the value to push at the address pointed to by %edi. But the functions that call push pass the value at the top of the CPU stack (not to be confused with the stack dc implements).

The relevant part of success-dc.o is this:
00000000 <push>:
   0:   57                      push   %edi
   1:   8d 7c 24 08             lea    0x8(%esp),%edi
   5:   83 e4 f0                and    $0xfffffff0,%esp
   8:   ff 77 fc                pushl  -0x4(%edi)
   b:   55                      push   %ebp
   c:   89 e5                   mov    %esp,%ebp
   e:   57                      push   %edi
   f:   83 ec 14                sub    $0x14,%esp
  12:   dd 07                   fldl   (%edi)
These lines set up an aligned stack.
0: save %edi
1: put address of top of stack at the time the function was called in %edi. This is the address of the parameter.
5: align stack to 0x10 boundary
8: push return address of the function
b, c: normal frame setup
e: save %edi for later use
f: make space for a double and align to 0x10
12: load parameter, %edi still points to the address of the parameter.
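
The address arithmetic in these annotations can be checked with a small model. This is only a sketch under assumptions: the struct, the function names and the example stack address are made up for illustration, and it only tracks the register values, not memory. It assumes the usual 32-bit cdecl layout, where the return address is at (%esp) on entry and the double parameter starts at 4(%esp).

```c
#include <stdint.h>

/* Hypothetical model of the three registers the prologue touches. */
struct regs {
    uint32_t esp;
    uint32_t edi;
    uint32_t ebp;
};

/* Address of the first parameter, given %esp at function entry
 * (assumes the return address occupies the 4 bytes at entry_esp). */
static uint32_t param_addr(uint32_t entry_esp)
{
    return entry_esp + 4;
}

/* Replays the prologue of push() from success-dc.o, one step per
 * instruction. The caller's %edi and %ebp values are irrelevant to
 * the result, so they start at 0 here. */
static struct regs run_prologue(uint32_t entry_esp)
{
    struct regs r = { entry_esp, 0, 0 };

    r.esp -= 4;               /*  0: push %edi                */
    r.edi = r.esp + 8;        /*  1: lea 0x8(%esp),%edi       */
    r.esp &= 0xfffffff0u;     /*  5: and $0xfffffff0,%esp     */
    r.esp -= 4;               /*  8: pushl -0x4(%edi)         */
    r.esp -= 4;               /*  b: push %ebp                */
    r.ebp = r.esp;            /*  c: mov %esp,%ebp            */
    r.esp -= 4;               /*  e: push %edi                */
    r.esp -= 0x14;            /*  f: sub $0x14,%esp           */
    return r;
}
```

For any entry value of %esp, %edi ends up equal to param_addr(entry_esp), and the three pushes after the "and" plus the 0x14 add up to 0x20, so %esp is on a 0x10 boundary again when the fldl at 12 executes.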

The instruction at 12 loads the double from address %edi after %edi has been set to point to the parameter area. The instruction at 0 in the failed case is exactly the same, except that %edi has not been set up before. So I would consider this a compiler bug.

I wrote that the instruction at f makes space for an aligned double. This in itself is strange, because the double that is loaded from %edi is first saved on the CPU stack, then loaded from the CPU stack again and saved in the dc stack, which is unnecessary. Also, the double is always loaded to the FPU stack and then removed if bb_error_msg_and_die is called, instead of being loaded only after it is clear that it will be used. So there is also room for further optimization in the compiler.
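
To make the point about the early load concrete, here is a rough sketch of the shape I assume push() has. It is not copied from dc.c: STACK_SIZE, stack, pointer and die() are stand-ins chosen for illustration, with die() playing the role of bb_error_msg_and_die.

```c
#include <stdio.h>
#include <stdlib.h>

#define STACK_SIZE 4    /* hypothetical value, kept small for the example */

static double stack[STACK_SIZE];   /* the stack dc implements */
static unsigned pointer;           /* index of the next free slot */

/* Stand-in for bb_error_msg_and_die(): print a message and exit. */
static void die(const char *msg)
{
    fprintf(stderr, "dc: %s\n", msg);
    exit(EXIT_FAILURE);
}

static void push(double a)
{
    if (pointer >= STACK_SIZE)
        die("stack overflow");   /* 'a' is never used on this path */
    stack[pointer++] = a;        /* the only use of 'a' */
}
```

With a function of this shape, the parameter is only needed for the final store, so the fldl could be placed after the bounds check, and the value could go from the parameter area to the dc stack without a detour through a local slot on the CPU stack.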

This stack alignment makes your code bigger, and the additional instructions also have to be executed, which also takes time. I'm not sure whether the aligned stack saves enough time to offset this.
_______________________________________________
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox
