#4970: time002 and time004 (ghci) test failures on OS X 64 bit
-------------------------------+--------------------------------------------
Reporter: gwright | Owner: gwright
Type: bug | Status: new
Priority: normal | Component: GHCi
Version: 7.0.1 | Keywords:
Testcase: | Blockedby:
Os: MacOS X | Blocking:
Architecture: x86_64 (amd64) | Failure: Incorrect result at runtime
-------------------------------+--------------------------------------------
Comment(by gwright):
Surprise! Not a relocation failure at all! I don't understand why it only
occurs for the `ghci` tests, but I am able to fix the bug.
Here's what's going on:
I used the linker debug output to get the memory address of the call to
`__hscore_gettimeofday`. Then I poked around with `gdb`'s `x/i` command
to get the memory addresses of the instructions just before and after the
call. I set breakpoints at these instructions. (This may seem roundabout,
but more on that later.)
I ran the `test002` program a number of times, looking for where the time
of day was returned from `__hscore_gettimeofday`. I guessed that the
return values might be somewhere pointed to by the STG machine's `R1`
(`$rbx` on `amd64`), and it is:
{{{
*Main> main
Breakpoint 3, 0x00000001061208d1 in ?? ()
1: x/i $rip 0x1061208d1: mov $0x0,%eax
(gdb) p8 $rbx
0x105d28178: 0x302e302e322e302d
0x105d28170: 0x6d6972702d636867
0x105d28168: 0x10
0x105d28160: 0x101f34340 <stg_ARR_WORDS_info>
0x105d28158: 0x8
0x105d28150: 0x0
0x105d28148: 0x10
0x105d28140: 0x101f34340 <stg_ARR_WORDS_info>
(gdb) c
Continuing.
Breakpoint 1, 0x00000001061208db in ?? ()
1: x/i $rip 0x1061208db: mov %r14,-0x8(%rbp)
(gdb) p8 $rbx
0x105d28178: 0x302e302e322e302d
0x105d28170: 0x6d6972702d636867
0x105d28168: 0x10
0x105d28160: 0x101f34340 <stg_ARR_WORDS_info>
0x105d28158: 0xe300b
0x105d28150: 0x4d7118d0
0x105d28148: 0x10
0x105d28140: 0x101f34340 <stg_ARR_WORDS_info>
(gdb)
}}}
The lines at `$rbx + 16` (0x105d28150) and `$rbx + 24` (0x105d28158) hold
the `sec` and `usec` values, respectively. These are the seconds and
microseconds since the Unix epoch on 1 Jan 1970. The current seconds
count isn't obvious; I had to run `date +%s` in a shell, converted the
value to hex then looked at the memory pointed to by various registers to
find something starting with `0x4d7...`.
I tried to step through `__hscore_gettimeofday` to see the time values
being written, but it doesn't work. The `_gettimeofday` system call loops
endlessly calling `_nanotime`. I suspect that there is a consistency
check to make sure the time returned is current, even if the process is
interrupted or put to sleep. Single stepping interrupts each instruction,
causing `_gettimeofday` to chase its tail.
However, on OS X we have a lucky break. `gdb` on this platform has
hardware watchpoints, so I can set a watchpoint on the memory location
that the `usec` value is written to, and it won't cause significant
overhead. Here's a transcript with the watchpoint set:
{{{
*Main> main
Breakpoint 3, 0x00000001061208d1 in ?? ()
1: x/i $rip 0x1061208d1: mov $0x0,%eax
(gdb) p8 $rbx
0x105d283f8: 0xc
0x105d283f0: 0x101f34340 <stg_ARR_WORDS_info>
0x105d283e8: 0x8
0x105d283e0: 0x101f34340 <stg_ARR_WORDS_info>
0x105d283d8: 0x322e302e33
0x105d283d0: 0x2e302d7961727261
0x105d283c8: 0x10
0x105d283c0: 0x101f34340 <stg_ARR_WORDS_info>
(gdb) watch * 0x105d283d8
Hardware watchpoint 13: *4392649688
(gdb) c
Continuing.
Hardware watchpoint 13: *4392649688
Old value = 774909491
New value = 735977
0x00007fffffe00337 in __gettimeofday ()
1: x/i $rip 0x7fffffe00337 <__gettimeofday+87>: xor %eax,%eax
(gdb) c
Continuing.
Breakpoint 1, 0x00000001061208db in ?? ()
1: x/i $rip 0x1061208db: mov %r14,-0x8(%rbp)
(gdb) p8 $rbx
0x105d283f8: 0xc
0x105d283f0: 0x101f34340 <stg_ARR_WORDS_info>
0x105d283e8: 0x8
0x105d283e0: 0x101f34340 <stg_ARR_WORDS_info>
0x105d283d8: 0x32000b3ae9
0x105d283d0: 0x4d700913
0x105d283c8: 0x10
0x105d283c0: 0x101f34340 <stg_ARR_WORDS_info>
(gdb)
}}}
Here we see that a sensible value for microseconds is stored to
`0x105d283d8`, 735977 (`0xb3ae9`), or about three quarters of a second
(look at the "`New value =`" line). But after `__hscore_gettimeofday`
returns, memory location `0x105d283d8` contains `0x32000b3ae9`. The low
order 32 bits are correct, but the high order 32 bits contain junk.
Continuing from this point gives the "`picoseconds out of range`" error.
The cause is simple, and not at all related to the linker. In the
`getClockTime` function in `libraries/old-time/System/Time.hsc`,
{{{
#elif HAVE_GETTIMEOFDAY
getClockTime = do
allocaBytes (#const sizeof(struct timeval)) $ \ p_timeval -> do
throwErrnoIfMinus1_ "getClockTime" $ gettimeofday p_timeval nullPtr
sec <- (#peek struct timeval,tv_sec) p_timeval :: IO CTime
usec <- (#peek struct timeval,tv_usec) p_timeval :: IO CTime
return (TOD (realToInteger sec) ((realToInteger usec) * 1000000))
}}}
the `usec` value is forced to be a `CTime`, which has 64 bits. But the
`tv_usec` field in `struct timeval` is an `__int32_t` (see
`/usr/include/sys/_types.h`). The watchpoint showed that only 32 bits
were transferred. The fix is to change (on darwin) the type of `usec`
from `CTime` to `CInt`. The attached patch does this.
After making this change, I've had no more failures of `test002` or
`test004` in `ghci`.
--
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/4970#comment:6>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
_______________________________________________
Glasgow-haskell-bugs mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs