#4970: time002 and time004 (ghci) test failures on OS X 64 bit
-------------------------------+--------------------------------------------
    Reporter:  gwright         |       Owner:  gwright                    
        Type:  bug             |      Status:  new                        
    Priority:  normal          |   Component:  GHCi                       
     Version:  7.0.1           |    Keywords:                             
    Testcase:                  |   Blockedby:                             
          Os:  MacOS X         |    Blocking:                             
Architecture:  x86_64 (amd64)  |     Failure:  Incorrect result at runtime
-------------------------------+--------------------------------------------

Comment(by gwright):

 Surprise! Not a relocation failure at all! I don't understand why it only
 occurs for the `ghci` tests, but I am able to fix the bug.

 Here's what's going on:

 I used the linker debug output to get the memory address of the call to
 `__hscore_gettimeofday`.  Then I poked around with `gdb`'s `x/i` command
 to get the memory addresses of the instructions just before and after the
 call. I set breakpoints at these instructions.  (This may seem roundabout,
 but more on that later.)

 I ran the `test002` program a number of times, looking for where the time
 of day was returned from `__hscore_gettimeofday`.  I guessed that the
 return values might be somewhere pointed to by the STG machine's `R1`
 (`$rbx` on `amd64`), and it is:
 {{{
 *Main> main

 Breakpoint 3, 0x00000001061208d1 in ?? ()
 1: x/i $rip  0x1061208d1:       mov    $0x0,%eax
 (gdb) p8 $rbx
 0x105d28178:    0x302e302e322e302d
 0x105d28170:    0x6d6972702d636867
 0x105d28168:    0x10
 0x105d28160:    0x101f34340 <stg_ARR_WORDS_info>
 0x105d28158:    0x8
 0x105d28150:    0x0
 0x105d28148:    0x10
 0x105d28140:    0x101f34340 <stg_ARR_WORDS_info>
 (gdb) c
 Continuing.

 Breakpoint 1, 0x00000001061208db in ?? ()
 1: x/i $rip  0x1061208db:       mov    %r14,-0x8(%rbp)
 (gdb) p8 $rbx
 0x105d28178:    0x302e302e322e302d
 0x105d28170:    0x6d6972702d636867
 0x105d28168:    0x10
 0x105d28160:    0x101f34340 <stg_ARR_WORDS_info>
 0x105d28158:    0xe300b
 0x105d28150:    0x4d7118d0
 0x105d28148:    0x10
 0x105d28140:    0x101f34340 <stg_ARR_WORDS_info>
 (gdb)
 }}}
 The lines at `$rbx + 16` (0x105d28150) and `$rbx + 24` (0x105d28158) hold
 the `sec` and `usec` values, respectively.  These are the seconds and
 microseconds since the Unix epoch on 1 Jan 1970.  The current seconds
 count isn't obvious; I had to run `date +%s` in a shell, converted the
 value to hex then looked at the memory pointed to by various registers to
 find something starting with `0x4d7...`.

 I tried to step through `__hscore_gettimeofday` to see the time values
 being written, but it doesn't work.  The `_gettimeofday` system call loops
 endlessly calling `_nanotime`.  I suspect that there is a consistency
 check to make sure the time returned is current, even if the process is
 interrupted or put to sleep.  Single stepping interrupts each instruction,
 causing `_gettimeofday` to chase its tail.

 However, on OS X we have a lucky break.  `gdb` on this platform has
 hardware watchpoints, so I can set a watchpoint on the memory location
 that the `usec` value is written to, and it won't cause significant
 overhead.  Here's a transcript with the watchpoint set:
 {{{
 *Main> main

 Breakpoint 3, 0x00000001061208d1 in ?? ()
 1: x/i $rip  0x1061208d1:       mov    $0x0,%eax
 (gdb) p8 $rbx
 0x105d283f8:    0xc
 0x105d283f0:    0x101f34340 <stg_ARR_WORDS_info>
 0x105d283e8:    0x8
 0x105d283e0:    0x101f34340 <stg_ARR_WORDS_info>
 0x105d283d8:    0x322e302e33
 0x105d283d0:    0x2e302d7961727261
 0x105d283c8:    0x10
 0x105d283c0:    0x101f34340 <stg_ARR_WORDS_info>
 (gdb) watch * 0x105d283d8
 Hardware watchpoint 13: *4392649688
 (gdb) c
 Continuing.
 Hardware watchpoint 13: *4392649688

 Old value = 774909491
 New value = 735977
 0x00007fffffe00337 in __gettimeofday ()
 1: x/i $rip  0x7fffffe00337 <__gettimeofday+87>:        xor    %eax,%eax
 (gdb) c
 Continuing.

 Breakpoint 1, 0x00000001061208db in ?? ()
 1: x/i $rip  0x1061208db:       mov    %r14,-0x8(%rbp)
 (gdb) p8 $rbx
 0x105d283f8:    0xc
 0x105d283f0:    0x101f34340 <stg_ARR_WORDS_info>
 0x105d283e8:    0x8
 0x105d283e0:    0x101f34340 <stg_ARR_WORDS_info>
 0x105d283d8:    0x32000b3ae9
 0x105d283d0:    0x4d700913
 0x105d283c8:    0x10
 0x105d283c0:    0x101f34340 <stg_ARR_WORDS_info>
 (gdb)
 }}}
 Here we see that a sensible value for microseconds is stored to
 `0x105d283d8`, 735977 (`0xb3ae9`), or about three quarters of a second
 (look at the "`New value =`" line).  But after `__hscore_gettimeofday`
 returns, memory location `0x105d283d8` contains `0x32000b3ae9`.  The low
 order 32 bits are correct, but the high order 32 bits contain junk.
 Continuing from this point gives the "`picoseconds out of range`" error.

 The cause is simple, and not at all related to the linker.  In the
 `getClockTime` function in `libraries/old-time/System/Time.hsc`,
 {{{
 #elif HAVE_GETTIMEOFDAY
 getClockTime = do
   allocaBytes (#const sizeof(struct timeval)) $ \ p_timeval -> do
     throwErrnoIfMinus1_ "getClockTime" $ gettimeofday p_timeval nullPtr
     sec  <- (#peek struct timeval,tv_sec)  p_timeval :: IO CTime
     usec <- (#peek struct timeval,tv_usec) p_timeval :: IO CTime
     return (TOD (realToInteger sec) ((realToInteger usec) * 1000000))
 }}}
 the `usec` value is forced to be a `CTime`, which has 64 bits.  But the
 `tv_usec` field in `struct timeval` is an `__int32_t` (see
 `/usr/include/sys/_types.h`).  The watchpoint showed that only 32 bits
 were transferred.  The fix is to change (on darwin) the type of `usec`
 from `CTime` to `CInt`.  The attached patch does this.

 After making this change, I've had no more failures of `test002` or
 `test004` in `ghci`.

-- 
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/4970#comment:6>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

_______________________________________________
Glasgow-haskell-bugs mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs

Reply via email to