Thanks so much for the pointers, Ben. I opened a ticket here: https://ghc.haskell.org/trac/ghc/ticket/15449

On Fri, Jul 27, 2018 at 6:51 AM, Ben Gamari <[email protected]> wrote:

> Travis Whitaker <[email protected]> writes:
>
> > Hello GHC Devs,
> >
> > It seems to me that GHC is rather broken on aarch64, at least since 8.2.1
> > (and at least on the machines I have access to). I first noticed this issue
> > with Nixpkgs (https://github.com/NixOS/nixpkgs/issues/40301), so to check
> > that this isn't some Nixpkgs idiosyncrasy I went ahead and built my own GHC
> > 8.4.3 for aarch64 (there's no binary release at
> > https://www.haskell.org/ghc/download_ghc_8_4_3.html to try, but perhaps
> > I've missed something.
> >
> > It seems the only Nix idiosyncrasy was passing "--ghc-option=-j${cores}" to
> > "./Setup.hs configure". The issue is triggered by using '-jn' for any n
> > greater than one when building any non-trivial package, but I've found
> > hscolour1.24.4 reproduces it very reliably (perhaps because there are
> > opportunities for parallelism early in its module dependency graph?). GHC
> > very often (although not always) will fail with one of:
> >
> > - Segmentation fault.
> > - Bus fault
> > - <no location info>: error:
> >     ghc: panic! (the 'impossible' happened)
> >       (GHC version 8.4.3 for aarch64-unknown-linux):
> >     Binary.UserData: no put_binding_name
> >
> > - ghc: internal error: MUT_VAR_CLEAN object entered!
> >     (GHC version 8.4.3 for aarch64_unknown_linux)
> >     Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug
> >   Aborted (core dumped)
> >
> Ugh, that is awful.
>
> > The fix, excruciating as it may be on already slow arm machines, is to use
> > '-j1'. This issue seems present on each GHC release since 8.2.1 (although I
> > haven't tried HEAD yet). I haven't noticed any issues with any other
> > concurrent Haskell programs on aarch64.
> >
> > There are some umbrella bugs for aarch64 in Trac, so I wanted to ask here
> > before filing a ticket. Has anyone else noticed this behavior on aarch64?
> > What's more, are there any tips for using GDB to hunt down synchronization
> > issues in GHC?
> >
> Definitely open a new ticket.
>
> The methodology for tracking down issues like this is quite
> case-specific but I do have some general recommendations: On x86-64 I
> use rr [1], which is an invaluable tool. Sadly this isn't an option on
> AArch64 AFAIK. I also have some gdb extensions to take much of the
> monotony away from inspecting GHC's heap and internal data structures
> [2]. I've not used them on AArch64 so there may be a few compatibility
> issues but I suspect they wouldn't be hard to fix.
>
> I know it may be hard in this case but I would at least try to reduce
> the size of the failing program to something that fits in less than a
> few hundred lines. Low-level debugging is hard enough when you can keep
> the program in your head; debugging all of GHC this way is possible but
> much harder. Given that this appears to be threading-specific, I would
> also pay particular attention to the GHC and base's use of barriers and
> atomics. It's possible that we are just missing a barrier somewhere.
>
> Finally, you might quickly try building 8.0 to see whether bisection is
> a possibility. It would be a slow process, given the speed of the
> hardware involved, but ultimately it can be much more time efficient
> once you have it setup since you can replace human debugging time (a
> very finite commodity) with computation.
>
> Good luck and let us know if you get stuck,
>
> - Ben
>
>
> [1] http://rr-project.org/
> [2] https://github.com/bgamari/ghc-utils/tree/master/gdb
>
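
Since the failure is intermittent ("very often, although not always"), any reduction or bisection along the lines suggested above needs a way to say how often a given build command fails and in which way. The following is only a rough sketch of such a tally loop, not something taken from this thread: the names `classify` and `run_trials`, the labels, and the choice of Python are all assumptions made for illustration. The stderr substrings are copied from the failure messages quoted above, and the signal check relies on the POSIX convention that a child killed by signal N is reported with return code -N.

```python
import collections
import signal
import subprocess
import sys

# Substrings from the failure messages quoted in the thread, mapped to
# short labels (the labels themselves are hypothetical).
STDERR_SIGNATURES = [
    ("no put_binding_name", "panic"),
    ("MUT_VAR_CLEAN object entered", "internal-error"),
    ("Segmentation fault", "segfault"),
    ("Bus error", "bus-error"),
    ("Bus fault", "bus-error"),
]


def classify(returncode, stderr):
    """Map one finished run to a coarse failure label."""
    if returncode == 0:
        return "ok"
    # On POSIX, subprocess reports death by signal N as return code -N.
    if returncode < 0:
        if -returncode == signal.SIGSEGV:
            return "segfault"
        if -returncode == getattr(signal, "SIGBUS", None):
            return "bus-error"
    for needle, label in STDERR_SIGNATURES:
        if needle in stderr:
            return label
    return "other-failure"


def run_trials(cmd, trials):
    """Run cmd (an argument list) `trials` times and count each outcome."""
    tally = collections.Counter()
    for _ in range(trials):
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.PIPE,
            text=True,
            errors="replace",
        )
        tally[classify(proc.returncode, proc.stderr)] += 1
    return tally


if __name__ == "__main__":
    # Usage: python3 tally.py TRIALS COMMAND [ARGS...]
    if len(sys.argv) < 3:
        sys.exit("usage: tally.py TRIALS COMMAND [ARGS...]")
    print(dict(run_trials(sys.argv[2:], int(sys.argv[1]))))
```

The command passed in would need to do its own cleaning between runs (otherwise a second run may find nothing left to compile), and a bisection step would presumably treat a revision as bad if any trial lands outside "ok".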
_______________________________________________
ghc-devs mailing list
[email protected]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
