> On Mar 16, 2016, at 6:12 AM, Andrew Haley <a...@redhat.com> wrote:
The change got pushed before I had a chance to look at the revised version, but I wanted to get back to this part of the discussion. > On 15/03/16 19:15, Kim Barrett wrote: >>> On Mar 15, 2016, at 12:18 AM, Andrew Hughes <gnu.and...@redhat.com> wrote: >> >> I’ll probably have more to say later; just responding to one point here. >> >>>>> 2. A number of optimisations in GCC 6 lead to a broken JVM. We need to >>>>> add -fno-delete-null-pointer-checks and -fno-lifetime-dse to get a >>>>> working JVM. >>>> >>>> That's very disturbing! >>> >>> Andrew Haley (CCed) has more details on the need for these options, >>> as he diagnosed the reason for the crash, with the help of the GCC >>> developers. From what I understand of it, it is a case of more >>> aggressive optimisations in the new GCC running into problems with >>> code that doesn't strictly conform to the specification and exhibit >>> undefined behaviour. >> >> That is my suspicion too, though without more detail of the failures it’s >> hard to completely discount the possibility of compiler bugs. > > They weren't compiler bugs: I analyzed the code and I am sure that the > code in HotSpot isn't valid C++. The -fno-lifetime-dse is because we > write to a field of an object in operator new before the constructor. > This is in Node::operator new. I agree that Node::operator new invokes UB. It's also rather strange looking code. I was going to say that I'd be surprised if there were any more like it, but Hotspot code keeps surprising me. (Just out of curiosity I looked at a few operator new definitions (gosh, there are a lot of them), and quickly found a couple that on a brief look appear suspicious, though in different ways from Node.) >> This comment is quite worrisome. >> https://bugzilla.redhat.com/show_bug.cgi?id=1306558#c6 >> I very strongly suspect that -fno-strict-aliasing is broken in this >> version of GCC. >> >> Is that still thought to be a concern? > > No. I was wrong. That's what I'd guessed from later discussion in that thread, but thanks for confirming. > As I mentioned above, we dereference null pointers a lot. For > example, Register rax is defined as (RegisterImpl*)0. So, if we do > something like > > guarantee(reg->is_valid(), "must be"); > > if (reg == rax) > stuff... > > GCC is quite within its rights to delete the call to "stuff". And it > will. I can easily imagine we dereference pointers and later assert they are non-NULL. For example, I expect that can happen a lot where some function includes a non-NULL "sanity" check. I would expect that dereference followed by functional code for the NULL case would be more rare, though again not always incorrect. I was surprised by the assertion that we dereference NULL pointers a lot; I would have expected doing so to result in crashes. But then I looked at the register code you referred to. Yikes! My concern with disabling optimizations is two-fold. Of course, there's the possibility of slower code as a result. But there is also the problem of being outside the well-travelled lane of compiler configuration, with an associated chance of encountering previously unknown bugs in the compiler. I think that's particularly important for backporting, which is why I wasn't keen on applying these options with earlier compiler versions; doing so would be changing from configurations that have been in use for a while to something new. > As I said, I would very much like to clean this stuff up, but I'd need > support from the HotSpot team, and at the moment I feel that this is > lacking. I can't speak for the whole team, but I'm taking these kinds of issues seriously and will offer what support I can. I know we (including, perhaps especially, me) pushed back on the initial patch for 8145096, but that was just part of the review process, and I think the final changes were better as a result.