Re: [cfe-commits] PATCH: Large re-working of bitfield IR-gen, and a fix for PR13691

Chandler Carruth Fri, 05 Oct 2012 16:00:40 -0700

Chris told me to ping him on IRC, which indicated he did want a chance to
review this before commit, so I'm continuing to wait...


And to ping....


On Wed, Oct 3, 2012 at 9:10 PM, Daniel Dunbar <[email protected]>wrote:

> Just go ahead and put it in, that should be enough to get a response if
> one is forthcoming. :)
>
>  - Daniel
>
>
> On Oct 3, 2012, at 19:37, Chandler Carruth <[email protected]> wrote:
>
> Chris, pinging again. It's been 2 weeks since my last ping, and there
> since Daniel Dunbar handed it off to you. ;]
>
>
> On Thu, Sep 20, 2012 at 6:35 PM, Chandler Carruth <[email protected]>wrote:
>
>> Ping.
>>
>> I really want to move forward with this. We are observing real world
>> bugs, races, crashes, and other badness surrounding bitfields used across
>> threads. We cannot even distinguish between user error and compiler error
>> because the compiler is known to generate the wrong thing.
>>
>> I'd love to hear any feedback or suggestions for further work here.
>>
>>
>> On Fri, Sep 14, 2012 at 2:42 PM, Daniel Dunbar <[email protected]> wrote:
>>
>>> On Tue, Sep 11, 2012 at 11:02 AM, Chandler Carruth <[email protected]>
>>> wrote:
>>> > On Fri, Aug 31, 2012 at 11:46 AM, Daniel Dunbar <[email protected]>
>>> wrote:
>>> >>
>>> >> Hi Chandler,
>>> >>
>>> >> I haven't done an extensive review of the patch yet (by which I would
>>> >> mostly mean looking at the codegen, I trust you to verify the
>>> >> correctness), so treat this as initial comments.
>>> >
>>> >
>>> > Thanks for these. I'm attaching an updated patch with some fixes and
>>> > improvements, see below.
>>> >
>>> >>
>>> >> My one main initial comment is that I would like to treat the PR and
>>> >> the rewrite as distinct issues. Obviously fixing the PR is a
>>> >> no-brainer, but it would be nice to put in the minimal patch for that
>>> >> in the current code. I wouldn't expect this to be hard or to result in
>>> >> byte-by-byte accesses, but I haven't looked closely at what the change
>>> >> should be. It sounds however like you attempted this and decided it
>>> >> was hard enough to prefer a rewrite...
>>> >
>>> >
>>> > Yes. Both Richard Smith and I looked at it, and neither of us had any
>>> clear
>>> > ideas about how to adapt the existing code to avoid the problem without
>>> > creating  worse stores. As I explained in my previous email, the
>>> existing
>>> > strategy has two properties that make this hard: 1) it assumes it can
>>> read
>>> > before the bitfield in order to start off with an aligned load, and 2)
>>> it
>>> > wants to pre-split into strides which requires the frontend to
>>> implement
>>> > intelligent striding logic.
>>> >
>>> >>
>>> >> The one thing I am worried about is that the simplicity of your
>>> >> approach is contingent on us using irregular integer sized types. If
>>> >> we don't do that, I feel like the current version of the code is
>>> >> pretty good (feel free to disagree though), so really the question
>>> >> comes down to do we want to move to irregular sized integer types. At
>>> >> the time the code was written, it was a conscious decision not to do
>>> >> that. I agree with the general sentiment that SROA is already doing
>>> >> this, I am just worried that if we ever needed to break that we would
>>> >> be back to doing something like the current code is doing.
>>> >
>>> >
>>> > Yes, we may, some day need to go back and change things. But we'll
>>> have the
>>> > old code kicking around in history. This at least gets us to correct
>>> > behavior today and simplifies the code today.
>>> >
>>> >>
>>> >> I did a reasonable amount of looking at the codegen on bitfield test
>>> >> cases when I wrote the old code, but I probably did a poor job of
>>> >> turning these into test cases. I'm assuming you probably already did
>>> >> the same kind of testing on the new code? Do you have any test cases
>>> >> that demonstrate better or worse codegen from the new model?
>>> >
>>> >
>>> > Test cases I have looked at in depth are included. ;] Mostly, I
>>> carefully
>>> > checked the output of the existing bitfield tests. There are actually
>>> quite
>>> > a few. I added a test case to cover the correctness issue.
>>> >
>>> > I'll work on a few stress test cases based on the examples you and eli
>>> gave,
>>> > as well as some that I've come up with.
>>> >
>>> >
>>> >>
>>> >> I don't understand why this is true (the byte-by-byte part). If your
>>> >> rewrite is generating a wider load, then there is nothing in the
>>> >> structure of the old code that would prevent generating larger
>>> >> accesses as well. This seems independent of the structural change. It
>>> >> seems like implementing this in the current framework is a small
>>> >> amount of code.
>>> >
>>> >
>>> > Well, both Richard and I looked at it, and after I had tried a few
>>> times,
>>> > neither of us had really good ideas about how to make it a small
>>> amount of
>>> > code. To be fair, I didn't really *work* to find an elegant way to
>>> implement
>>> > it because after a few initial stabs and readings, I came to this
>>> > conclusion:
>>> >
>>> >>
>>> >> I agree that after we did that it might be very questionable why we
>>> >> were doing it in the frontend.
>>> >
>>> >
>>> > And once I was there, I wanted to do the right thing and solve this
>>> using
>>> > the facilities LLVM provides.
>>> >
>>> >>
>>> >> >> (1) How well-specified and -tested are the paths for non-power-of-2
>>> >> >> integer types in LLVM?  Do we actually promise that, say, storing
>>> to an
>>> >> >> i17
>>> >> >> will not touch the fourth byte?  Do we provide reasonably portable
>>> >> >> behavior
>>> >> >> across targets, or is this an instance of the x86 backend having
>>> >> >> received a
>>> >> >> lot of attention and everything else being a festering morass of
>>> latent
>>> >> >> issues?  I know that SROA's behavior has exposed bugs before (IIRC
>>> with
>>> >> >> the
>>> >> >> occasional response of making SROA less aggressive), not least of
>>> which
>>> >> >> because SROA's behavior doesn't really kick in a lot.
>>> >> >
>>> >> >
>>> >> > Sure, this is a great point. I'm happy to do any testing I can
>>> here. One
>>> >> > thing that I think I've already done is ensure that we only produce
>>> >> > i8-multiples, not i17s and such. I'll double check, but I would not
>>> >> > expect
>>> >> > to see any of those. My hope was that this would in the worst cased
>>> be
>>> >> > trivially legalized to a sequence of i8 operations, and in the best
>>> >> > case,
>>> >> > the optimal pattern.
>>> >>
>>> >> This is a concern of mine too. Of course, its also something we should
>>> >> just probably make sure is nicely defined in the backend given that
>>> >> SROA already uses it.
>>> >
>>> >
>>> > So I talked to Duncan specifically about this and this is actually the
>>> use
>>> > case he feels was intended originally and should be best supported.
>>> I've
>>> > done some LLVM-side testing and it seems to bear that out. Very
>>> reasonable
>>> > code was generated for integers in my testing.
>>>
>>> Ok, I have more or less been convinced this is a better approach.
>>>
>>> I think you should get explicit sign off from Chris though, because
>>> when I discussed it in person with him he had a very strong feeling
>>> that we did not want the FE to start generating off-size integer
>>> memory accesses (as opposed to off-sized integer values). I don't
>>> think I can clearly convey his argument, though, so I'd like to see
>>> him chime in.
>>>
>>>  - Daniel
>>>
>>> >
>>> >>
>>> >> I agree with the approach. I'm worried there might be cases where the
>>> >> backend does a worse job in codegen *today* on the off size integer
>>> >> types, but I could be wrong. And of course the right thing to do here
>>> >> is fix up the backend.
>>> >>
>>> >> Regardless, thanks for working in this area!
>>> >
>>> >
>>> > Well, I've fixed a tiny bug, and done significantly more correctness
>>> > testing. The patch attached passes self-host and the full testsuite.
>>> The
>>> > test suite shows no significant performance regressions.
>>> >
>>> > I've also added something to try to mitigate the performance concerns.
>>> I've
>>> > added logic to detect a case when there is guranteed to be padding
>>> bytes
>>> > after the bitfield and to widen the bitfield into the padding when
>>> doing so
>>> > produces more natural integer sizes.
>>>
>>
>>
>

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits

Re: [cfe-commits] PATCH: Large re-working of bitfield IR-gen, and a fix for PR13691

Reply via email to