[PATCH] D154784: [clang] Fix crash caused by PseudoObjectExprBitfields::NumSubExprs overflow

Yurong via Phabricator via cfe-commits Wed, 30 Aug 2023 10:36:31 -0700

yronglin marked 2 inline comments as done.
yronglin added inline comments.


================
Comment at: clang/include/clang/AST/Stmt.h:596-597
 
-    // These don't need to be particularly wide, because they're
-    // strictly limited by the forms of expressions we permit.
-    unsigned NumSubExprs : 8;
-    unsigned ResultIndex : 32 - 8 - NumExprBits;
+    unsigned NumSubExprs : 16;
+    unsigned ResultIndex : 16;
   };
----------------
dblaikie wrote:
> aaron.ballman wrote:
> > yronglin wrote:
> > > aaron.ballman wrote:
> > > > dblaikie wrote:
> > > > > aaron.ballman wrote:
> > > > > > yronglin wrote:
> > > > > > > yronglin wrote:
> > > > > > > > dblaikie wrote:
> > > > > > > > > yronglin wrote:
> > > > > > > > > > dblaikie wrote:
> > > > > > > > > > > aaron.ballman wrote:
> > > > > > > > > > > > yronglin wrote:
> > > > > > > > > > > > > dblaikie wrote:
> > > > > > > > > > > > > > dblaikie wrote:
> > > > > > > > > > > > > > > aaron.ballman wrote:
> > > > > > > > > > > > > > > > dblaikie wrote:
> > > > > > > > > > > > > > > > > Could/should we add some error checking in 
> > > > > > > > > > > > > > > > > the ctor to assert that we don't overflow 
> > > > > > > > > > > > > > > > > these longer values/just hit the bug later on?
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > (& could we use `unsigned short` here rather 
> > > > > > > > > > > > > > > > > than bitfields?)
> > > > > > > > > > > > > > > > We've already got them packed in with other 
> > > > > > > > > > > > > > > > bit-fields from the expression bits, so I think 
> > > > > > > > > > > > > > > > it's reasonable to continue the pattern of 
> > > > > > > > > > > > > > > > using bit-fields (that way we don't 
> > > > > > > > > > > > > > > > accidentally end up with padding between the 
> > > > > > > > > > > > > > > > unnamed bits at the start and the named bits in 
> > > > > > > > > > > > > > > > this object).
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > I think adding some assertions would not be a 
> > > > > > > > > > > > > > > > bad idea as a follow-up.
> > > > > > > > > > > > > > > Maybe some unconditional (rather than only in 
> > > > > > > > > > > > > > > asserts builds) error handling? 
> > > > > > > > > > > > > > > (report_fatal_error, if this is low priority 
> > > > > > > > > > > > > > > enough to not have an elegant failure mode, but 
> > > > > > > > > > > > > > > something where we don't just overflow and carry 
> > > > > > > > > > > > > > > on would be good... )
> > > > > > > > > > > > > > Ping on this? I worry this code has just punted the 
> > > > > > > > > > > > > > same bug further down, but not plugged the 
> > > > > > > > > > > > > > hole/ensured we don't overflow on novel/larger 
> > > > > > > > > > > > > > inputs.
> > > > > > > > > > > > > Sorry for the late reply, I was looking through the 
> > > > > > > > > > > > > emails and found this. I agree add some assertions to 
> > > > > > > > > > > > > check the value is a good idea, It's easy to help 
> > > > > > > > > > > > > people catch bugs, at least with when 
> > > > > > > > > > > > > `-DLLVM_ENABLE_ASSERTIONS=ON`, and I'm glad to work 
> > > > > > > > > > > > > on it, but one thing that worries me is that, in 
> > > > > > > > > > > > > ASTReader, we access this field directly, not through 
> > > > > > > > > > > > > the constructor or accessor, and we have to add 
> > > > > > > > > > > > > assertions everywhere. 
> > > > > > > > > > > > > https://github.com/llvm/llvm-project/blob/05b4310c8aec7a050574277ced08a0ab86b27681/clang/lib/Serialization/ASTReaderStmt.cpp#L1382
> > > > > > > > > > > > I don't think we have to add too many assertions. As 
> > > > > > > > > > > > best I can tell, we'll need one in each of the 
> > > > > > > > > > > > `PseudoObjectExpr` constructors and one in 
> > > > > > > > > > > > `ASTStmtReader::VisitPseudoObjectExpr()`, but those are 
> > > > > > > > > > > > the only places we assign a value into the bit-field. 
> > > > > > > > > > > > Three assertions isn't a lot, but if we're worried, we 
> > > > > > > > > > > > could add a setter method that does the assertion and 
> > > > > > > > > > > > use the setter in all three places.
> > > > > > > > > > > My concern wasn't (well, wasn't entirely) about adding 
> > > > > > > > > > > more assertions - but about having a reliable error here. 
> > > > > > > > > > > The patch only makes the sizes larger, but doesn't have a 
> > > > > > > > > > > hard-stop in case those sizes are exceeded again (which, 
> > > > > > > > > > > admittedly, is much harder to do - maybe it's totally 
> > > > > > > > > > > unreachable now, for all practical purposes?) 
> > > > > > > > > > > 
> > > > > > > > > > > I suspect with more carefully constructed recursive 
> > > > > > > > > > > inputs could still reach the higher limit & I think it'd 
> > > > > > > > > > > be good to fail hard in that case in some way? (it's 
> > > > > > > > > > > probably rare enough that a report_fatal_error would be 
> > > > > > > > > > > not-the-worst-thing-ever)
> > > > > > > > > > > 
> > > > > > > > > > > But good assertions would be nice too (the old code only 
> > > > > > > > > > > failed when you hit /exactly/ on just the overflow value, 
> > > > > > > > > > > and any more than that the wraparound would not 
> > > > > > > > > > > crash/fail, but misbehave) - I did add the necessary 
> > > > > > > > > > > assertion to ArrayRef (begin <= end) which would've 
> > > > > > > > > > > helped detect this more reliably, but some assert 
> > > > > > > > > > > checking for overflow in the ctor would be good too (with 
> > > > > > > > > > > all the usual nuance/care in checking for overflow) - 
> > > > > > > > > > > unless we're going to make that into a fatal or other 
> > > > > > > > > > > real error.
> > > > > > > > > > Sorry for the very late reply. I have no preference between 
> > > > > > > > > > assertion and `llvm_unreachable`, if error then fail fast 
> > > > > > > > > > is looks good. I have a patch D158296 to add assertion.
> > > > > > > > > Thanks for the assertions - though they still haven't met my 
> > > > > > > > > main concern that this should have a hard failure even in a 
> > > > > > > > > non-assertions build.
> > > > > > > > > 
> > > > > > > > > I know we don't have a perfect plan/policy for these sort of 
> > > > > > > > > "run out of resources/hit a representational limit" issues 
> > > > > > > > > (at least I don't think we do... do we, @aaron.ballman ? I 
> > > > > > > > > know we have some limits (recursion, template expansion, etc) 
> > > > > > > > > but they're fairly specific/aren't about every possible case 
> > > > > > > > > of integer overflow in some representational element, etc) 
> > > > > > > > > but we've seen this one is pretty reachable. 
> > > > > > > > > 
> > > > > > > > > Here's a test case that would still trigger the assertion, 
> > > > > > > > > and trigger UB in a non-assertions build:
> > > > > > > > > ```
> > > > > > > > > truct t1 { };
> > > > > > > > > template<typename T1>
> > > > > > > > > struct templ {
> > > > > > > > >     T1 v1;
> > > > > > > > >     T1 v2;
> > > > > > > > >     T1 v3;
> > > > > > > > >     T1 v4;
> > > > > > > > > };
> > > > > > > > > 
> > > > > > > > > struct t2 {
> > > > > > > > >   templ<templ<templ<templ<templ<templ<t1>>>>>> c0;
> > > > > > > > >   templ<templ<templ<templ<templ<templ<t1>>>>>> c1;
> > > > > > > > >   templ<templ<templ<templ<templ<templ<t1>>>>>> c2;
> > > > > > > > > };
> > > > > > > > > 
> > > > > > > > > void aj(...);
> > > > > > > > > void f1(t2 w) { __builtin_dump_struct(&w, aj); }
> > > > > > > > > ```
> > > > > > > > > (used templates to pack this a bit more densely than the 
> > > > > > > > > original test case) - the `sizeof` the struct is certainly a 
> > > > > > > > > bit outlandish (~12kbytes) bit not, I think, totally 
> > > > > > > > > unreasonable?
> > > > > > > > Thanks for your example. I have three ways:
> > > > > > > > 1. use `llvm_unreachable` to emit a hard failure but not an 
> > > > > > > > assertion.
> > > > > > > > 2. extend these two field to 32-bit unsigned, it's may big 
> > > > > > > > enough.
> > > > > > > > 3. limit the functionality of `__builtin_dump_struct`, if there 
> > > > > > > > are too many fields in a struct, the part exceeding the limit 
> > > > > > > > will not be output, and replaced with `...`(maybe). 
> > > > > > > > 
> > > > > > > > WDYT? You guys are expert in clang, and I would like to wait 
> > > > > > > > for your guidance :)
> > > > > > > We have ever limit the length of c-string value in 
> > > > > > > `__builtin_dump_struct`. 
> > > > > > > https://github.com/llvm/llvm-project/blob/5675f44cebf602931e884595c1d488bcedf6b8f2/clang/lib/Sema/SemaChecking.cpp#L524-L527
> > > > > > > I know we don't have a perfect plan/policy for these sort of "run 
> > > > > > > out of resources/hit a representational limit" issues (at least I 
> > > > > > > don't think we do... do we, @aaron.ballman ? I know we have some 
> > > > > > > limits (recursion, template expansion, etc) but they're fairly 
> > > > > > > specific/aren't about every possible case of integer overflow in 
> > > > > > > some representational element, etc) but we've seen this one is 
> > > > > > > pretty reachable.
> > > > > > 
> > > > > > Correct, we don't have a general mechanism for handling resource 
> > > > > > limits, we mostly play whack-a-mole when we run into them. So 
> > > > > > having a general utility that can work for other bit-fields would 
> > > > > > be really nice. However, `__builtin_dump_struct` is a debugging 
> > > > > > interface and not something we expect users to call particularly 
> > > > > > often, so some sharp edges with the interface are not the end of 
> > > > > > the world IMO.
> > > > > > 
> > > > > > > limit the functionality of __builtin_dump_struct, if there are 
> > > > > > > too many fields in a struct, the part exceeding the limit will 
> > > > > > > not be output, and replaced with ...(maybe).
> > > > > > 
> > > > > > I think this is basically what @dblaikie was asking for -- if there 
> > > > > > are too many fields in the structure, either give a diagnostic that 
> > > > > > the structure is too complex for us to dump, or cut off the output 
> > > > > > somewhere sensible, etc. @dblaikie, do you have a preference for 
> > > > > > diagnostic vs prettier output? Ideally, I lean towards prettier 
> > > > > > output, but at the same time, that might be a bigger ask than what 
> > > > > > you were after.
> > > > > > 
> > > > > Yeah, I don't have major/heroic requirements here - just that we 
> > > > > don't crash on these inputs. An error, wider integers, both, etc, 
> > > > > it's all good to me.
> > > > > 
> > > > > just an error seems fine to me - as you say, it's basically a 
> > > > > debugging interface, people shouldn't be making this some 
> > > > > loadbearing/totally generic part of an API or somesuch.
> > > > Thank you! I'm also fine with going with just an error.
> > > I'm fine too! But I have some concerns, the BuiltinDumpStructGenerator 
> > > create call of print functions. but the call of print functions not only 
> > > for field, but also for `{` and '}',  so I have no idea to describe the 
> > > problem in the diagnostic message, 'too many fields in struct' doesn't 
> > > feel right to me. Do you have any other good idea? I'd be happy to 
> > > continue cook D158296
> > Hmm, I think I'd be fine with something along the lines of: 
> > `class|struct|union 'blah' is too complex to dump` and leave "too complex" 
> > vague (because this is a debugging interface and it involves some quite 
> > questionably-sized objects, I think it's fine for the QoI bar to be lower 
> > for this diagnostic). WDYT?
> "too complex" seems like totally the right phrasing to me
Thanks for your suggestions, I have updated D158296


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154784/new/

https://reviews.llvm.org/D154784

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D154784: [clang] Fix crash caused by PseudoObjectExprBitfields::NumSubExprs overflow

Reply via email to