[PATCH] D154784: [clang] Fix crash caused by PseudoObjectExprBitfields::NumSubExprs overflow

Aaron Ballman via Phabricator via cfe-commits Mon, 21 Aug 2023 09:20:15 -0700

aaron.ballman added inline comments.


================
Comment at: clang/include/clang/AST/Stmt.h:596-597
 
-    // These don't need to be particularly wide, because they're
-    // strictly limited by the forms of expressions we permit.
-    unsigned NumSubExprs : 8;
-    unsigned ResultIndex : 32 - 8 - NumExprBits;
+    unsigned NumSubExprs : 16;
+    unsigned ResultIndex : 16;
   };
----------------
yronglin wrote:
> yronglin wrote:
> > dblaikie wrote:
> > > yronglin wrote:
> > > > dblaikie wrote:
> > > > > aaron.ballman wrote:
> > > > > > yronglin wrote:
> > > > > > > dblaikie wrote:
> > > > > > > > dblaikie wrote:
> > > > > > > > > aaron.ballman wrote:
> > > > > > > > > > dblaikie wrote:
> > > > > > > > > > > Could/should we add some error checking in the ctor to 
> > > > > > > > > > > assert that we don't overflow these longer values/just 
> > > > > > > > > > > hit the bug later on?
> > > > > > > > > > > 
> > > > > > > > > > > (& could we use `unsigned short` here rather than 
> > > > > > > > > > > bitfields?)
> > > > > > > > > > We've already got them packed in with other bit-fields from 
> > > > > > > > > > the expression bits, so I think it's reasonable to continue 
> > > > > > > > > > the pattern of using bit-fields (that way we don't 
> > > > > > > > > > accidentally end up with padding between the unnamed bits 
> > > > > > > > > > at the start and the named bits in this object).
> > > > > > > > > > 
> > > > > > > > > > I think adding some assertions would not be a bad idea as a 
> > > > > > > > > > follow-up.
> > > > > > > > > Maybe some unconditional (rather than only in asserts builds) 
> > > > > > > > > error handling? (report_fatal_error, if this is low priority 
> > > > > > > > > enough to not have an elegant failure mode, but something 
> > > > > > > > > where we don't just overflow and carry on would be good... )
> > > > > > > > Ping on this? I worry this code has just punted the same bug 
> > > > > > > > further down, but not plugged the hole/ensured we don't 
> > > > > > > > overflow on novel/larger inputs.
> > > > > > > Sorry for the late reply, I was looking through the emails and 
> > > > > > > found this. I agree add some assertions to check the value is a 
> > > > > > > good idea, It's easy to help people catch bugs, at least with 
> > > > > > > when `-DLLVM_ENABLE_ASSERTIONS=ON`, and I'm glad to work on it, 
> > > > > > > but one thing that worries me is that, in ASTReader, we access 
> > > > > > > this field directly, not through the constructor or accessor, and 
> > > > > > > we have to add assertions everywhere. 
> > > > > > > https://github.com/llvm/llvm-project/blob/05b4310c8aec7a050574277ced08a0ab86b27681/clang/lib/Serialization/ASTReaderStmt.cpp#L1382
> > > > > > I don't think we have to add too many assertions. As best I can 
> > > > > > tell, we'll need one in each of the `PseudoObjectExpr` constructors 
> > > > > > and one in `ASTStmtReader::VisitPseudoObjectExpr()`, but those are 
> > > > > > the only places we assign a value into the bit-field. Three 
> > > > > > assertions isn't a lot, but if we're worried, we could add a setter 
> > > > > > method that does the assertion and use the setter in all three 
> > > > > > places.
> > > > > My concern wasn't (well, wasn't entirely) about adding more 
> > > > > assertions - but about having a reliable error here. The patch only 
> > > > > makes the sizes larger, but doesn't have a hard-stop in case those 
> > > > > sizes are exceeded again (which, admittedly, is much harder to do - 
> > > > > maybe it's totally unreachable now, for all practical purposes?) 
> > > > > 
> > > > > I suspect with more carefully constructed recursive inputs could 
> > > > > still reach the higher limit & I think it'd be good to fail hard in 
> > > > > that case in some way? (it's probably rare enough that a 
> > > > > report_fatal_error would be not-the-worst-thing-ever)
> > > > > 
> > > > > But good assertions would be nice too (the old code only failed when 
> > > > > you hit /exactly/ on just the overflow value, and any more than that 
> > > > > the wraparound would not crash/fail, but misbehave) - I did add the 
> > > > > necessary assertion to ArrayRef (begin <= end) which would've helped 
> > > > > detect this more reliably, but some assert checking for overflow in 
> > > > > the ctor would be good too (with all the usual nuance/care in 
> > > > > checking for overflow) - unless we're going to make that into a fatal 
> > > > > or other real error.
> > > > Sorry for the very late reply. I have no preference between assertion 
> > > > and `llvm_unreachable`, if error then fail fast is looks good. I have a 
> > > > patch D158296 to add assertion.
> > > Thanks for the assertions - though they still haven't met my main concern 
> > > that this should have a hard failure even in a non-assertions build.
> > > 
> > > I know we don't have a perfect plan/policy for these sort of "run out of 
> > > resources/hit a representational limit" issues (at least I don't think we 
> > > do... do we, @aaron.ballman ? I know we have some limits (recursion, 
> > > template expansion, etc) but they're fairly specific/aren't about every 
> > > possible case of integer overflow in some representational element, etc) 
> > > but we've seen this one is pretty reachable. 
> > > 
> > > Here's a test case that would still trigger the assertion, and trigger UB 
> > > in a non-assertions build:
> > > ```
> > > truct t1 { };
> > > template<typename T1>
> > > struct templ {
> > >     T1 v1;
> > >     T1 v2;
> > >     T1 v3;
> > >     T1 v4;
> > > };
> > > 
> > > struct t2 {
> > >   templ<templ<templ<templ<templ<templ<t1>>>>>> c0;
> > >   templ<templ<templ<templ<templ<templ<t1>>>>>> c1;
> > >   templ<templ<templ<templ<templ<templ<t1>>>>>> c2;
> > > };
> > > 
> > > void aj(...);
> > > void f1(t2 w) { __builtin_dump_struct(&w, aj); }
> > > ```
> > > (used templates to pack this a bit more densely than the original test 
> > > case) - the `sizeof` the struct is certainly a bit outlandish (~12kbytes) 
> > > bit not, I think, totally unreasonable?
> > Thanks for your example. I have three ways:
> > 1. use `llvm_unreachable` to emit a hard failure but not an assertion.
> > 2. extend these two field to 32-bit unsigned, it's may big enough.
> > 3. limit the functionality of `__builtin_dump_struct`, if there are too 
> > many fields in a struct, the part exceeding the limit will not be output, 
> > and replaced with `...`(maybe). 
> > 
> > WDYT? You guys are expert in clang, and I would like to wait for your 
> > guidance :)
> We have ever limit the length of c-string value in `__builtin_dump_struct`. 
> https://github.com/llvm/llvm-project/blob/5675f44cebf602931e884595c1d488bcedf6b8f2/clang/lib/Sema/SemaChecking.cpp#L524-L527
> I know we don't have a perfect plan/policy for these sort of "run out of 
> resources/hit a representational limit" issues (at least I don't think we 
> do... do we, @aaron.ballman ? I know we have some limits (recursion, template 
> expansion, etc) but they're fairly specific/aren't about every possible case 
> of integer overflow in some representational element, etc) but we've seen 
> this one is pretty reachable.

Correct, we don't have a general mechanism for handling resource limits, we 
mostly play whack-a-mole when we run into them. So having a general utility 
that can work for other bit-fields would be really nice. However, 
`__builtin_dump_struct` is a debugging interface and not something we expect 
users to call particularly often, so some sharp edges with the interface are 
not the end of the world IMO.

> limit the functionality of __builtin_dump_struct, if there are too many 
> fields in a struct, the part exceeding the limit will not be output, and 
> replaced with ...(maybe).

I think this is basically what @dblaikie was asking for -- if there are too 
many fields in the structure, either give a diagnostic that the structure is 
too complex for us to dump, or cut off the output somewhere sensible, etc. 
@dblaikie, do you have a preference for diagnostic vs prettier output? Ideally, 
I lean towards prettier output, but at the same time, that might be a bigger 
ask than what you were after.



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154784/new/

https://reviews.llvm.org/D154784

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D154784: [clang] Fix crash caused by PseudoObjectExprBitfields::NumSubExprs overflow

Reply via email to