donat.nagy planned changes to this revision.
donat.nagy marked 5 inline comments as done.
donat.nagy added a comment.

I'll soon upload a refactored version.



================
Comment at: clang/docs/analyzer/checkers.rst:44-50
+Moreover, if the pedantic mode is activated by
+``-analyzer-config core.BitwiseShift:Pedantic=true``, then this checker also
+reports situations where the _left_ operand of a shift operator is negative or
+overflow occurs during the right shift of a signed value. (Most compilers
+handle these predictably, but the C standard and the C++ standards before C++20
+say that they're undefined behavior. In the C++20 standard these constructs are
+well-defined, so activating pedantic mode in C++20 has no effect.)
----------------
steakhal wrote:
> I believe shifting out the "sign bit" is UB because the representation of the 
> signed integer type was not defined.
> Prior C++20 (?), it was allowed to represent signed integer types as 
> `two's-complement` (virtually every platform uses this), `one's complement`, 
> `sign-magnitude`. [[ 
> https://en.wikipedia.org/wiki/Signed_number_representations | Wiki ]].
> And it turns out, all these three representations behave differently with 
> negative values.
> Check out [[ https://youtu.be/JhUxIVf1qok?t=2089 | JF's talk ]] from 2018 
> about this.
> 
> I'd need to think about the relevance of C++20 in this context, but here is 
> what I think of now:
> My idea is that the observed behavior is not influenced by the "standard", 
> rather by the time we released C++20, they realized that no actual platform 
> uses weird signed integer representations.
> Given this observation, I'd argue that we shouldn't have this flag at all.
Thanks for the background information! I knew the rough outlines of the 
situation, but your links provided interesting details.

I agree that the non-pedantic mode is the "sane mode" that reflects the real 
behavior of the hardware; but the standards (with the exception of C++20) 
explicitly declare that signed shift overflow and shifting negative values is 
UB and there may be secondary requirements like [[ 
https://wiki.sei.cmu.edu/confluence/display/c/INT32-C.+Ensure+that+operations+on+signed+integers+do+not+result+in+overflow
 | SEI-CERT rule INT32-C  ]] that demand compliance to the standards.

I don't think that I'd spend time on implementing pedantic mode because there 
are more important things to do, but as we already have this feature, I think 
we should release it as an off-by-default option.



================
Comment at: clang/docs/analyzer/checkers.rst:81
+
+Ensure the shift operands are in proper range before shifting.
+
----------------
steakhal wrote:
> We should exactly elaborate on what "proper" means here.
What would you suggest? I could try to write a slightly longer suggestion, but 
I don't want to repeat the same conditions that I listed above the code example.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:142
+                                              BinaryOperator::Opcode 
Comparison,
+                                              long long Limit) {
+  SValBuilder &SVB = Ctx.getSValBuilder();
----------------
steakhal wrote:
> One does not frequently see the `long long` type.
> Do you have a specific reason why `int` or `unsigned` wouldn't be good?
I inherited this type choice from old code and didn't think much about it. My 
guess is that `LimitVal` needs to be a signed integer (because e.g. `-1 < 0u` 
would be evaluated to false) and it's a `long long` because that's guaranteed 
to represent all `unsigned` values. (However, the actual values of `Limit` are 
very small.)

I'll probably use `int` for `Limit` and `LimitVal` with some comments to 
clarify the reason for this type choice.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:152-156
+    if (!StTrue) {
+      // We detected undefined behavior (the caller will report it).
+      State = StFalse;
+      return false;
+    }
----------------
steakhal wrote:
> Generally, I don't favor mutating member functions.
> It makes it harder to track how we ended up with a given State.
> 
I also agree that mutating member functions are often problematic and may make 
the code more difficult to understand.

However, in this particular case I think it's important to ensure that we're 
always using the most recent state, and that's difficult to guarantee in the 
"pass everything as arguments and return values" style. Moreover, I do some 
parallel bookkeeping because I want to summarize the state changes in a single 
note tag, and that would make the "naive functional" solution even more 
convoluted.

In a functional language like Haskell I'd use computations in a [[ 
https://en.wikibooks.org/wiki/Haskell/Understanding_monads/State | State monad 
]] to implement this logic; my C++ code is a direct translation of that 
solution. (As a secondary utility, my class also provides read-only access to 
some context like a Reader monad, because functions that take 5-8 arguments are 
also very difficult to understand. This way, the relevant arguments are 
highlighted as actual arguments, while the shared context is described by the 
`const` data members.)


================
Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:301-302
+                      pluralSuffix(MaximalAllowedShift));
+    R->addNote(LeftNote, PathDiagnosticLocation{LHS, Ctx.getSourceManager(),
+                                                Ctx.getLocationContext()});
+    Ctx.emitReport(std::move(R));
----------------
donat.nagy wrote:
> gamesh411 wrote:
> > donat.nagy wrote:
> > > NoQ wrote:
> > > > Can we just append this to the warning? The `addNote()` is useful for 
> > > > notes that need to be placed in code outside of the execution path, but 
> > > > if it's always next to the warning, it probably doesn't make sense to 
> > > > display it separately.
> > > I didn't append this to the warning because I felt that the warning text 
> > > is already too long. (By the way, an earlier internal variant of this 
> > > checker produced two separate notes next to the warning message.)
> > > 
> > > I tweaked the messages of this checker before initiating this review, but 
> > > I feel that there's still some room for improvements. (E.g. in this 
> > > particular case perhaps we could omit some of the details that are not 
> > > immediately useful and could be manually calculated from other parts of 
> > > the message...) 
> > Just a thought on simplifying the diagnostics a bit:
> > 
> > Warning: "Right operand is negative in left shift"
> > Note: "The result of left shift is undefined because the right operand is 
> > negative"
> > Shortened: "Undefined left shift due to negative right operand"
> > 
> > Warning: "Left shift by '34' overflows the capacity of 'int'"
> > Note: "The result of left shift is undefined because the right operand '34' 
> > is not smaller than 32, the capacity of 'int'"
> > Shortened: "Undefined left shift: '34' exceeds 'int' capacity (32 bits)"
> > 
> > The more complex notes are a bit sketchy, as relevant information can be 
> > lost, and the following solution is probably a bit too much, but could 
> > prove to be an inspiration:
> > 
> > Warning: "Left shift of '1024' overflows the capacity of 'int'"
> > CXX Note: "Left shift of '1024' is undefined because 'int' can hold only 32 
> > bits (including the sign bit), so some bits overflow"
> > CXX Note: "The value '1024' is represented by 11 bits, allowing at most 21 
> > bits for bitshift"
> > C Note: "Left shift of '1024' is undefined because 'int' can hold only 31 
> > bits (excluding the sign bit), so some bits overflow"
> > C Note: "The value '1024' is represented by 11 bits, allowing at most 20 
> > bits for bitshift"
> > 
> > Shortened:
> > CXX Warning: "Undefined left shift: '1024' (11 bits) exceeds 'int' capacity 
> > (32 bits, including sign)"
> > C Warning: "Undefined left shift: '1024' (11 bits) exceeds 'int' capacity 
> > (31 bits, excluding sign)"
> Clarification about the `Msg`/`ShortMsg` distinction:
> I'm intentionally separating the short warning messages and the longer note 
> messages because `createBugReport()` enforces the convention that it will 
> always emit a warning and a note at the bug location.
> 
> According to the comments in the source code, the intention is that the note 
> contains all the relevant information, while the warning is a brief summary 
> that can be displayed in situations where the notes wouldn't fit the UI.
> 
> IIUC many checkers ignore this distinction and emit the same (often long and 
> cumbersome) message both as a note and as a warning (`createBugReport()` has 
> a variant which takes only one message parameter and passes it to both 
> locations), but I tried to follow it because I think it's ugly when the same 
> message is repeated twice and there is some sense in providing both a brief 
> summary and a full description that doesn't use potentially-ambiguous 
> abbreviations to save space.
> 
> Of course I could also accept a community decision that this "brief warning / 
> full note" distinction is deprecated and will be eliminated (because e.g. 
> front-end utilities are not prepared to handle it), but in that case I'd 
> strongly suggest a redesign where we eliminate the redundantly printed 'note' 
> message. (There is no reason to say the same thing twice! There is no reason 
> to say the same thing twice!)
> 
> On the other hand, in addition to this `Msg`/`ShortMsg` distinction, this 
> part of the code also adds the extra `LeftNote` (as a remnant from an earlier 
> internal version of this checker), and that's that's what I'd like to merge 
> into `Msg` (following NoQ's suggestion and keeping it distinct from the 
> `ShortMsg`).
Among notes, my only planned change is that instead of 

> Warning: "Left shift of '1024' overflows the capacity of 'int'"
> CXX Note: "Left shift of '1024' is undefined because 'int' can hold only 32 
> bits (including the sign bit), so some bits overflow"
> CXX Note: "The value '1024' is represented by 11 bits, allowing at most 21 
> bits for bitshift"
> C Note: "Left shift of '1024' is undefined because 'int' can hold only 31 
> bits (excluding the sign bit), so some bits overflow"
> C Note: "The value '1024' is represented by 11 bits, allowing at most 20 bits 
> for bitshift"

I'll implement something like

> Warning: "Left shift of '1024' overflows the capacity of 'int'"
> CXX Note: "Left shift of '1024' (represented by 11 bits) is undefined because 
> 'int' can hold only 32 bits (including the sign bit), so some bits overflow"
> C Note: "Left shift of '1024' (represented by 11 bits) is undefined because 
> 'int' can hold only 31 bits (excluding the sign bit), so some bits overflow"


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156312/new/

https://reviews.llvm.org/D156312

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to