[PATCH] D136811: WIP: RFC: NFC: C++ Buffer Hardening user documentation.

Jan Korous via Phabricator via cfe-commits Wed, 02 Nov 2022 14:37:26 -0700

jkorous added a comment.

Thank you for the feedback Aaron! We really appreciate the effort you put into 
this!




================
Comment at: clang/docs/SafeBuffers.rst:69-70
+containers such as ``std::span``, you can achieve bounds safety by
+*simply writing good modern C++ code*. This is what this solution is designed
+to help your codebase with.
+
----------------
aaron.ballman wrote:
> This makes it sound like the proposal doesn't cover C because there is no 
> version of `std::span` there.
I read this as a question as: "Could an imaginary codebase benefit from the 
warnings if there were no fixits?".

I can imagine projects where it would be very useful to have 90% of the code 
guaranteed to be free of "potentially dangerous pointer arithmetic" and have 
such operations contained to the remaining 10% of the code. I can imagine a 
principled solution to bounds-safety in C but even just splitting the code into 
safe and 9x smaller "unsafe" part could be extremely useful when it comes to 
debugging, audits, automated and manual testing and similar. Having clang 
guarantee this property reduces the scope of code that needs to be processed 
and a deliberate use of this model by projects can unlock new possibilities.

That is why I feel that the proposal can benefit also C projects.


================
Comment at: clang/docs/SafeBuffers.rst:115
+   (TODO: Will automatic fixits be able to suggest custom containers or views?)
+   (TODO: Explain how to implement such checks in a custom container?)
+
----------------
aaron.ballman wrote:
> I think this would be a good idea, especially if you can use an example of a 
> more involved (but still common) container. Like, how does someone harden a 
> ring buffer?
I believe that we should stick to hardening low-level primitives - all that a 
ring buffer implementation has to do is to store data in `std::array` from 
hardened libc++ and it's all set.


================
Comment at: clang/docs/SafeBuffers.rst:128
+  ``+``, ``-``, ``+=``, ``-=``,
+      - unless that number is a compile time constant ``0``;
+      - subtraction between two pointers is also fine;
----------------
aaron.ballman wrote:
> One case I think this is going to trip up is: `this + 1` (which is not 
> uncommon in practice: 
> https://sourcegraph.com/search?q=context:global+-file:.*test.*+file:.*%5C.cpp+this+%2B+1&patternType=standard).
>  How do users fix up that pointer arithmetic and do they ever get any safety 
> or security benefits from it? (Note, this doesn't apply to `+=`, just `+`.)
That's a fair point and if we don't find a good solution we should ignore 
arithmetic with `this` and a constant.


================
Comment at: clang/docs/SafeBuffers.rst:133
+      - unless the pointer is a compile time constant ``0`` or ``nullptr``;
+      - a number of C/C++ standard library buffer manipulation functions
+        (such as std::memcpy() or std::next()) are hardcoded to act
----------------
aaron.ballman wrote:
> Eventually we should extend this to have an exhaustive list.
Absolutely! Is it ok if we populate the list as we land the implementation?


================
Comment at: clang/docs/SafeBuffers.rst:173
+  #pragma unsafe_buffer_usage begin
+
+Such pragmas not only enable incremental adoption with much smaller 
granularity,
----------------
aaron.ballman wrote:
> Have you considered allowing a short-hand form of the pragmas that are scoped 
> to the current block scope (or file scope if applied at global scope)? e.g.,
> ```
> void func() {
>   {
>     #pragma only_safe_buffers
>     int stuff[10] = {};
>     int x = stuff[1]; // Warning: you're a bad person who should feel bad
>   }
>   int array[10] = {};
>   int y = array[1]; // this_is_fine.png
> }
> ```
We discussed other options but generally prefer simple pragmas for their 
flexibility.
Our current thinking is that for readability purposes we won't allow nesting of 
pragmas and would have end of scope to be explicit.
While this might be more verbose it would be dead simple to reason about.


================
Comment at: clang/docs/SafeBuffers.rst:181-188
+Even though similar functionality can be achieved with generic
+``#pragma clang diagnostic``, a custom pragma is preferable because
+the generic solution’s push/pop mechanism may be confusing. In particular,
+it’s not immediately obvious from the code which mode gets enabled after
+a “pop”, which makes it harder to reason about safety both for you and for
+the purposes of security audit. It is also generally valuable for such
+annotation to describe the property of the annotated code,
----------------
aaron.ballman wrote:
> > a custom pragma is preferable because the generic solution’s push/pop 
> > mechanism may be confusing.
> 
> Er, I'm not certain how much I agree with that. push/pop semantics on this 
> have been around for a *long* time. I do agree that the user can write code 
> such that another user may struggle to understand what the state of the 
> system is after a pop operation, but that shouldn't be a deciding factor for 
> whether we add a new pragma to do the same thing as another pragma. I think 
> I'd rewrite this as:
> ```
> Similar functionality can be achieved with ``#pragma clang diagnostic``, but 
> such use is discouraged because it
> reduces code readability due to being an imperative action rather than 
> describing a property of the annotated
> code. That reduced readability also adds complexity to security code audits. 
> Use of one of the dedicated
> pragmas is strongly recommended.
> ```
> or something along these lines. WDYT?
We'd love to arrive at a design that is very difficult to use incorrectly.

You are absolutely right that our preference for not using the diagnostic 
pragmas stems from the "untyped" pop operation.

Assuming this code is a starting point:
```
#pragma clang diagnostic push warning "-Wfoo"
// code
#pragma clang diagnostic pop //end of -Wfoo
// more code
```

if the user adds pragmas like this:
```
#pragma clang diagnostic push warning "-Wfoo"
#pragma clang diagnostic push warning "-Wsafe-buffers-usage"
// code
#pragma clang diagnostic pop //end of -Wfoo
// more code
#pragma clang diagnostic pop //end of -Wsafe-buffers-usage
```

they actually get this behavior:
```
#pragma clang diagnostic push warning "-Wfoo"
#pragma clang diagnostic push warning "-Wsafe-buffers-usage"
// code
#pragma clang diagnostic pop //end of -Wsafe-buffers-usage
// more code
#pragma clang diagnostic pop //end of -Wfoo
```

We should have the patch for pragmas ready to be put for review here in not too 
distant future. Maybe it'd be easier to discuss there?


================
Comment at: clang/docs/SafeBuffers.rst:213
+
+The attribute is NOT warranted when the function has runtime protection against
+overflows, assuming hardened libc++, assuming that containers constructed
----------------
aaron.ballman wrote:
> One thing I think is worth pointing out about these docs is that the first 
> example effectively says "do add the attribute because the size passed in to 
> the function could be wrong" and the second example says "don't add the 
> attribute on the assumption that the container has the correct size 
> information". The advice feels a bit conflicting to me because in one case 
> we're assuming callers pass in incorrect values and in the other case we're 
> assuming callers pass in correct values and the only distinction between them 
> is a "container was used".  But a significant portion of our users don't use 
> libc++ (they use libstdc++ or MS STL for example).
> 
> I think we should have more details on why the STL used at runtime doesn't 
> matter, or if it still really does matter, we may want to reconsider the 
> advice we give.
> 
> Also, we don't give similar advice for use of the pragmas. Should we? (Maybe 
> split the advice as to when to use markings in general out into its own 
> section and reference it from both the pragma and attribute sections?)
> One thing I think is worth pointing out about these docs is that the first 
> example effectively says "do add the attribute because the size passed in to 
> the function could be wrong" and the second example says "don't add the 
> attribute on the assumption that the container has the correct size 
> information". The advice feels a bit conflicting to me because in one case 
> we're assuming callers pass in incorrect values and in the other case we're 
> assuming callers pass in correct values and the only distinction between them 
> is a "container was used". But a significant portion of our users don't use 
> libc++ (they use libstdc++ or MS STL for example).

Our model depends on Standard Library used implementing the bounds checks.
With that the "container was used" distinction is absolutely crucial - that is 
what adds the bounds checks and already mitigates certain classes of bugs (even 
if the `span` passed in still has the same wrong size).

Let's assume we're starting with this buggy code:

```
void caller() {
  int buffer[10];
  callee(buffer, 20);
}

void callee(int* ptr, unsigned size) {
  ptr[size-1] = 42;
  ptr[size] = 42;
  ptr[100] = 42;
}
```
and transform it to (this is just a part of what we actually plan but hopefully 
illustrates the point):
```
void caller() {
  int buffer[10];
  callee(std::span<int>(buffer, 20), 20);
}

void callee(std::span<int> ptr, unsigned size) {
  ptr[size-1] = 42; // still unmitigated
  ptr[size] = 42; // mitigated at runtime by std::span::operator[]() in 
hardened libc++
  ptr[100] = 42; // mitigated at runtime by std::span::operator[]() in hardened 
libc++
}
```


================
Comment at: clang/docs/SafeBuffers.rst:260-263
+* The “replacement” parameter of ``[[deprecated]]``, which allows for automatic
+  fixits when the function has a drop-in replacement, becomes significantly 
more
+  powerful and flexible in ``[[unsafe_buffer_usage]]`` where it will allow
+  non-trivial automatic fixes.
----------------
aaron.ballman wrote:
> The deprecated attribute does not have a replacement argument, it has an 
> argument for some kind of message to be printed when the deprecated interface 
> is used. So I'm not certain what this is telling me either.
For better or worse the deprecated attribute is "overloaded" and this refers to 
the two-parameter version.

https://clang.llvm.org/docs/AttributeReference.html#deprecated

"When spelled as __attribute__((deprecated)), the deprecated attribute can have 
two optional string arguments. The first one is the message to display when 
emitting the warning; the second one enables the compiler to provide a Fix-It 
to replace the deprecated name with a new name."

Since it seems confusing even to you - maybe we should take this as a signal 
that our users would get just as confused and remove the reference to 
deprecated?
WDYT?


================
Comment at: clang/docs/SafeBuffers.rst:365-370
+The fixits emitted by the warning are correct modulo placeholders. Placeholders
+are the only reason why fixed code is allowed to fail to compile.
+Incorrectly resolving the placeholder is the only reason why fixed code
+will demonstrate incorrect runtime behavior compared to the original code.
+In an otherwise well-formed program it is always possible (and usually easy)
+to resolve the placeholder correctly.
----------------
aaron.ballman wrote:
> Because fixits can be applied automatically to an entire source file, we 
> don't typically allow fix-its that can break code (or fixits where there are 
> multiple choices for the user to pick from) unless they're attached to a 
> *note* instead of a warning (or error).
> 
> Are you planning to follow that same pattern when placeholders are involved?
> 
> (For example, one potential use for automatic fix-its is to apply them 
> automatically to source on commit before running precommit CI; similar to how 
> folks sometimes run clang-format to automate "the source is different 
> before/after the clang-format, so you should look into that".)
That's an extremely interesting question! Let me add our current thinking as a 
context for the discussion.

Our analysis will have its limits and will not be able to provide 100% complete 
and guaranteed correct code transformation in every case.

At the same time it sounds suboptimal both from our and our users perspective 
to not emit anything for such situations.
Imagine we have a code transformation that affects 20 lines of code and all we 
need is for the user to address a single function call argument. 
The thought of bailing, producing just a warning and having user to manually 
change all 20 lines of code sounds very unsatisfactory to me.
That's why we are thinking to use placeholders.

The next question is - can we not break the code when using placeholders?
By definition whenever we'd emit a placeholder our analysis can't provide a 
correct solution. That means that whatever way we'd make the code to compile 
can introduce a logic bug in the program. That is simply unacceptable given the 
overall goal of our effort.
That pretty much leaves us with only using placeholders that would break the 
build.

WDYT?

I wasn't aware of the precedent related to fixits attached to notes not having 
to lead to compilable code and we should consider that path.


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136811/new/

https://reviews.llvm.org/D136811

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D136811: WIP: RFC: NFC: C++ Buffer Hardening user documentation.

Reply via email to