On Nov 7, 2008, at 7:56 PM, Zhongxing Xu <[EMAIL PROTECTED]> wrote:
On Sat, Nov 8, 2008 at 1:01 AM, Ted Kremenek <[EMAIL PROTECTED]>
wrote:
On Nov 7, 2008, at 5:34 AM, Zhongxing Xu wrote:
Attached is a scratch implementation of array bound checking. It
seems that implementing array bound checking is pretty
straightforward. No need to make big change to the framework.
<oob.patch>
Wow! Great work. This is very simple. This is a very good start.
I think we can apply it now and then iterate.
One thing that I think is worth mentioning is that not all
MemRegions represent chunks of memory for which we want to enforce
strict software segmentation. Consider the following code:
struct S {
unsigned length;
char buf[];
};
unsigned getLength(char* data) {
struct S* s = (struct S*) (data - offsetof(struct S, buf));
return s->length;
}
I may not have written this correctly, but I have seen code like
this before in some large open source C project (I don't remember
which). I'm not certain if such code poses a challenge for out-of-
region checking, as conceptually we might represent the above code
with several regions layered on top of each other.
For example, the 'data' argument could be a pointer to a MemRegion
representing a character array with an extent of $extent(data) (I'm
using '$' to represent a symbolic value). This extent would be a
subregion of a region representing a 'struct S' object. Through
pointer arithmetic, we get a pointer value that refers to the front
of the 'struct S' object. The subsequent access through 's->length'
first causes us to get the FieldRegion for 'length' of 's' and then
perform the load (which works as expected).
I know this may seem like a contrived example, and it might not be
something we should even care about right now, but I thought it
would bring it up.
Another use of MemRegions that I thought of was bit-level typing
(which I'm not saying we should implement right now, or ever). For
example, suppose you have an unsigned integer variable whose bits
are used as flags. While some programmers may use bit fields for
this task, others just use shifts and masks. MemRegions provide a
nice abstraction to segment out the individual bits of an integer,
allowing us to potentially perform bit-level typing (http://portal.acm.org/citation.cfm?doid=1181775.1181791
) or simply have better symbolic reasoning for bit values. While
one cannot take the address of a bit, one can take the address of
specific words, compute the address of related words using pointer
arithmetic, etc., and everything is fine (no out-of-region access
errors).
My meta point here is that we probably need a simple interface in
StoreManager to determine if accessing beyond the bounds of an
extent is okay. Sometimes it is okay to access out of the current
region as long as it doesn't exceed the bounds of some ancestor
region (which represents the allocated buffer, for example).
Perhaps that means we need to reason about "canonical
locations" (similar to how SourceManager reasons about logical and
physical locations), where canonical locations could represent the
byte location within a chunk of segmented memory. I honestly don't
know.
I don't think these details are high on the priority list of things
to worry about, but I thought I would mention them now before we
hardwire our implementation of array-bounds checking with too many
assumptions.
Understood. We can watch these cases when we make relevant design
decisions in the future.
Agreed.
My feeling is that StoreManager should take the responsibility of
doing correct pointer arithmetic or bit-level reasoning. It is up to
the StoreManager to return the correct region. Therefore other parts
of the analyzer could stay unknown of the details.
Agreed. Hopefully we will be able to stay within such a modular design.
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits