From: Matthieu Monrocq <[email protected]> > Subject: [cfe-commits] Undefined Behavior when invoking StringRef > constructor on a null pointer > To: [email protected] > Message-ID: <[email protected]> > Content-Type: text/plain; charset="iso-8859-1" > > The llvm::StringRef class has a constructor taking a const char* > parameter. > > This constructor is extremely simple, and I am afraid too simple. It > directly invokes ::strlen on the parameter, without checking whether or > not the pointer is null or not. > > Unfortunately as many C functions, strlen is not required by the standard > to > check its input, and indeed popular implementations assume that the input > is > not null, which results in undefined behavior. > > As far as I see it, there are two ways to deal with this: > - using an assert, to check that the input is non-null. It does not > slow-down the program built with asserts disabled, but does not allow us to > invoke StringRef on null pointers > - using a simple inlined test (ternary operator ?:) to either invoke strlen > or set the length to 0. Makes migrating from const char* to > llvm::StringRef easier. > > I've used the second approach in the patch enclosed (which gmail thoroughly > refused to. I have not measured the performance impact though. > > Please review. > Matthieu. >
Hum, looking further it seems to be only the top of the iceberg. 1. The default constructor makes Data null, so having another constructor doing so does not violate any invariant 2. However I am afraid that not all functions guard agaisnt null properly Even though llvm::StringRef::str has a special case for Data being null, many others (comparisons at least) use memcmp without this validation, and memcmp requires its arguments to be non-null (and we hit that bug on SLES 10 with memcpy recently at work, even with a null length the arguments should be non-null, otherwise undefined behavior is invoked). Normally the C Standard precise that all functions taking pointers should be provided with non-null pointers, unless otherwise precised in their specific description. You can see the reference for memcpy on StackOverflow: http://stackoverflow.com/questions/5243012/is-it-guaranteed-to-be-safe-to-perform-memcpy0-0-0/5243068#5243068 And this mail thread (I should really find a copy of the C final draft...) suggests it also applies to memcmp: http://www.mail-archive.com/[email protected]/msg10149.html It seems to me that there are two roads: - Removing this possibily of being null: simple enough, we just check it at construction and substitute an empty c-string instead - Allowing it, but catching all the bugs associated I am not confident in my ability to catch all uses of Data right now and to remember this in the future when maintaining the class. The first road thus seems saner and I have implemented it in a second iteration of the patch. However it seems that a number of utilities (among which llvm/lib/Support/CommandLine.cpp) differentiate between an empty c-string and a null pointer. I don't mind trying to fix all those who relied on the fact that Data could be null, but there is also a chance that I'll miss some uses. Thoughts ? Matthieu.
llvm_stringref_ub.diff
Description: Binary data
_______________________________________________ cfe-commits mailing list [email protected] http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
