> On Feb 11, 2015, at 11:23 AM, Zachary Turner <ztur...@google.com> wrote: > > > > On Wed Feb 11 2015 at 11:13:22 AM Greg Clayton <gclay...@apple.com> wrote: > > > On Feb 11, 2015, at 10:18 AM, Zachary Turner <ztur...@google.com> wrote: > > > > A patch is up which fixes a number of issues with strings not being null > > terminated in LLDB. The issues and error-proneness of using raw c strings > > is well documented and understood, so I would like to propose banning them > > in LLDB for new code [1]. > > > > I realize that's a tall order, and I don't expect all the occurrences of > > them to go away overnight, but issues related to strings have been popping > > up over and over again, and it would be nice if we could deal with more > > interesting problems. > > > > LLVM has a very robust string manipulation library, and while it's not > > perfect, it would have prevented almost all of these issues from ever > > happening in the first place. Here is a quick summary of common C string > > types / operations and their corresponding LLVM equivalents: > > > > Types: > > char stack_buffer[n]; // Use llvm::SmallString<n>. > > That would be fine. > > > const char* foo; // Use llvm::StringRef foo. > > This may be fine as long as there is absolutely _no_ string manipulation on > "foo" (like "strcat(...) etc. Use std::string for anything that requires > storage. And we don't want to start using any code that does "let me assign > an llvm::StringRef() with a ConstString("hello").GetCString() so I don't have > to worry about ownership. Adding string to the constant string pool should > be reserved for things that actually need it (any const names like > namespaces, variable names, type names, paths, etc) and also for when we > return "const char *" through the public API. > > This also means that we can't trust that "foo" is NULL terminated > (llvm::StringRef objects have a data pointer and a length and aren't required > to be terminated) and it means we must call "foo.str().c_str()" any time we > require NULL termination. So I actually don't like llvm::StringRef as a > replacement "const char *"...
> We can't assume that const char*s are null terminated either. Yes we can, that is was is assumed by const char * right now for every call we use it with. > They're just pointers to raw buffers, which you may or may not have null > terminated. So we have the same issue. No we don't, it isn't part of the the way you normally use c-strings. llvm::StringRef doesn't need to NULL terminate as it is part of its design. > By replacing all uses of const char* with StringRef(), we end up with exactly > the same semantics. null terminated strings are null terminated, non null > terminated strings aren't. But that isn't what is currently meant by _any_ of the functions that take a "const char *" with no length parameter, so it means that any switch to using llvm::StringRef requires we use ".str().c_str()" which I would like to avoid. > > > > > Passing arguments; > > void foo(char *, int len); // Use void foo(llvm::SmallVectorImpl<char> > > &buf); > > I would prefer to use std::string here. Clearer and everyone knows how to use > it. SmallVectors are nice, but they have the overhead of storing more data on > the stack. Clang currently can create quite large stack frames when you have > some code that has large switch statements where each switch statement has > local variables. If those switch statements how SmallVector<char, PATH_MAX>, > then you now have a case where if this code is called recursively we end up > blowing the stack. We have seen this a lot in the DWARF parser, so I don't > want to encourage using SmallVector for strings, since many places we use > strings are for paths and other larger buffers. Any current std::string > implementation can store up to 24 characters in the std::string without > needing to do any allocations on the heap, so that essentially gets us a > SmallVector<char, 24>. So I vote std::string. > passing SmallVectorImpl<> is intended as a replacement for the case where you > have this: > > void foo(char *buf); > void bar() { > char buffer[n]; > foo(buffer); > } > > The equivalent LLVM code is this: > > void foo(llvm::SmallVectorImpl<char> &buf); > void bar() { > llvm::SmallString<n> buffer; > foo(buffer); > } > > Exactly the same amount of stack space. It's not intended to be a general > replacement for std::string, which stores its data entirely on the heap. std::string doesn't always store content on the heap. It stores the first ~20 bytes in the std::string itself (at least for any STL implementation that is worth anything). So I really don't see the need to start using a new string class when std::string is just about all we need and it will take up less space on the stack for reasons I mentioned above with switch statements (3 pointers at most). > I do think that many places in LLDB that currently use stack buffers could be > re-written just as well using heap buffers, and for that I agree std::string > is fine. Agreed. > > > > void foo(const char * foo); // Use void foo(llvm::StringRef foo); > > > Again, I don't like this because you must call "foo.str().c_str()" in order > to guarantee NULL termination. > As mentioned previously, in the const char * version you have to guarantee > that it was initialized with a null terminated string in the first place. > The burden is the same either way. Either we're already making the > assumption in the callee, in which case we can continue to make it with the > StringRef(), or we aren't making the assumption in the callee, in which case > we can continue not making the assumption but have the added benefit of > having the length field passed to us. I disagree. Passing "const char*" assumes NULL termination unless a length field is also supplied. Passing llvm::StringRef as an object that is known to not NULL terminate, then we have can't assume termination. So I don't agree that passing llvm::StringRef makes anything better. > > > > > > Operations: > > strcpy, strncpy, etc // Use member functions of SmallString / > > StringRef > > sprintf, snprintf, etc // Use llvm::raw_ostream > > We have StringStream, people should be using that. raw_ostream doesn't > support all printf style formats. So all sprintf and snprintf function calls > should switch to using lldb_private::StreamString. > I actually disagree here. Printf() has plenty of problems, such as > non-portable and confusing arguments, I agree on this one, but we are getting around this with using PRI macros. > lack of type safety, most of this is taken care of when you have the printf format warnings on your class. > possibility of buffer overruns. There are no buffer overruns in the Stream::Printf(). Check the implementation if you need to. > I would actually prefer to use raw_ostream instead of StringStream. I don't. I have written tools using streaming style stuff before (like std::cout) and also written using printf style stuff and I find the printf style stuff more readable and maintainable. I also don't like streams that maintain state like the std::ostream does. Not sure if raw_ostream does any state management, but when they do it causes bugs later for code that doesn't always explicitly set the format, width, precision and much more: std::ostream &os; os << std::hex << hex_num; dump(os); os << hex_num2; little did we know the stream was modified by "dump()" to set the format to decimal so now you must always use: os << std::hex << hex_num2; (repeat for number width, precision and many other factors). _______________________________________________ lldb-dev mailing list lldb-dev@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev