Sorry for the silence... I think this is a perfectly fine direction.
I would suggest a few things: 1) This should be in 'include/llvm/Support/UTFUtils.h' or some such. 2) I would personally like a bit more type safety involved. Why not output a wchar_t array? I might suggest the following: 2a) Make this a template on a type T which only has specializations for char16_t and char32_t (or LLVM-provided analogs to those types in C++98) 2b) Make a wchar_t variant that delegates to the appropriately sized one. 2c) Make the output buffer strongly typed, potentially SmallVectorImpl<T>& 3) Mail the updated patch to llvm-commits as well to make sure folks are OK with it going into the common LLVM libraries. But I dunno if all of that is worthwhile... The unsigned size selector just seems brittle and gross to me. On Mon, Jul 2, 2012 at 4:51 PM, Nico Weber <[email protected]> wrote: > Final ping. If I don't get a "please land" from a code owner, I'll let > this patch die. (Not a big deal either way.) > > On Fri, Jun 29, 2012 at 9:56 AM, Nico Weber <[email protected]> wrote: > > All done (except for the static_cast, since that line also converts > > signedness). > > > > Any comments on the new file name / location from anyone? > > > > > > On Wed, Jun 27, 2012 at 7:06 AM, Hans Wennborg <[email protected]> > wrote: > >> > >> On Mon, Jun 25, 2012 at 4:54 AM, Nico Weber <[email protected]> > wrote: > >> > Hi, > >> > > >> > the attached patch moves the duplicated code in CGExpr.cpp and > >> > LiteralSupport.cpp into a new file Basic/ConvertUTFWrapper.h/cpp. > >> > Please let me know what you think. > >> > > >> > Nico > >> > >> Looks reasonable as far as I can tell. Just some nits: > >> > >> > +#ifndef LLVM_CLANG_BASIC_CONVERT_UTF_WRAPPER_H > >> > +#define LLVM_CLANG_BASIC_CONVERT_UTF_WRAPPER_H > >> > + > >> > +#include "llvm/ADT/StringRef.h" > >> > + > >> > +namespace clang { > >> > + > >> > +/// Convert an UTF8 StringRef to UTF8, UTF16, or UTF32 depending on > >> > +/// WideCharWidth. On success, ResultPtr will point one after the end > >> > of the > >> > +/// copied string. > >> > +/// \returns true on success. > >> > +bool ConvertUTF8toWide(unsigned WideCharWidth, llvm::StringRef > Source, > >> > + char*& ResultPtr); > >> > >> Should the *& be on ResultPtr here? (and in the .cpp file) > >> Should the comment say that it's copying into ResultPtr? > >> > >> > Index: lib/Basic/ConvertUTFWrapper.cpp > >> > =================================================================== > >> > --- lib/Basic/ConvertUTFWrapper.cpp (revision 0) > >> > +++ lib/Basic/ConvertUTFWrapper.cpp (revision 0) > >> > @@ -0,0 +1,55 @@ > >> > +//===-- ConvertUTFWrapper.cpp - Wrap ConvertUTF.h with clang data > types > >> > -----=== > >> > +// > >> > +// The LLVM Compiler Infrastructure > >> > +// > >> > +// This file is distributed under the University of Illinois Open > >> > Source > >> > +// License. See LICENSE.TXT for details. > >> > +// > >> > > >> > > +//===----------------------------------------------------------------------===// > >> > + > >> > +#include "clang/Basic/ConvertUTFWrapper.h" > >> > +#include "clang/Basic/ConvertUTF.h" > >> > +#include "clang/Basic/LLVM.h" > >> > + > >> > +namespace clang { > >> > + > >> > +bool ConvertUTF8toWide(unsigned WideCharWidth, llvm::StringRef > Source, > >> > + char*& ResultPtr) { > >> > + assert(WideCharWidth==1 || WideCharWidth==2 || WideCharWidth==4); > >> > + ConversionResult result = conversionOK; > >> > + // Copy the character span over. > >> > + if (WideCharWidth == 1) { > >> > + if (!isLegalUTF8String(reinterpret_cast<const > >> > UTF8*>(&*Source.begin()), > >> > + reinterpret_cast<const > >> > UTF8*>(&*Source.end()))) > >> > >> The &* isn't really necessary for Source.begin() / end(). Do we > >> normally do this for StringRefs? (I see it wasn't there in > >> lib/Lex/LiteralSupport.cpp) > >> > >> > + result = sourceIllegal; > >> > + memcpy(ResultPtr, Source.data(), Source.size()); > >> > + ResultPtr += Source.size(); > >> > + } else if (WideCharWidth == 2) { > >> > + UTF8 const *sourceStart = (UTF8 const *)Source.data(); > >> > >> This is equivalent to "const UTF8 *sourceStart", right? At least I > >> find that easier to read. And maybe static_cast it? > >> > >> > + // FIXME: Make the type of the result buffer correct instead of > >> > + // using reinterpret_cast. > >> > + UTF16 *targetStart = reinterpret_cast<UTF16*>(ResultPtr); > >> > + ConversionFlags flags = strictConversion; > >> > + result = ConvertUTF8toUTF16( > >> > + &sourceStart,sourceStart + Source.size(), > >> > >> Nit: space between &targetStart and targetStart? > >> > >> > + &targetStart,targetStart + 2*Source.size(),flags); > >> > >> And some more spaces here too.. > >> > >> > + if (result==conversionOK) > >> > >> And here :) > >> > >> > + ResultPtr = reinterpret_cast<char*>(targetStart); > >> > + } else if (WideCharWidth == 4) { > >> > + UTF8 const *sourceStart = (UTF8 const *)Source.data(); > >> > >> Ditto about "const UTF8* sourceStart" etc. > >> > >> > + // FIXME: Make the type of the result buffer correct instead of > >> > + // using reinterpret_cast. > >> > + UTF32 *targetStart = reinterpret_cast<UTF32*>(ResultPtr); > >> > + ConversionFlags flags = strictConversion; > >> > + result = ConvertUTF8toUTF32( > >> > >> > + &sourceStart,sourceStart + Source.size(), > >> > + &targetStart,targetStart + 4*Source.size(),flags); > >> > + if (result==conversionOK) > >> > >> Ditto about spaces. > >> > >> Thanks, > >> Hans > > > > > > _______________________________________________ > cfe-commits mailing list > [email protected] > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits >
_______________________________________________ cfe-commits mailing list [email protected] http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
