Re: [cfe-commits] [patch] Fix fixme about duplicated utf conversion code in codegen and lex

Chandler Carruth Mon, 02 Jul 2012 17:00:58 -0700

Sorry for the silence...

I think this is a perfectly fine direction.


I would suggest a few things:

1) This should be in 'include/llvm/Support/UTFUtils.h' or some such.

2) I would personally like a bit more type safety involved. Why not output
a wchar_t array? I might suggest the following:

2a) Make this a template on a type T which only has specializations for
char16_t and char32_t (or LLVM-provided analogs to those types in C++98)
2b) Make a wchar_t variant that delegates to the appropriately sized one.
2c) Make the output buffer strongly typed, potentially SmallVectorImpl<T>&

3) Mail the updated patch to llvm-commits as well to make sure folks are OK
with it going into the common LLVM libraries.

But I dunno if all of that is worthwhile... The unsigned size selector just
seems brittle and gross to me.


On Mon, Jul 2, 2012 at 4:51 PM, Nico Weber <[email protected]> wrote:

> Final ping. If I don't get a "please land" from a code owner, I'll let
> this patch die. (Not a big deal either way.)
>
> On Fri, Jun 29, 2012 at 9:56 AM, Nico Weber <[email protected]> wrote:
> > All done (except for the static_cast, since that line also converts
> > signedness).
> >
> > Any comments on the new file name / location from anyone?
> >
> >
> > On Wed, Jun 27, 2012 at 7:06 AM, Hans Wennborg <[email protected]>
> wrote:
> >>
> >> On Mon, Jun 25, 2012 at 4:54 AM, Nico Weber <[email protected]>
> wrote:
> >> > Hi,
> >> >
> >> > the attached patch moves the duplicated code in CGExpr.cpp and
> >> > LiteralSupport.cpp into a new file Basic/ConvertUTFWrapper.h/cpp.
> >> > Please let me know what you think.
> >> >
> >> > Nico
> >>
> >> Looks reasonable as far as I can tell. Just some nits:
> >>
> >> > +#ifndef LLVM_CLANG_BASIC_CONVERT_UTF_WRAPPER_H
> >> > +#define LLVM_CLANG_BASIC_CONVERT_UTF_WRAPPER_H
> >> > +
> >> > +#include "llvm/ADT/StringRef.h"
> >> > +
> >> > +namespace clang {
> >> > +
> >> > +/// Convert an UTF8 StringRef to UTF8, UTF16, or UTF32 depending on
> >> > +/// WideCharWidth. On success, ResultPtr will point one after the end
> >> > of the
> >> > +/// copied string.
> >> > +/// \returns true on success.
> >> > +bool ConvertUTF8toWide(unsigned WideCharWidth, llvm::StringRef
> Source,
> >> > +                       char*& ResultPtr);
> >>
> >> Should the *& be on ResultPtr here? (and in the .cpp file)
> >> Should the comment say that it's copying into ResultPtr?
> >>
> >> > Index: lib/Basic/ConvertUTFWrapper.cpp
> >> > ===================================================================
> >> > --- lib/Basic/ConvertUTFWrapper.cpp   (revision 0)
> >> > +++ lib/Basic/ConvertUTFWrapper.cpp   (revision 0)
> >> > @@ -0,0 +1,55 @@
> >> > +//===-- ConvertUTFWrapper.cpp - Wrap ConvertUTF.h with clang data
> types
> >> > -----===
> >> > +//
> >> > +//                     The LLVM Compiler Infrastructure
> >> > +//
> >> > +// This file is distributed under the University of Illinois Open
> >> > Source
> >> > +// License. See LICENSE.TXT for details.
> >> > +//
> >> >
> >> >
> +//===----------------------------------------------------------------------===//
> >> > +
> >> > +#include "clang/Basic/ConvertUTFWrapper.h"
> >> > +#include "clang/Basic/ConvertUTF.h"
> >> > +#include "clang/Basic/LLVM.h"
> >> > +
> >> > +namespace clang {
> >> > +
> >> > +bool ConvertUTF8toWide(unsigned WideCharWidth, llvm::StringRef
> Source,
> >> > +                       char*& ResultPtr) {
> >> > +  assert(WideCharWidth==1 || WideCharWidth==2 || WideCharWidth==4);
> >> > +  ConversionResult result = conversionOK;
> >> > +  // Copy the character span over.
> >> > +  if (WideCharWidth == 1) {
> >> > +    if (!isLegalUTF8String(reinterpret_cast<const
> >> > UTF8*>(&*Source.begin()),
> >> > +                           reinterpret_cast<const
> >> > UTF8*>(&*Source.end())))
> >>
> >> The &* isn't really necessary for Source.begin() / end(). Do we
> >> normally do this for StringRefs? (I see it wasn't there in
> >> lib/Lex/LiteralSupport.cpp)
> >>
> >> > +      result = sourceIllegal;
> >> > +    memcpy(ResultPtr, Source.data(), Source.size());
> >> > +    ResultPtr += Source.size();
> >> > +  } else if (WideCharWidth == 2) {
> >> > +    UTF8 const *sourceStart = (UTF8 const *)Source.data();
> >>
> >> This is equivalent to "const UTF8 *sourceStart", right? At least I
> >> find that easier to read. And maybe static_cast it?
> >>
> >> > +    // FIXME: Make the type of the result buffer correct instead of
> >> > +    // using reinterpret_cast.
> >> > +    UTF16 *targetStart = reinterpret_cast<UTF16*>(ResultPtr);
> >> > +    ConversionFlags flags = strictConversion;
> >> > +    result = ConvertUTF8toUTF16(
> >> > +      &sourceStart,sourceStart + Source.size(),
> >>
> >> Nit: space between &targetStart and targetStart?
> >>
> >> > +        &targetStart,targetStart + 2*Source.size(),flags);
> >>
> >> And some more spaces here too..
> >>
> >> > +    if (result==conversionOK)
> >>
> >> And here :)
> >>
> >> > +      ResultPtr = reinterpret_cast<char*>(targetStart);
> >> > +  } else if (WideCharWidth == 4) {
> >> > +    UTF8 const *sourceStart = (UTF8 const *)Source.data();
> >>
> >> Ditto about "const UTF8* sourceStart" etc.
> >>
> >> > +    // FIXME: Make the type of the result buffer correct instead of
> >> > +    // using reinterpret_cast.
> >> > +    UTF32 *targetStart = reinterpret_cast<UTF32*>(ResultPtr);
> >> > +    ConversionFlags flags = strictConversion;
> >> > +    result = ConvertUTF8toUTF32(
> >>
> >> > +        &sourceStart,sourceStart + Source.size(),
> >> > +        &targetStart,targetStart + 4*Source.size(),flags);
> >> > +    if (result==conversionOK)
> >>
> >> Ditto about spaces.
> >>
> >> Thanks,
> >> Hans
> >
> >
>
> _______________________________________________
> cfe-commits mailing list
> [email protected]
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits

Re: [cfe-commits] [patch] Fix fixme about duplicated utf conversion code in codegen and lex

Reply via email to