Sorry for the delay, have been very backed up on patch review:
On Jul 16, 2011, at 8:21 PM, Scott Conger wrote:
> Attached patch adds support for -finput-charset and automatic text
> conversion when there are multibyte characters or a byte-order-mark is
> present. The net effect is that all internal text should now be in
> UTF-8.
Wow, awesome! I'm thrilled you're tackling this!
> I have the exec charset options mostly working, but I trimmed it down
> to this for now, as it's a decently sized patch as-is.
Smallish incremental steps greatly appreciated.
> Issues:
>
> * It turned out to be quite ugly to get iconv working on Windows. See
> comment in NativeIconv.cpp. If what's there is objectionable, I'd
> prefer to rip out Windows support of iconv for now.
I have a higher level concern: the functionality of "an iconv portability
wrapper" should be sunk out of clang into the llvm lib/Support library. That
is where we isolate system specific stuff (see lib/Support/Unix/* for example)
and ifdefs.
I would split this part of the patch out and get it in (on llvm-commits) after
your autoconfigury patch goes in. Here are some specific thoughts on the
functionality at this level:
+#ifdef NEED_FOR_EXEC_CHARSET
+// Encoder for converting from UTF-8 to any iconv supported encoding
+class IconvEncoder : public Encoder {
+public:
There shouldn't be configuration-specific #ifdefs like this in header files,
these should be in .cpp files. Just make both the encoder and the decoder
interfaces unconditionally available.
Also, instead of a base class with multiple derived classes and a virtual
decode method, I'd suggest a simple interface like this (in llvm namespace, not
codec):
// In include/llvm/Support/UnicodeTranscoder.h:
class UnicodeDecoder {
public:
UnicodeDecoder();
~UnicodeDecoder();
/// Initialize the decoder, returning true on failure.
bool init(const char *sourceEncoding);
/// Converts the passed source string from its encoding specified by 'init'
to
/// UTF-8 and places it in dest. If there is a decoding error, this returns
true.
///
bool decodeToUtf8(std::vector<char> &dest, StringRef Source) = 0;
private:
UnicodeDecoder(const UnicodeDecoder&);
const UnicodeDecoder& operator=(const UnicodeDecoder&);
void *Impl; // Backing implementation.
};
Then put all of the configury based logic and any basic implementations that we
want to provide (for hosts without iconv) into the UnicodeTranscoder.cpp file.
Also, as a follow-on patch after this basic support is in, the BOM detection
and UnicodeUtil.h stuff can be added as static functions to the
UnicodeDecoder/UnicoderEncoder classes as makes sense.
> * It turned out to be quite ugly to get iconv working on Windows. See
> comment in NativeIconv.cpp. If what's there is objectionable, I'd
> prefer to rip out Windows support of iconv for now.
I'd be completely fine leaving this out and either tackling it as a follow-on
patch or leaving it for someone who cares about windows to tackle it. It would
also be nice to eventually add iconv-less support for Shift-Jis, but again, you
don't have to be the one to do it, just get the interface set up so someone can
drop the code in.
> * Difficult to automatically test as iconv implementations support
> very different sets of encodings.
You can add some simple tests to llvm/unittests, and those tests can be
#ifdef'd on iconv support. It doesn't need to cover every possible encoding,
just something to prove validity of the theme.
Once this and the configury patch go into LLVM, we can take another look at the
Clang patch.
-Chris
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits