> Do we actually need to detect UTF-8 here, or can we just always assume the input is in UTF-8 as Clang does?
There are plenty of source files in various 8-bit encodings out there, so we'd better make sure we are dealing with valid UTF-8. And the "detection" is trivial: basically just a check that the whole file is valid UTF-8. The only overhead is passing the encoding around everywhere, but in exchange we avoid having to validate the encoding of each token's text.

http://llvm-reviews.chandlerc.com/D918
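
To illustrate how small the whole-file check is, here is a rough sketch. It is not the code from the patch: the name `isValidUTF8` and the `std::string` interface are assumptions made for this example. It rejects stray continuation bytes, truncated sequences, overlong forms, surrogates, and code points above U+10FFFF.

```cpp
#include <string>

// Hypothetical helper (not the patch's actual code): returns true if the
// whole buffer is well-formed UTF-8.
bool isValidUTF8(const std::string &text) {
  const unsigned char *p =
      reinterpret_cast<const unsigned char *>(text.data());
  const unsigned char *end = p + text.size();

  while (p < end) {
    unsigned char lead = *p++;
    if (lead < 0x80)
      continue; // Plain ASCII byte.

    int trailing;       // Number of continuation bytes expected.
    unsigned cp;        // Code point being decoded.
    unsigned minimum;   // Smallest code point allowed for this length.
    if ((lead & 0xE0) == 0xC0) {
      trailing = 1; cp = lead & 0x1F; minimum = 0x80;
    } else if ((lead & 0xF0) == 0xE0) {
      trailing = 2; cp = lead & 0x0F; minimum = 0x800;
    } else if ((lead & 0xF8) == 0xF0) {
      trailing = 3; cp = lead & 0x07; minimum = 0x10000;
    } else {
      return false; // Stray continuation byte or invalid lead byte.
    }

    if (end - p < trailing)
      return false; // Sequence is cut off by the end of the buffer.

    for (int i = 0; i < trailing; ++i) {
      unsigned char c = *p++;
      if ((c & 0xC0) != 0x80)
        return false; // Not a continuation byte.
      cp = (cp << 6) | (c & 0x3F);
    }

    if (cp < minimum)
      return false; // Overlong encoding.
    if (cp > 0x10FFFF)
      return false; // Beyond the Unicode range.
    if (cp >= 0xD800 && cp <= 0xDFFF)
      return false; // UTF-16 surrogate, not a valid scalar value.
  }
  return true;
}
```

A file in a typical 8-bit encoding almost always fails this check as soon as it contains a non-ASCII byte, because such a byte is rarely followed by the right continuation bytes. That is why a single pass over the buffer is enough to tell the two cases apart in practice.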
