As of this commit, edbrowse recognizes utf16 and utf32, both big and little endian, by the byte order mark, and converts them to utf8, which is the internal edbrowse format and the only format pcre understands. If the same file is written, the text is converted back to its original encoding. If the text is sent anywhere else, it stays in utf8. This is consistent with our existing iso-8859 to utf8 conversions.
I ran a few tests, but it is not thoroughly tested, and there are lots of corner cases. This has been much discussed and didn't seem worth doing, but Geoff pointed out that such files are more common on Windows (in fact I think he first discovered the problem). Also, much of the Asian world uses utf16 in files and websites because it is the most efficient way to represent such text: most of those characters take two bytes in utf16 but three in utf8. So this web page, which comes down as utf16, now works:
https://portal.slm.tu-dresden.de

Geoff, if you have some utf16 or utf32 files, you may wish to test them. Run edbrowse whatever-file-utf32.txt and see if it looks right. Beyond this, make some edits, write the file, and check that the edits stick and that the file remains in its original format.

Ok, I already found a Windows bug just by thinking about it. Text files are opened in text mode, but when mapping back to utf16 or utf32 they need to be written in binary mode, since text mode would insert a single \r byte before every \n byte and misalign the two- or four-byte units. I may even have to stick in \r\0\0\0 manually. Arrgghh. I'll look into it.

Karl Dahlke

_______________________________________________
Edbrowse-dev mailing list
http://lists.the-brannons.com/mailman/listinfo/edbrowse-dev
