On Tue, May 12, 2009 at 8:56 AM, Dmitry Polevoy <openocr.pole...@gmail.com> wrote:
> The main problem with UTF8 source code is that Cyrillic characters are used in
> the source code and conversion can break the program. We should make a strong
> test system to check OCR results before we start experiments with conversion.
> At present, compatibility with Visual Studio 6 is important, and this is the
> second reason not to convert the source code.

There are two different character set questions here.

The first is the character set that Cuneiform uses internally. We don't change that: the dat files are not touched and no functionality changes.

The second is the encoding of the source files themselves, specifically of the comments. Converting only the comments is safe and easy to verify: compute md5sums of the compiled output files before and after the conversion. If they are identical (and they should be), you can be certain that nothing has broken.

There may be an issue with byte values above 127 that appear in the code itself rather than in comments, for example in string or character literals. A UTF-8 editor will probably display them as the Unicode "unknown symbol" (replacement) character. But since C compilers ignore encodings and treat source files as a plain byte stream, the program should still build exactly as before. If this causes problems, we can replace those values with explicit character escape codes.

> Is source code in win-1251 really so bad? (I lack experience in this field.)

The main problem is that it looks very ugly. Here is a random sample:

/********** Çàãîëîâîê *******************************************************/
/* Àâòîð, */
/* êîììåíòàðèè */
/* è äàëíåéøàÿ */
/* ïðàâêà : Àëåêñåé Êîíîïëåâ */
/* Ðåäàêöèÿ : 08.06.00 */
/* Ôàéë : 'Normalise.cpp' */
/* Ñîäåðæàíèå : Íîðìàëèçàöèÿ ñûðüÿ */
/* Íàçíà÷åíèå : */
/*----------------------------------------------------------------------------*/

If you can manually set your text editor widget to win-1251, this is less of an issue. But Eclipse, at least, can't do that.

> Non-source files (like read.me, license and so on) can be converted to UTF8
> if it is required.

The latest check-in already does this as an experiment.
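For anyone who wants to experiment, here is a minimal sketch of the approach above in Python 3.9+. Nothing here is in the Cuneiform tree, and every function name is hypothetical. The sketch does three things. It re-encodes only the comments of a file from win-1251 (Python's "cp1251" codec) to UTF-8, copying everything else byte for byte, including string and character literals. It compares md5 digests of two copies of the build outputs. And it spells bytes above 127 in a literal as C octal escapes. The comment scanner is deliberately simplified, and the "already UTF-8" test is a heuristic, not a guarantee.

```python
import hashlib
from pathlib import Path

# Sketch only; assumes the legacy sources are Windows-1251 ("cp1251").


def recode_comment(chunk: bytes) -> bytes:
    """cp1251 -> UTF-8 for one comment. A chunk that already decodes as
    UTF-8 (including pure ASCII) is returned unchanged, so running the
    script twice is harmless. Heuristic: cp1251 Russian text is very
    unlikely to be valid UTF-8 by accident."""
    try:
        chunk.decode("utf-8")
        return chunk
    except UnicodeDecodeError:
        return chunk.decode("cp1251").encode("utf-8")


def convert_comments(src: bytes) -> bytes:
    """Re-encode /* */ and // comments only; all other bytes, including
    string and character literals, are copied unchanged. Simplified
    scanner: it ignores trigraphs and backslash-newline continuations."""
    out = bytearray()
    i, n = 0, len(src)
    while i < n:
        if src.startswith(b"/*", i):
            end = src.find(b"*/", i + 2)
            end = n if end < 0 else end + 2
            out += recode_comment(src[i:end])
            i = end
        elif src.startswith(b"//", i):
            end = src.find(b"\n", i)
            end = n if end < 0 else end
            out += recode_comment(src[i:end])
            i = end
        elif src[i] in b"\"'":
            # Skip over a literal so "/*" inside a string is not a comment.
            quote, j = src[i], i + 1
            while j < n and src[j] != quote and src[j] != 0x0A:
                j += 2 if src[j] == 0x5C else 1  # 0x5C is a backslash
            j = min(j + 1, n)
            out += src[i:j]
            i = j
        else:
            out.append(src[i])
            i += 1
    return bytes(out)


def md5_hex(data: bytes) -> str:
    """The digest `md5sum` prints for a file with these contents."""
    return hashlib.md5(data).hexdigest()


def changed_outputs(before: Path, after: Path, pattern: str = "*.o") -> list[str]:
    """Relative names of build outputs whose md5 differs between two
    copies of the build directory, or that exist in only one of them."""
    def digests(root: Path) -> dict[str, str]:
        return {str(p.relative_to(root)): md5_hex(p.read_bytes())
                for p in root.rglob(pattern) if p.is_file()}

    old, new = digests(before), digests(after)
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))


def c_octal_escape(literal: bytes) -> str:
    """Rewrite bytes above 127 in a literal's source text as octal escapes.
    Octal escapes stop after three digits, while a \\x escape would also
    swallow a following hex digit ("\\xC7a" is one escape, not two chars)."""
    return "".join(chr(b) if b < 128 else f"\\{b:03o}" for b in literal)
```

Typical use would be to build and copy the object files aside. Then run convert_comments over each source file, rebuild in the same directory with the same flags, and check that changed_outputs() returns an empty list (pass pattern="*.obj" for MSVC). This assumes a reproducible build. Some toolchains embed timestamps or paths in object files, so a mismatch is worth a closer look rather than proof of breakage. As a side note, the sample banner is win-1251 text displayed as Latin-1: `"Çàãîëîâîê".encode("latin-1").decode("cp1251")` gives 'Заголовок' ("Title").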
You could try copying license.txt to test.c or similar, opening it in VS 6, and telling us whether it works for you.

_______________________________________________
Mailing list: https://launchpad.net/~cuneiform
Post to     : cuneiform@lists.launchpad.net
Unsubscribe : https://launchpad.net/~cuneiform
More help   : https://help.launchpad.net/ListHelp