On 11-03-2003, at 12h 42'01", Øyvind A. Holm wrote to linux-utf8 about "Re: CVS log
messages into UTF-8"
> > iconv --from-code=ISO-8859-1 --to-code=UTF-8
> >
> > or
> >
> > recode ISO-8859-1..UTF-8
> >
> Yeh, but that would also convert the source and binary data inside the
> RCS file, not only the log messages. :) The script has to know about
> the @ and @@ sections inside the file so nothing else but the messages
> are converted.
>
find . -name '*.log' -exec sh -c 'iconv --from-code=ISO-8859-1 --to-code=UTF-8 "$1" > "$1.utf8" && mv "$1.utf8" "$1"' sh {} \;
In the worst case, you can make a vim script file in which you write every
8-bit Latin-1 character together with its UTF-8 equivalent, and then run
"vim -s vim.script RCS_file". All other characters are left unchanged.
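Writing that table by hand is error-prone. The UTF-8 bytes of every upper-half Latin-1 character can be listed with iconv and od. This is a sketch assuming a POSIX shell and an iconv that accepts the ISO-8859-1 charset name (glibc's and GNU libiconv's do); latin1_table is a hypothetical helper name:

```shell
#!/bin/sh
# latin1_table: print each ISO-8859-1 byte from 0xA0 to 0xFF next to the
# UTF-8 bytes of the same character, one pair per line, e.g. "e9 -> c3 a9".
latin1_table() {
    b=160                                # 0xA0, first non-control upper-half byte
    while [ "$b" -le 255 ]; do
        oct=$(printf '%o' "$b")          # printf's \NNN escape takes octal
        utf8=$(printf "\\$oct" | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1)
        printf '%02x ->%s\n' "$b" "$utf8"
        b=$((b + 1))
    done
}

latin1_table
```

Each output line gives one left-hand side and one right-hand side for the vim script.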
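Example from my ISO-8859-16 to UTF-8 recoding. Each left-hand side is a
single ISO-8859-16 byte; each right-hand side is the UTF-8 encoding of the
same letter, typed as raw 8-bit characters in the real script and written
here as hex bytes in angle brackets:
:1,$s/Ă/<C4><82>/g
:1,$s/ă/<C4><83>/g
:1,$s/Â/<C3><82>/g
:1,$s/â/<C3><A2>/g
:1,$s/Î/<C3><8E>/g
:1,$s/î/<C3><AE>/g
:1,$s/ș/<C8><99>/g
:1,$s/Ș/<C8><98>/g
:1,$s/ț/<C8><9B>/g
:1,$s/Ț/<C8><9A>/g
:1,$s/iso-8859-2/UTF-8/g
:1,$s/iso-8859-16/UTF-8/g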
:x
You could also do it with sed, but I have no idea how that would work.
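A sed version of the same substitutions could be sketched as follows. It assumes GNU sed (the \xHH escapes and -i are GNU extensions) run under LC_ALL=C, so that sed matches single bytes rather than multibyte characters; recode16 is a hypothetical function name. As in the vim script, the commands run in order, so no substitution may produce a byte that a later one matches; the order of the vim script above respects that.

```shell
#!/bin/sh
# recode16 FILE: rewrite FILE in place, replacing the ISO-8859-16 bytes for
# Ă ă Â â Î î ș Ș ț Ț (in that order) with their UTF-8 byte sequences, then
# fixing the charset names. Assumes GNU sed; LC_ALL=C means byte matching.
recode16() {
    LC_ALL=C sed -i \
        -e 's/\xc3/\xc4\x82/g' \
        -e 's/\xe3/\xc4\x83/g' \
        -e 's/\xc2/\xc3\x82/g' \
        -e 's/\xe2/\xc3\xa2/g' \
        -e 's/\xce/\xc3\x8e/g' \
        -e 's/\xee/\xc3\xae/g' \
        -e 's/\xba/\xc8\x99/g' \
        -e 's/\xaa/\xc8\x98/g' \
        -e 's/\xfe/\xc8\x9b/g' \
        -e 's/\xde/\xc8\x9a/g' \
        -e 's/iso-8859-2/UTF-8/g' \
        -e 's/iso-8859-16/UTF-8/g' \
        "$1"
}
```

Usage would be "recode16 RCS_file". Like the vim script, it converts every matching byte in the file, not only the log messages.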
With vim it works. But I had to type the UTF-8 characters as their
individual 8-bit bytes, because I could not find any editor able to
insert the letters of interest directly. For example, "t comma below"
(ț), whose UTF-8 encoding is the byte pair 0xC8 0x9B, is obtained in
xterm by pressing Meta-Shift-h and then Ctrl-Meta-[ (Meta is neither Alt
nor AltGr; it is a key that adds the 8th bit to 7-bit characters).
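The keystroke arithmetic can be checked with a few lines of shell. This sketch assumes only that Meta sets bit 7 (0x80) of the key's 7-bit code; Shift-h sends 'H' (0x48) and Ctrl-[ sends ESC (0x1B). The helper name meta is hypothetical:

```shell
#!/bin/sh
# meta CODE: print, as two hex digits, the byte sent when Meta is held
# together with a key whose 7-bit code is CODE (Meta sets bit 7, 0x80).
meta() {
    printf '%02X' $(( $1 | 0x80 ))
}

# Shift-h is 'H' (0x48); Ctrl-[ is ESC (0x1B).
echo "$(meta 0x48) $(meta 0x1B)"    # prints: C8 9B, the UTF-8 bytes of ț
```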
Vim can also edit binary files. If the first n lines have to be skipped,
replace the 1 right after the colon in each line of vim.script with n+1.
Ionel
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/