Hi!

On Tue, 2009-05-12 at 12:21 +0300, Jussi Pakkanen wrote:

> I don't know any existing solution for this, so probably I'll have to
> do my own scripts. If someone knows a tool for this, do let me know.

While browsing for something else I stumbled upon an interesting Perl
class:

http://search.cpan.org/~rkrimen/String-Comments-Extract-0.02/

The good thing about this class is that it uses a tokenizer to extract
the comments, not just a bunch of regular expressions, so it correctly
handles the most obscure corner cases like comment-like structures
embedded in the code.

> Actually there is a simpler way:

Sounds good to me. I don't know any Perl, but I might try to find some
time to write a recoder using this class as a starting point. However, I
am not sure what would be the correct algorithm to follow...

1) Extract the comments from the file
2) Convert them from CP1251 to UTF-8 using iconv
3) For each comment, replace the old CP1251 string with a new UTF-8
   string via a regular expression
4) Write the output to the file

I suspect that 3) might be unreliable. Maybe a better way would be to go
through the result of the comments extraction line-by-line and thus
restrict the replacements to one line only?

-- 
Sincerely yours,
Yury V. Zaytsev
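
As a rough illustration of steps 1) to 4) above, here is a minimal Python sketch. It is not based on String::Comments::Extract: the small hand-written scanner below is only a stand-in for a real tokenizer, and the names `find_comment_spans` and `recode_comments` are hypothetical. It assumes C/C++-style comments and, instead of the regular-expression replacement in step 3), it records the byte offsets of each comment and splices the converted text back in at those offsets, which is one possible way to sidestep the reliability concern. Python's built-in codecs are used in place of iconv.

```python
def find_comment_spans(src: bytes) -> list[tuple[int, int]]:
    """Return (start, end) byte offsets of // and /* */ comments.

    String and character literals are skipped, so comment-like text
    inside them is not reported. Simplification: backslash-newline
    continuation of // comments is not handled.
    """
    spans = []
    i, n = 0, len(src)
    while i < n:
        two = src[i:i + 2]
        if two == b"//":
            end = src.find(b"\n", i)
            end = n if end == -1 else end
            spans.append((i, end))
            i = end
        elif two == b"/*":
            end = src.find(b"*/", i + 2)
            end = n if end == -1 else end + 2
            spans.append((i, end))
            i = end
        elif src[i:i + 1] in (b'"', b"'"):
            quote = src[i:i + 1]
            i += 1
            # Walk to the closing quote, stepping over escaped characters.
            while i < n and src[i:i + 1] != quote:
                i += 2 if src[i:i + 1] == b"\\" else 1
            i += 1
        else:
            i += 1
    return spans


def recode_comments(src: bytes, src_enc: str = "cp1251") -> bytes:
    """Re-encode only the comment spans to UTF-8; other bytes are copied as-is."""
    out = []
    pos = 0
    for start, end in find_comment_spans(src):
        out.append(src[pos:start])
        # Raises UnicodeDecodeError if a comment holds bytes invalid in src_enc.
        out.append(src[start:end].decode(src_enc).encode("utf-8"))
        pos = end
    out.append(src[pos:])
    return b"".join(out)
```

Step 4) would then amount to reading the file in binary mode, calling `recode_comments` on its contents, and writing the result back in binary mode. Because CP1251 is a single-byte encoding whose ASCII range is unchanged, scanning the raw bytes for `/`, `*`, and quote characters should be safe here; that would not hold for every source encoding.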