Geir,
I looked through some scripts that I wrote to help me sync the GNU
Nano repository and I came across a Perl script that might be useful
to you in quickly identifying all log messages that are not
representable in ASCII (hence possibly not UTF-8).
Attached is the source of the script. To use it, you will need the
libsvn Perl bindings (on Debian, install the `libsvn-perl` package),
and you will need to edit the `SVN::Ra->new(...)` call (line 20 of the
attached file) so that it points at the URL of the Subversion
repository that you wish to examine.
Example output for svn://svn.sv.gnu.org/nano is:
------------------------------------------------------------------------
r619
Added Galician translation by Jacobo Tarr<[email protected]>.
------------------------------------------------------------------------
r757
Updated Galician translation; thanks, Jacobo Tarr
------------------------------------------------------------------------
r826
Galician translation brought up to date for 1.1.2 by Jacobo Tarr
------------------------------------------------------------------------
r954
Galician translation update (Jacobo Tarr.
------------------------------------------------------------------------
r958
French translation update (Jean-Philippe Gu곡rd).
------------------------------------------------------------------------
r962
French translation update (Jean-Philippe Gu곡rd).
------------------------------------------------------------------------
r1009
Moved no.po to nn.po.
New Norwegian bokm欠translation, by Stig E Sandoe <[email protected]>.
Updated Norwegian nynorsk translation, by Kjetil Torgrim Homme
<[email protected]>.
------------------------------------------------------------------------
r1013
Moved no.po to nn.po.
New Norwegian bokm欠translation, by Stig E sand𠼳[email protected]>.
Added missing entries to THANKS.
------------------------------------------------------------------------
r1047
French translation updates (Jean-Philippe Gu곡rd).
------------------------------------------------------------------------
r1070
Norwegian bokm欠translation updates (Stig E Sandoe).
------------------------------------------------------------------------
r1071
Norwegian bokm欠translation updates (Stig E Sand𩮍
------------------------------------------------------------------------
r1072
Norwegian bokm欠translation updates (Stig E Sand𩮍
------------------------------------------------------------------------
r1125
French translation updates (Jean-Philippe Gu곡rd).
------------------------------------------------------------------------
r1133
French translation updates (Jean-Philippe Gu곡rd).
------------------------------------------------------------------------
r1258
French translation update (Jean-Philippe Gu곡rd).
------------------------------------------------------------------------
r1259
Spanish translation updates (Ricardo Javier Cⳤenes Medina).
------------------------------------------------------------------------
r1299
Updated Spanish translation (Ricardo Javier Cⳤenes Medina).
------------------------------------------------------------------------
r1301
Updated French translation (Jean-Philippe Gu곡rd).
------------------------------------------------------------------------
r1500
Updated French translation by Jean-Philippe Gu곡rd.
------------------------------------------------------------------------
r1537
Updated French translation by Jean-Philippe Gu곡rd.
------------------------------------------------------------------------
r1923
Updated French translation by Jean-Philippe Guérard.
------------------------------------------------------------------------
r2102
spell Ulf H峮hammar's name right
------------------------------------------------------------------------
r2373
in do_credits(), display Florian König's name properly in UTF-8 mode;
since we can't dynamically set that element of the array to its UTF-8
equivalent when in UTF-8 mode, we have to use the ISO-8859-1 version and
pass every string in the credits through make_mbstring() to make sure
they're all UTF-8 (sigh)
------------------------------------------------------------------------
r2784
rework the credits handling to display Florian König's name properly
whether we're in a UTF-8 locale or not. This requires a minor hack, but
it's better than requiring a massive function that we only use once
------------------------------------------------------------------------
r2898
Update French manpages by Jean-Philippe Guérard.
------------------------------------------------------------------------
r3924
Update French manpages by Jean-Philippe Guérard.
------------------------------------------------------------------------
r4181
per Jean-Philippe Guérard's updates, in doc/man/fr/*.1,
doc/man/fr/nanorc.5, fix copyright notices; the copyrights are
disclaimed on these translations, but the copyrights of the untranslated
works also apply
------------------------------------------------------------------------
r4182
per Jean-Philippe Guérard's updates, in doc/man/fr/*.1,
doc/man/fr/nanorc.5, fix copyright notices; the copyrights are
disclaimed on these translations, but the copyrights of the untranslated
works also apply
------------------------------------------------------------------------
r4208
in print_opt_full(), use strlenpt() instead of strlen(), so that tabs
are placed properly when displaying translated strings in UTF-8, as
found by Jean-Philippe Guérard
------------------------------------------------------------------------
The corrupted-looking entries are the ones where the log message is
incorrectly stored in ISO-8859-1.
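To separate the messages that are genuinely mis-encoded from those that are merely non-ASCII but valid UTF-8, one option is to attempt a strict UTF-8 decode of each message. The following is only a sketch: `is_valid_utf8` and `classify_log_msg` are hypothetical helpers that are not part of the attached script, and it assumes the log message reaches Perl as a raw byte string.

```perl
use strict;
use warnings;
use Encode qw( decode FB_CROAK );

# Return 1 if the byte string decodes cleanly as strict UTF-8, else 0.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() with a CHECK value may modify its input
    return eval { decode('UTF-8', $copy, FB_CROAK); 1 } ? 1 : 0;
}

# Hypothetical helper: label a raw log message by how it is encoded.
sub classify_log_msg {
    my ($bytes) = @_;
    return 'ascii'     if $bytes !~ /[^\x00-\x7F]/;
    return 'utf-8'     if is_valid_utf8($bytes);
    return 'not utf-8';
}

print classify_log_msg("Gu\xC3\xA9rard"), "\n";    # e-acute encoded as UTF-8
print classify_log_msg("Gu\xE9rard"), "\n";        # e-acute encoded as ISO-8859-1
```

A message labelled 'not utf-8' is not necessarily ISO-8859-1; the check only establishes that it is not well-formed UTF-8.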
#! /usr/bin/env perl
use strict;
use warnings;
use Encode qw( from_to );
use SVN::Ra;

# Return 1 if every character of the string is 7-bit ASCII, else 0.
sub is_ascii {
    my @chars = split(//, shift);
    for my $c (@chars) {
        if (ord($c) >= 128) {
            return 0;
        }
    }
    1;
}

# Change this URL to the repository that you wish to examine.
my $ra = SVN::Ra->new("svn://svn.sv.gnu.org/nano");

# Walk the log from r1 to the latest revision (limit 0 means no limit)
# and print every revision whose log message is not pure ASCII.
$ra->get_log('', 1, $ra->get_latest_revnum, 0, 1, 0, sub {
    my ($paths, $rev_num, $user, $datetime, $log_msg) = @_;
    if (not is_ascii($log_msg)) {
        print "------------------------------------------------------------------------\n";
        print "r", $rev_num, "\n";
        print $log_msg, "\n";
    }
});
print "------------------------------------------------------------------------\n";
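The script imports `from_to` from Encode but never calls it. Presumably it was meant for re-encoding the affected messages, though that is a guess. As a rough sketch of what that step could look like, the hypothetical helper `latin1_to_utf8` below converts a byte string that is assumed to be ISO-8859-1 into UTF-8. It does not write anything back to the repository.

```perl
use strict;
use warnings;
use Encode qw( from_to );

# Hypothetical helper: return a UTF-8 copy of a message that is
# assumed (not verified) to be encoded in ISO-8859-1.
sub latin1_to_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;
    from_to($copy, 'ISO-8859-1', 'UTF-8');    # converts $copy in place
    return $copy;
}

my $fixed = latin1_to_utf8("Gu\xE9rard");
print $fixed eq "Gu\xC3\xA9rard" ? "converted\n" : "unexpected\n";
```

Running this on a message that is already UTF-8 would double-encode it, so it should only be applied to messages known to be in ISO-8859-1.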