Re: [l10n-dev] Spell checking the spell checker

Marcin Miłkowski Fri, 09 Feb 2007 00:37:27 -0800

Lars Aronsson wrote:

Is there any way to extract all Swedish text used in the Swedish
version of OpenOffice.org, including menus, error messages and
help texts? Since the current spell checking dictionary is very
primitive, I want to improve it. And one step would be to make
sure that all the Swedish help texts are recognized by the
spell checker.

It's trivially simple. I use a gawk script on the sdf files. The procedure is simple enough:


1. Unpack the l10n tarball to a directory.

2. Run # ls -R | gawk -f create.awk > extract.sh

This extracts the filenames from the ls -R listing and builds a bash script, using a gawk one-liner, "create.awk":


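# Directory header lines from ls -R look like "./path:"; strip the colon and remember the path.
/^\..*:/{gsub(":","",$0); prevline=$1}
# For each localize.sdf entry, emit a command that runs print_pl.awk on its full path.
/localize\.sdf/{print "gawk -f print_pl.awk " prevline"/"$0}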

The script print_pl.awk can be adapted for any language:

BEGIN {FS="\t"}
{if ($10=="pl") print $11}

Change "pl" to the code of the language you want to check, then:

3. Run # bash extract.sh > complete_translation.txt
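
For anyone who would rather skip the generated script, the three steps above can be approximated in one pass. This is only a sketch, assuming find is available and that, as in print_pl.awk, column 10 holds the language code and column 11 the text. The function name extract_lang is made up for illustration.

```shell
# Prefer gawk, as used in this message; fall back to the system awk if it is missing.
AWK=$(command -v gawk || command -v awk)

# extract_lang LANG DIR (hypothetical helper): print the text column of every
# localize.sdf line under DIR whose language column equals LANG.
extract_lang() {
    lang="$1"   # language code to keep, e.g. "pl"
    root="$2"   # directory where the l10n tarball was unpacked
    find "$root" -name 'localize.sdf' -exec \
        "$AWK" -F '\t' -v lang="$lang" '$10 == lang { print $11 }' {} +
}
```

It would be called as: extract_lang pl . > complete_translation.txt. Passing the language with -v avoids editing the awk script for each language.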

You can import complete_translation.txt. You'd get a lot of false alarms on StarBasic, but we can build a user dictionary with StarBasic keywords (that's trivially easy) for spell-checking the whole help. Is there any other language we should cover?
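
One way to cut down the StarBasic false alarms before importing the text is to reduce it to a list of distinct words and drop the known keywords. A minimal sketch, assuming GNU grep (for -o), a UTF-8 locale so that accented letters count as alphabetic, and a hypothetical file of StarBasic keywords with one keyword per line:

```shell
# candidate_words TEXT KEYWORDS (hypothetical helper): list each distinct word in
# TEXT once, skipping any word that appears (case-insensitively) in KEYWORDS.
candidate_words() {
    text="$1"       # e.g. complete_translation.txt
    keywords="$2"   # assumed file with one StarBasic keyword per line
    grep -o -E '[[:alpha:]]+' "$text" | sort -u | grep -v -i -x -F -f "$keywords"
}
```

The resulting list is much shorter than the full text, so it is quicker to run through the spell checker, and the keyword file can double as the source for the user dictionary.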

Is such a test part of the normal procedure?
It seems it could make sense for every language.


See my proposal:

http://wiki.services.openoffice.org/wiki/Automating_Translation_QA

Best,
Marcin

