Hi,

I just stumbled over the same problem and then found that you are already a few steps ahead.

The question is what to do with a word that is not in the dictionary. There are two possible reasons for it:

1. the word is misspelled
2. the dictionary is incomplete

So maybe Lars' aim to improve the Swedish dictionary can be combined with spell checking. First we need a list of the words not recognized by the OOo spell checker. Each word is on that list for one of the two reasons above, so the list has to be split into two separate lists, as the two cases are handled differently. A 'trusted' dictionary could ease this job: unrecognized words that the trusted dictionary does contain are most likely just missing from the OOo dictionary. The others would have to be analyzed by hand, I guess.
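A minimal sketch of that split, assuming both lists are plain text files with one word per line. The function name and the file names are made up for illustration; nothing here is an existing OOo tool.

```shell
# Hypothetical helper: split a list of unrecognized words into two lists.
#   $1 = words rejected by the spell checker (one per line)
#   $2 = 'trusted' word list (one per line)
#   $3 = output: words the trusted list knows (dictionary probably incomplete)
#   $4 = output: all other words (possible misspellings, to be checked by hand)
split_unknown() {
    tmpdir=$(mktemp -d) || return 1
    # comm needs both inputs sorted the same way, so force the C locale.
    LC_ALL=C sort -u "$1" > "$tmpdir/unknown"
    LC_ALL=C sort -u "$2" > "$tmpdir/trusted"
    # -12 keeps lines common to both files, -23 keeps lines only in the first.
    LC_ALL=C comm -12 "$tmpdir/unknown" "$tmpdir/trusted" > "$3"
    LC_ALL=C comm -23 "$tmpdir/unknown" "$tmpdir/trusted" > "$4"
    rm -rf "$tmpdir"
}
```

It would be called as `split_unknown unknown.txt trusted.txt missing.txt review.txt`. The comparison is exact and case-sensitive, so inflected or capitalized forms that the trusted list does not contain end up in the hand-check list.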

What I don't have a clue about at the moment is how to get the red-underlined words out of the *.odt document again.

Is the spell-check status saved in the document, or do we have to run a script first to add a marker?


By the way, if you have not only a source tarball but also a solver (from a tarball or built yourself), there is a tool that extracts an sdf file containing one or more languages from all localize.sdf files. It can also extract the source languages, which are de and en-US.

It's called localize.
To extract all French strings, simply call

> localize -e -l fr -f all_fr.sdf

(well, you need to run configure first)


Regards,
Gregor



Marcin Miłkowski wrote:

Hi Lars,

Here is another way:

  find OOE680_m6 -name '*sdf' | xargs cat |
    awk '-F\t' '$10=="sv"{print $11}' | sed 's/~//g;s/\\[nt]/ /g'

Apparently, a tilde precedes the underlined shortcut letter in menus (E~xport), and the texts contain \n for newlines.
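To make the field layout concrete, here is the same filter run on a single made-up record. It is only a sketch: the sample line is synthetic (twelve tab-separated placeholder fields) and the function name is hypothetical. All it assumes is what the command above already relies on, namely that field 10 holds the language code and field 11 the text.

```shell
# Hypothetical helper wrapping the awk/sed part of the pipeline above.
# Reads sdf lines on stdin and prints the cleaned text for one language.
sdf_strings() {
    awk -F'\t' -v lang="$1" '$10 == lang { print $11 }' |
        sed 's/~//g;s/\\[nt]/ /g'   # drop tildes, turn literal \n and \t into spaces
}

# Synthetic record: fields 1-9 are placeholders, 10 = language, 11 = text.
printf 'f1\tf2\tf3\tf4\tf5\tf6\tf7\tf8\tf9\tsv\tE~xportera\\nsom PDF\tf12\n' |
    sdf_strings sv
# prints: Exportera som PDF
```

With real data the function would simply replace the last two stages of the pipeline, e.g. `find OOE680_m6 -name '*sdf' | xargs cat | sdf_strings sv`.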


I was removing tildes etc. from the result file, but I kept them in the source txt file, because you need to find the faulty source segment, and without knowing the tilde and the rest of the "formatting garbage" you cannot pin down the right sdf. Of course, we could include the ID and set its style to "no language" in ODF (translating to ODF is easy in this case, and could be done with awk and zip), but I started with something much simpler.

In the future, I think that a simple style tagger should be used. Let me explain: there should be a special "no language" style for help tags etc. so that they would not be checked. Most translation tools support such things: for example, the free TortoiseTagger for Word, OmegaT, and MemoQ or Across (all free and/or open source), not to mention enlasotools (a dedicated filter set). But awk would probably be enough even for XML tagging in the help file. So two files would be needed: a complete text file (probably with some additional info like IDs), and a tagged ODF file for spell and grammar checking.

I haven't started working on that yet, though, as the schedule is unrealistically tight for additional translation QA _before_ release and _after_ integrating the translation. My idea was born out of the fact that the Polish translations had broken characters in the latest builds just because of some faulty conversion to UTF-8, and that would be detected automatically using spell-check. So this should be a step in testing before the release, and after integrating the localized strings.

See my proposal:
http://wiki.services.openoffice.org/wiki/Automating_Translation_QA


I have no experience with the tools used in translation. Is anything like Alchemy Catalyst available as free software? Could such functionality be built into future releases of OpenOffice? I would think that OpenOffice has many users who are translators, especially since the software is adopted in poorer countries where all kinds of languages are spoken.


Catalyst is free, but only in a very restricted version (no way to create new projects). But it's only one of the tools available, as I mentioned above. Anyway, these tests are quite trivial to implement using sed, awk and other standard Unix tools, which run happily on Win32 under Cygwin.

Regards,
Marcin

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

