Dmitri,
Thanks for the information about OmegaT's internals.
The Perl interface to Hunspell is really trivial:
1. Create a speller object for a language:
my $speller = Text::Hunspell->new("/.../test.aff", "/.../test.dic");
2. Check the spelling of a word:
$speller->check( $word );
(result: 1 if found, 0 if not)
3. If the word is not found, get suggestions:
my @suggestions = $speller->suggest( $misspelled );
4. Delete the speller object:
$speller->delete($speller);
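
Put together, the steps could look roughly like the sketch below. The report_words helper and the command-line handling are only an illustration and are not part of Text::Hunspell; the dictionary paths are taken from the command line. The helper accepts any object that has check and suggest methods, so it can be tried out without Hunspell installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Text::Hunspell): checks each word with
# the given speller and returns a hash of misspelled word => suggestions.
sub report_words {
    my ($speller, @words) = @_;
    my %report;
    for my $word (@words) {
        next if $speller->check($word);      # true means the word was found
        $report{$word} = [ $speller->suggest($word) ];
    }
    return %report;
}

# Usage (assumed): perl check.pl test.aff test.dic word1 word2 ...
if (@ARGV >= 3) {
    require Text::Hunspell;                  # loaded only when really needed
    my ($aff, $dic, @words) = @ARGV;
    my $speller = Text::Hunspell->new($aff, $dic)
        or die "Cannot create a speller from $aff and $dic\n";
    my %report = report_words($speller, @words);
    for my $word (sort keys %report) {
        print "$word: ", join(', ', @{ $report{$word} }), "\n";
    }
}
```

Keeping the loop in a helper that only relies on check and suggest means the same code would also work with another speller object that offers those two methods.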
I think the information above will help a bit in designing a spelling interface
for OmegaT. Maybe you could also forward it to the OmegaT group.
There is also a very similar Perl interface to Aspell, Text::Aspell (it was my
model for the Hunspell one). Aspell is strong at suggestions, but at the moment
it lacks forbidden words and twofold affixing.
Regards: Eleonora
> >I think your
> >idea is good, to make spell checking before translation.
>
> Not exactly. You see, what we (users of OmegaT) want is spellcheck DURING
> translation or right after it. The workflow with OmegaT is as follows (very
> briefly):
> 1) prepare files to translate in supported formats;
> 2) create a project and translate (when you load a project, OmegaT, like
> any CAT tool, splits the text(s) into so-called segments, the minimal
> units to translate: a line, a sentence, or a paragraph, depending on
> file types and settings);
> 3) create target documents.
>
> So, until you reach step 3, you can't catch any typing mistakes in
> the translation. The idea is to engage a spellcheck engine somehow to get
> this ability in OmegaT (possibly with some kind of highlighting of spelling
> errors). Obviously, Hunspell would be a perfect option: it's free (LGPL,
> if I'm not mistaken) and it can use MySpell dictionaries, which are already
> numerous.
>
> If embedding it into OmegaT (Java) directly is not possible, would it be
> possible to make a kind of bypass by checking the project's translation
> memory? (I bet this should be possible with a Perl script!) Some
> background: OmegaT stores translation memories as TMX files. TMX is an
> XML application, so it's a well-structured format. All translated segments
> as described above are stored as pairs of the source text and its
> translation. The source and the target are clearly labeled with
> language/locale tags. Such a pair is called a translation unit (TU).
> Here's an example of such a file:
>
> ========================================================================
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE tmx SYSTEM "tmx11.dtd">
> <tmx version="1.1">
> <header
> creationtool="OmegaT"
> creationtoolversion="1"
> segtype="paragraph"
> o-tmf="OmegaT TMX"
> adminlang="EN-US"
> srclang="EN-US"
> datatype="plaintext"
> >
> </header>
> <body>
> <tu>
> <tuv lang="EN-US">
> <seg>Cancel</seg>
> </tuv>
> <tuv lang="PL-PL">
> <seg>Anuluj</seg>
> </tuv>
> </tu>
> <tu>
> <tuv lang="EN-US">
> <seg>Close</seg>
> </tuv>
> <tuv lang="PL-PL">
> <seg>Zamknij</seg>
> </tuv>
> </tu>
> </body>
> </tmx>
> ==========================================================
>
> So, I envisage a scenario approximately like this:
>
> 1) run a script that reads and parses a TM file (AFAIK, Perl has libraries
> for handling XML);
> 2) the script reads each segment (I guess, SAX would be OK) and checks
> only translations (i.e., the contents of each <tuv></tuv> whose
> “lang” attribute is DIFFERENT from the “srclang” in the header) and
> somehow displays mistakes.
> 3) it would be cool to have also the ability to correct mistakes.
>
> Something like this. Well, I understand, it can be a real job. But maybe?
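
Steps 1 and 2 of this scenario could be sketched roughly as follows. This is only an illustration under several assumptions: XML::LibXML is installed (it reads the whole file at once instead of using SAX), the Hunspell dictionary is UTF-8 encoded, and the helper names target_segments and misspelled_words are made up for the example. Step 3, correcting mistakes, is not covered.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);
use XML::LibXML;

# Hypothetical helper: returns the text of every <seg> whose <tuv> "lang"
# differs from the "srclang" given in the TMX header.
sub target_segments {
    my ($doc) = @_;
    my $srclang = $doc->findvalue('/tmx/header/@srclang');
    my @segments;
    for my $tuv ($doc->findnodes('/tmx/body/tu/tuv')) {
        my $lang = $tuv->getAttribute('lang') // '';
        next if lc $lang eq lc $srclang;     # skip the source text
        push @segments, map { $_->textContent } $tuv->findnodes('seg');
    }
    return @segments;
}

# Hypothetical helper: lists the words of a segment that the given
# checker (a code reference returning true for correct words) rejects.
sub misspelled_words {
    my ($segment, $is_correct) = @_;
    return grep { !$is_correct->($_) } $segment =~ /(\w+)/g;
}

# Usage (assumed): perl tmxcheck.pl project.tmx test.aff test.dic
if (@ARGV == 3) {
    require Text::Hunspell;
    my ($tmx, $aff, $dic) = @ARGV;
    my $speller = Text::Hunspell->new($aff, $dic)
        or die "Cannot create a speller from $aff and $dic\n";
    # load_ext_dtd => 0: do not try to fetch tmx11.dtd while parsing.
    my $doc = XML::LibXML->load_xml(location => $tmx, load_ext_dtd => 0);
    # Assumption: the dictionary is UTF-8, so words are passed as UTF-8 bytes.
    my $check = sub { $speller->check(encode('UTF-8', $_[0])) };
    for my $segment (target_segments($doc)) {
        for my $word (misspelled_words($segment, $check)) {
            my $bytes = encode('UTF-8', $word);
            print "$bytes: ", join(', ', $speller->suggest($bytes)), "\n";
        }
    }
}
```

Passing the checker as a code reference keeps the TMX handling independent of the spelling engine, so the extraction part can be tried with a simple word list before Hunspell is wired in.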
>
> I'll also send a copy of this letter to the OmegaT group. Maybe, someone
> there can suggest something.
>
> I'm afraid, I did not say this, though I should: THANK YOU :-)