Hi,
the design of Apertium resembles the outdated word-to-word statistical
translation models, especially the simplest one, IBM model 1:
1. The translation is made word by word.
2. The most probable translation of a word is chosen (developers are
advised to have only one translation in the bidix - the most common).
3. The translation is supposed to work best for closely related
languages.

Point 2 makes Apertium quite similar to IBM model 1 without a language
model: in both cases only the most probable translation of each word is
chosen. Unfortunately, this often leads to terrible translations.

Thus, adding a language model to ensure "fluent" output should
outperform Apertium. And it does, at least on closely related languages.
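
To illustrate the difference, here is a minimal sketch in Python. It is not
my actual decoder: the translation table, the bigram language model and all
the numbers in it are made up for the example, and it uses a bigram model
with a crude floor for unseen bigrams instead of a smoothed 3-gram model. It
only shows how picking the most probable word in isolation can lose against
scoring whole word-by-word candidates with a language model.

```python
import math
from itertools import product

# Hypothetical toy translation table t(target | source); values are made up.
T = {
    "ta": {"tag": 0.6, "tager": 0.4},
    "ett": {"en": 0.6, "et": 0.4},
    "block": {"blok": 1.0},
}

# Hypothetical toy bigram language model P(word | previous word); made up.
LM = {
    ("<s>", "tag"): 0.5, ("<s>", "tager"): 0.1,
    ("tag", "en"): 0.3, ("tag", "et"): 0.3,
    ("tager", "en"): 0.3, ("tager", "et"): 0.3,
    ("en", "blok"): 0.01, ("et", "blok"): 0.5,
    ("blok", "</s>"): 0.5,
}
FLOOR = 1e-6  # crude stand-in probability for unseen bigrams


def lm_logprob(words):
    """Log probability of a sentence under the toy bigram model."""
    padded = ["<s>"] + list(words) + ["</s>"]
    return sum(math.log(LM.get(pair, FLOOR)) for pair in zip(padded, padded[1:]))


def translate_greedy(source):
    """Pick the single most probable translation per word (no language model)."""
    return [max(T[w], key=T[w].get) for w in source]


def translate_with_lm(source):
    """Score every word-by-word candidate by translation model times language model."""
    best, best_score = None, -math.inf
    for candidate in product(*(T[w] for w in source)):
        tm = sum(math.log(T[s][t]) for s, t in zip(source, candidate))
        score = tm + lm_logprob(candidate)
        if score > best_score:
            best, best_score = list(candidate), score
    return best


source = ["ta", "ett", "block"]
print(translate_greedy(source))
print(translate_with_lm(source))
```

With these made-up numbers the greedy choice keeps the locally more probable
article, while the language model overrides it because the other article fits
the following noun better. Enumerating all candidates is only feasible for
tiny examples; a real decoder would prune the search.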

I've written my own IBM model 1 training program and decoder
(translator). I trained on the Block World Corpus and built 3-gram
language models with IRSTLM (available at
http://www.tunedal.nu/download/block_world_corpus/). Finally, I
translated the evaluation files (available at the above site) from da to
sv (and the other way around) and from sv to en (and the other way
around).
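
For readers who have not seen IBM model 1 training, the following is a
minimal sketch of the usual EM procedure in Python. It is not my training
program: the function names and the three-sentence toy corpus are invented
for illustration, and the NULL source word that the full model uses is left
out to keep the sketch short.

```python
from collections import defaultdict


def train_ibm1(pairs, iterations=10):
    """Estimate t(target word | source word) with EM.

    pairs is a list of (source_words, target_words) sentence pairs.
    The NULL source word of the full model is omitted (a simplification).
    """
    tgt_vocab = {w for _, tgt in pairs for w in tgt}
    # Start from a uniform distribution over the target vocabulary.
    t = defaultdict(lambda: 1.0 / len(tgt_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (target, source) links
        total = defaultdict(float)  # expected count of each source word
        for src, tgt in pairs:
            for w_t in tgt:
                # E-step: spread one unit of mass for w_t over the source words.
                norm = sum(t[(w_t, w_s)] for w_s in src)
                for w_s in src:
                    frac = t[(w_t, w_s)] / norm
                    count[(w_t, w_s)] += frac
                    total[w_s] += frac
        # M-step: renormalise the expected counts per source word.
        t = defaultdict(float, {k: c / total[k[1]] for k, c in count.items()})
    return t


def best_translation(t, src_word):
    """Return the target word with the highest t(target | src_word)."""
    candidates = {w_t: p for (w_t, w_s), p in t.items() if w_s == src_word}
    return max(candidates, key=candidates.get)


# Invented toy corpus (Swedish source, Danish target), for illustration only.
corpus = [
    ("ta blocket".split(), "tag bloket".split()),
    ("ta pilen".split(), "tag pilen".split()),
    ("ställ pilen".split(), "stil pilen".split()),
]
table = train_ibm1(corpus)
for word in ["ta", "blocket", "pilen", "ställ"]:
    print(word, "->", best_translation(table, word))
```

The table that comes out of this is what a word-by-word decoder looks up; the
language model is trained separately on target-language text.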

Results:
1. The translation between Swedish and English is mostly terrible
(largely because IBM model 1 has no notion of fertility, i.e. each word
produces exactly one translated word).
2. The translation between Swedish and Danish is in most cases
acceptable. Only a few sentences are terrible. On the whole, it looks
much better than the translations from Apertium, in spite of my efforts
since last year.

Translation from Swedish to Danish:

[Size: 5] [Hypothesis: <s> tag en pil </s> ] [Probability:
0.13901360785639832]
[Size: 5] [Hypothesis: <s> tag et blok </s> ] [Probability:
0.3999344349708584]
[Size: 6] [Hypothesis: <s> tag en blå kegle </s> ] [Probability:
0.10870042929030037]
[Size: 5] [Hypothesis: <s> hämta en pil </s> ] [Probability:
0.13901360785639832]
[Size: 5] [Hypothesis: <s> tag inte bloket </s> ] [Probability:
0.32824490974806453]
[Size: 8] [Hypothesis: <s> stil en kegle på mit blok </s> ]
[Probability: 0.19715184174208997]
[Size: 7] [Hypothesis: <s> stil en kegle på bloket </s> ] [Probability:
0.32217102939336967]
[Size: 5] [Hypothesis: <s> hun tager bloket </s> ] [Probability:
0.3603255017036356]
[Size: 5] [Hypothesis: <s> hun tager pyramiden </s> ] [Probability:
4.2222116526677295E-4]
[Size: 10] [Hypothesis: <s> stil en røde pil cirklen blå blå cirkel </s>
] [Probability: 1.0540094637762348E-80]
[Size: 10] [Hypothesis: <s> stil en lila pil på en blå cirkel </s> ]
[Probability: 0.8681190300773386]
[Size: 7] [Hypothesis: <s> jeg tager et blå blok </s> ] [Probability:
0.546319934684801]
[Size: 6] [Hypothesis: <s> stil pilen på cirklen </s> ] [Probability:
0.658918711829524]
[Size: 11] [Hypothesis: <s> han stiller det røde bloket på den blå
cirklen </s> ] [Probability: 0.758360506603279]
[Size: 9] [Hypothesis: <s> han stiller en pil på sin cirkel </s> ]
[Probability: 0.7823839437386677]
[Size: 9] [Hypothesis: <s> han stiller en pil på hans cirkel </s> ]
[Probability: 0.6127633005242946]
[Size: 10] [Hypothesis: <s> hun stiller sit blok på min blå cirkel </s>
] [Probability: 0.9448514869725285]
[Size: 11] [Hypothesis: <s> jeg stiller keglen på cirklen på hendes blå
cirkel </s> ] [Probability: 0.93586787430269]

Translation from Danish to Swedish:

[Size: 5] [Hypothesis: <s> ta röd pil </s> ] [Probability:
0.09639440063653734]
[Size: 5] [Hypothesis: <s> ta ett block </s> ] [Probability:
0.2845027267436607]
[Size: 6] [Hypothesis: <s> ta en blåa kon </s> ] [Probability:
0.13978428076217828]
[Size: 5] [Hypothesis: <s> hent röd pil </s> ] [Probability:
0.09639440063653734]
[Size: 5] [Hypothesis: <s> ta ikke blocket </s> ] [Probability:
0.3290561408341361]
[Size: 8] [Hypothesis: <s> ställ en kon på cirkeln block </s> ]
[Probability: 6.718122684379477E-4]
[Size: 7] [Hypothesis: <s> ställ en kon på blocket </s> ] [Probability:
0.18048539535856953]
[Size: 5] [Hypothesis: <s> hon tar blocket </s> ] [Probability:
0.3018144923420706]
[Size: 5] [Hypothesis: <s> hon tar pyramiden </s> ] [Probability:
3.015185872799287E-4]
[Size: 10] [Hypothesis: <s> ställ en röd pil cirkeln blå blocket blå
</s> ] [Probability: 5.637806526505979E-66]
[Size: 10] [Hypothesis: <s> ställ en lila pil på blå blocket blå </s> ]
[Probability: 1.609395795344925E-63]
[Size: 7] [Hypothesis: <s> jag tar ett blått block </s> ] [Probability:
0.3288769174154776]
[Size: 6] [Hypothesis: <s> ställ pilen på cirkeln </s> ] [Probability:
0.659488379620821]
[Size: 11] [Hypothesis: <s> han ställer det röda blocket på den blåa
cirkeln </s> ] [Probability: 0.7671451309268721]
[Size: 9] [Hypothesis: <s> han ställer en blå cirkeln blåa blå </s> ]
[Probability: 3.0647414042840357E-71]
[Size: 9] [Hypothesis: <s> han ställer en blå cirkeln blocket blå </s> ]
[Probability: 3.2520636803230927E-22]
[Size: 10] [Hypothesis: <s> hon ställer sitt block på min blåa cirkel
</s> ] [Probability: 0.9261884199293765]
[Size: 11] [Hypothesis: <s> jag ställer konen på cirkeln på hennes blåa
cirkel </s> ] [Probability: 0.9436714749562712]

Yours,
Per Tunedal

