Hi,

the design of Apertium resembles the outdated word-to-word statistical translation models, especially the simplest one, IBM Model 1:

1. The translation is made word by word.
2. The most probable translation of a word is chosen (developers are advised to have only one translation in the bidix - the most common one).
3. The translation is supposed to work best for closely related languages.

Point 2 makes Apertium quite similar to IBM Model 1 without the language model: in both cases only the most probable word is chosen. Unfortunately, this often leads to terrible translations. Adding a language model to ensure "fluent" output should therefore outperform Apertium. And it does. On closely related languages.

I've written my own IBM Model 1 training program and decoder (translator). I trained on the Block World Corpus and built 3-gram language models with IRSTLM (available at http://www.tunedal.nu/download/block_world_corpus/). Finally, I translated the evaluation files (available at the above site) from da to sv (and the other way around) and from sv to en (and the other way around).

Results:

1. The translation between Swedish and English is mostly terrible (to a large extent because IBM Model 1 has no fertility model, i.e. one source word produces only one translated word).
2. The translation between Swedish and Danish is acceptable in most cases. Only a few sentences are terrible. On the whole it looks much better than the translations from Apertium - in spite of my efforts since last year.

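For readers who have not seen IBM Model 1 before, the training step and the "pick the most probable word" rule can be sketched roughly as below. This is only a minimal Python illustration, not the program used for the results in this mail: the names `train_ibm1`, `most_probable` and `toy_pairs` are made up for the example, the sentence pairs are invented rather than taken from the corpus, and the NULL source word and the language model are left out.

```python
from collections import defaultdict


def train_ibm1(pairs, iterations=20):
    """EM training for IBM Model 1 (simplified: no NULL source word).

    pairs: list of (source_tokens, target_tokens).
    Returns a dict mapping (target_word, source_word) to t(target | source).
    """
    target_vocab = {tw for _, tgt in pairs for tw in tgt}
    uniform = 1.0 / len(target_vocab)
    # Start from a uniform table over every co-occurring word pair.
    t = {(tw, sw): uniform for src, tgt in pairs for tw in tgt for sw in src}

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (target, source) links
        total = defaultdict(float)  # expected count of each source word
        for src, tgt in pairs:
            for tw in tgt:
                # E-step: share one unit of count for tw among the source words.
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac
        # M-step: renormalise so that t(. | sw) sums to one for each sw.
        t = {(tw, sw): c / total[sw] for (tw, sw), c in count.items()}
    return t


def most_probable(t, source_word):
    """Pick the single most probable translation, with no language model."""
    candidates = {tw: p for (tw, sw), p in t.items() if sw == source_word}
    return max(candidates, key=candidates.get)


# Invented Swedish -> Danish toy pairs (assumption, not the real corpus).
toy_pairs = [
    ("ta en pil".split(), "tag en pil".split()),
    ("ta en kon".split(), "tag en kegle".split()),
    ("ställ en kon".split(), "stil en kegle".split()),
    ("ställ en pil".split(), "stil en pil".split()),
]

table = train_ibm1(toy_pairs)
best = {sw: most_probable(table, sw) for sw in ["ta", "ställ", "en", "kon", "pil"]}
print(best)
```

Choosing `most_probable` for every word independently corresponds to point 2 above. A decoder with a language model would instead weigh several candidates per word against the n-gram probabilities of the resulting target sentence.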
Translation from Swedish to Danish:

[Size: 5] [Hypothesis: <s> tag en pil </s> ] [Probability: 0.13901360785639832]
[Size: 5] [Hypothesis: <s> tag et blok </s> ] [Probability: 0.3999344349708584]
[Size: 6] [Hypothesis: <s> tag en blå kegle </s> ] [Probability: 0.10870042929030037]
[Size: 5] [Hypothesis: <s> hämta en pil </s> ] [Probability: 0.13901360785639832]
[Size: 5] [Hypothesis: <s> tag inte bloket </s> ] [Probability: 0.32824490974806453]
[Size: 8] [Hypothesis: <s> stil en kegle på mit blok </s> ] [Probability: 0.19715184174208997]
[Size: 7] [Hypothesis: <s> stil en kegle på bloket </s> ] [Probability: 0.32217102939336967]
[Size: 5] [Hypothesis: <s> hun tager bloket </s> ] [Probability: 0.3603255017036356]
[Size: 5] [Hypothesis: <s> hun tager pyramiden </s> ] [Probability: 4.2222116526677295E-4]
[Size: 10] [Hypothesis: <s> stil en røde pil cirklen blå blå cirkel </s> ] [Probability: 1.0540094637762348E-80]
[Size: 10] [Hypothesis: <s> stil en lila pil på en blå cirkel </s> ] [Probability: 0.8681190300773386]
[Size: 7] [Hypothesis: <s> jeg tager et blå blok </s> ] [Probability: 0.546319934684801]
[Size: 6] [Hypothesis: <s> stil pilen på cirklen </s> ] [Probability: 0.658918711829524]
[Size: 11] [Hypothesis: <s> han stiller det røde bloket på den blå cirklen </s> ] [Probability: 0.758360506603279]
[Size: 9] [Hypothesis: <s> han stiller en pil på sin cirkel </s> ] [Probability: 0.7823839437386677]
[Size: 9] [Hypothesis: <s> han stiller en pil på hans cirkel </s> ] [Probability: 0.6127633005242946]
[Size: 10] [Hypothesis: <s> hun stiller sit blok på min blå cirkel </s> ] [Probability: 0.9448514869725285]
[Size: 11] [Hypothesis: <s> jeg stiller keglen på cirklen på hendes blå cirkel </s> ] [Probability: 0.93586787430269]

Translation from Danish to Swedish:

[Size: 5] [Hypothesis: <s> ta röd pil </s> ] [Probability: 0.09639440063653734]
[Size: 5] [Hypothesis: <s> ta ett block </s> ] [Probability: 0.2845027267436607]
[Size: 6] [Hypothesis: <s> ta en blåa kon </s> ] [Probability: 0.13978428076217828]
[Size: 5] [Hypothesis: <s> hent röd pil </s> ] [Probability: 0.09639440063653734]
[Size: 5] [Hypothesis: <s> ta ikke blocket </s> ] [Probability: 0.3290561408341361]
[Size: 8] [Hypothesis: <s> ställ en kon på cirkeln block </s> ] [Probability: 6.718122684379477E-4]
[Size: 7] [Hypothesis: <s> ställ en kon på blocket </s> ] [Probability: 0.18048539535856953]
[Size: 5] [Hypothesis: <s> hon tar blocket </s> ] [Probability: 0.3018144923420706]
[Size: 5] [Hypothesis: <s> hon tar pyramiden </s> ] [Probability: 3.015185872799287E-4]
[Size: 10] [Hypothesis: <s> ställ en röd pil cirkeln blå blocket blå </s> ] [Probability: 5.637806526505979E-66]
[Size: 10] [Hypothesis: <s> ställ en lila pil på blå blocket blå </s> ] [Probability: 1.609395795344925E-63]
[Size: 7] [Hypothesis: <s> jag tar ett blått block </s> ] [Probability: 0.3288769174154776]
[Size: 6] [Hypothesis: <s> ställ pilen på cirkeln </s> ] [Probability: 0.659488379620821]
[Size: 11] [Hypothesis: <s> han ställer det röda blocket på den blåa cirkeln </s> ] [Probability: 0.7671451309268721]
[Size: 9] [Hypothesis: <s> han ställer en blå cirkeln blåa blå </s> ] [Probability: 3.0647414042840357E-71]
[Size: 9] [Hypothesis: <s> han ställer en blå cirkeln blocket blå </s> ] [Probability: 3.2520636803230927E-22]
[Size: 10] [Hypothesis: <s> hon ställer sitt block på min blåa cirkel </s> ] [Probability: 0.9261884199293765]
[Size: 11] [Hypothesis: <s> jag ställer konen på cirkeln på hennes blåa cirkel </s> ] [Probability: 0.9436714749562712]

Yours,
Per Tunedal
