Re: [Apertium-stuff] Old fashioned SMT IBM model 1 outperforms Apertium

Per Tunedal Fri, 30 Aug 2013 02:05:06 -0700

Hi again,
Thank you. I will dig into this.

You didn't answer my question: what is wrong with the English
version of the Block World Corpus? It might be a good idea to improve
the language:


> The English data for the corpus is kind of weird (borderline
> ungrammatical) in some places. 

Feel free to improve the English data. All improvements are welcome!

Yours,
Per Tunedal

On Fri, Aug 30, 2013, at 10:53, Francis Tyers wrote:
> On Fri, 30 Aug 2013 at 10:39 +0200, Per Tunedal wrote:
> > Hi,
> > 
> > On Thu, Aug 29, 2013, at 11:20, Francis Tyers wrote:
> > > On Thu, 29 Aug 2013 at 10:13 +0200, Per Tunedal wrote:
> > > > Hi,
> > > > the design of Apertium bears some resemblance to the outdated
> > > > word-to-word statistical translation models, especially the simplest,
> > > > IBM model 1:
> > > > 1. The translation is made word by word.
> > > > 2. The most probable translation of a word is chosen (developers are
> > > > advised to have only one translation in the bidix - the most common).
> > > > 3. The translation is supposed to work best for closely related
> > > > languages.
> > > > 
> > > > Point 2 makes Apertium quite similar to IBM model 1 without the language
> > > > model: then only the most probable word is chosen. Unfortunately, this
> > > > often leads to terrible translations.
> > > 
> > > Except:
> > > 
> > > * You can use the lexical selection module, which can give equivalent
> > > results to using a target-language model.
> > 
> > Sure. It's on the to do list.
> > 
> > > * In IBM model 1 there is no reordering.
> > 
> > True. But there is little need for reordering (if any) when translating
> > between Swedish and Danish. That's why I've chosen to challenge Apertium
> > with the simple IBM model 1. My task is now to beat that simple
> > statistical translator, with your help I hope.
> 
> Well, the challenge is basically to add the words, and make sure they
> translate and generate. Not a massive challenge ;)
> 
> > > 
> > > Your efforts since last year have mostly made the pair worse not better.
> > > This is probably unintentional, but was my impression last time I looked
> > > at it.
> > > 
> > 
> > True. Most of the problems arise because I've postponed the tagger
> > training, following your advice. The tagger performed badly from the
> > start and hasn't had a chance since I changed the terminology in the
> > dictionaries to comply with most languages, including Norwegian.
> 
> Jonas (one of our GSOC students) has been working on adapting the
> Norwegian Bokmål constraint grammar to Danish. You might try using that.
> 
> > The other problem is that I've introduced quite a few synonyms. I hope
> > that implementing your lexical selection module will take care of them.
> 
> Yes!
> 
> > Finally, I have to trim the dictionaries. I might need some help with
> > the script.
> 
> You can check out how it is done in:
> 
> https://svn.code.sf.net/p/apertium/svn/trunk/apertium-kaz-tat/Makefile.am
> 
> Specifically the lines:
> 
> .deps/$(PREFIX1).autobil.prefixes: $(PREFIX1).autobil.bin
> 
> and
> 
> $(PREFIX1).automorf.bin: $(BASENAME).$(PREFIX1).LR.att.gz .deps/$(PREFIX1).autobil.prefixes
> 
> F.
> 
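
For readers who want a concrete picture of the word-by-word model discussed
in this thread, here is a minimal sketch of IBM model 1 used without a
language model, so that each source word is simply replaced by its most
probable translation (points 1 and 2 in the list above). It is only an
illustration under stated assumptions: the three-sentence Swedish-Danish
corpus is made up, the names train_ibm1 and translate are hypothetical, and
the NULL word of the full model is left out. It is not the system that was
actually compared against Apertium.

```python
from collections import defaultdict


def train_ibm1(pairs, iterations=20):
    """Estimate t(target | source) by EM from (source_tokens, target_tokens) pairs."""
    src_vocab = {s for src, _ in pairs for s in src}
    tgt_vocab = {t for _, tgt in pairs for t in tgt}
    # Start from a uniform translation table.
    table = {(tw, sw): 1.0 / len(tgt_vocab) for sw in src_vocab for tw in tgt_vocab}
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (target, source) links
        total = defaultdict(float)  # expected count of links per source word
        for src, tgt in pairs:
            for tw in tgt:
                # E-step: share each target word among the source words
                # of the same sentence, in proportion to the current table.
                norm = sum(table[(tw, sw)] for sw in src)
                for sw in src:
                    frac = table[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac
        # M-step: renormalise the expected counts per source word.
        table = {
            (tw, sw): (count[(tw, sw)] / total[sw]) if total[sw] else 0.0
            for (tw, sw) in table
        }
    return table


def translate(tokens, table):
    """Replace every token by its most probable translation; no reordering, no language model."""
    best = {}
    for (tw, sw), prob in table.items():
        if sw not in best or prob > best[sw][1]:
            best[sw] = (tw, prob)
    # Unknown words are passed through unchanged.
    return [best[w][0] if w in best else w for w in tokens]


# Toy Swedish -> Danish corpus, invented for this illustration only.
corpus = [
    ("ett hus".split(), "et hus".split()),
    ("ett äpple".split(), "et æble".split()),
    ("mitt äpple".split(), "mit æble".split()),
]
table = train_ibm1(corpus)
print(translate("mitt hus".split(), table))
```

Because the choice is made per word and ignores context, a source word with
several plausible translations always gets the same one. That is the
weakness the lexical selection module, or a target-language model, is meant
to address.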
