Hi again,

Thank you. I will dig into this.

You didn't answer my question about what's wrong with the English
version of the Block World Corpus. It might be a good idea to improve
the language:
> The English data for the corpus is kind of weird (borderline
> ungrammatical) in some places. Feel free to improve the English data.

All improvements are welcome!

Yours,
Per Tunedal

On Fri, Aug 30, 2013, at 10:53, Francis Tyers wrote:
> On Fri, 30 Aug 2013 at 10:39 +0200, Per Tunedal wrote:
> > Hi,
> >
> > On Thu, Aug 29, 2013, at 11:20, Francis Tyers wrote:
> > > On Thu, 29 Aug 2013 at 10:13 +0200, Per Tunedal wrote:
> > > > Hi,
> > > > the design of Apertium has some resemblance to the outdated
> > > > word-to-word statistical translation models, especially the
> > > > simplest, IBM model 1:
> > > > 1. The translation is made word by word.
> > > > 2. The most probable translation of a word is chosen (developers are
> > > > advised to have only one translation in the bidix - the most common).
> > > > 3. The translation is supposed to work best for closely related
> > > > languages.
> > > >
> > > > Point 2 makes Apertium quite similar to IBM model 1 without the language
> > > > model: then only the most probable word is chosen. Unfortunately, this
> > > > often leads to terrible translations.
> > >
> > > Except:
> > >
> > > * You can use the lexical selection module, which can give equivalent
> > > results to using a target-language model.
> >
> > Sure. It's on the to-do list.
> >
> > > * In IBM model 1 there is no reordering.
> >
> > True. But there isn't much need for reordering (if any) when translating
> > between Swedish and Danish. That's why I've chosen to challenge Apertium
> > with the simple IBM model 1. My task is now to beat that simple
> > statistical translator, with your help I hope.
>
> Well, the challenge is basically to add the words, and make sure they
> translate and generate. Not a massive challenge ;)
>
> > >
> > > Your efforts since last year have mostly made the pair worse, not better.
> > > This is probably unintentional, but was my impression last time I looked
> > > at it.
> > >
> >
> > True.
> > Most of the problems are because I've postponed the tagger
> > training, following your advice. The tagger performed badly from the
> > start and hasn't had a chance since I changed the terminology in the
> > dictionaries to comply with most languages, including Norwegian.
>
> Jonas (one of our GSOC students) has been working on adapting the
> Norwegian Bokmål constraint grammar to Danish. You might try using that.
>
> > The other problem is that I've introduced quite a lot of synonyms. I hope
> > that implementing your lexical selection module will take care of them.
>
> Yes!
>
> > Finally, I have to trim the dictionaries. I might need some help with
> > the script.
>
> You can check out how it is done in:
>
> https://svn.code.sf.net/p/apertium/svn/trunk/apertium-kaz-tat/Makefile.am
>
> Specifically the lines:
>
> .deps/$(PREFIX1).autobil.prefixes: $(PREFIX1).autobil.bin
>
> and
>
> $(PREFIX1).automorf.bin:
> $(BASENAME).$(PREFIX1).LR.att.gz .deps/$(PREFIX1).autobil.prefixes
>
> F.

_______________________________________________
Apertium-stuff mailing list
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
