On Mon, 17 Sep 2012 at 13:10 +0200, Maria Fronczak wrote:
> Hello everyone,
>
> this email gives some details about version 0.1.0 of the Apertium MT
> system for Maltese to Arabic. It has just been released;
> apertium-mt-ar itself is in the staging/ directory. The system is
> partially based on the Maltese->Hebrew pair; it was developed this
> summer as a Google Summer of Code project under the mentorship of
> Kevin Brubeck Unhammer and Francis Tyers.
>
> Some statistics:
>
> - Number of entries in dictionaries:
> -- Maltese monolingual: 7154
> -- bilingual: 7685
> -- Arabic monolingual: 6220
>
> - Rules:
> -- disambiguation: 29
> -- transfer: 163 (chunker) + 7 (interchunk)
>
> - Coverage (Maltese monolingual):
> -- news corpus: 84.54 % (999722 known words, 1182521 tokenised words)
> -- Wikipedia: 82.47 % (780288 known, 946197 tokenised)
> -- Scannell corpus: 84.27 % (8587965 known, 10191487 tokenised)
>
> Evaluation was done on a regular basis as the project went on (2 texts
> of 200 words, 2 texts of 500 words, all taken from the Maltese
> Wikipedia). Results ranged from 8.70 % to 23.11 % (WER, there were no
> unknown words): 8.70 % (200 words), 23.11 % (500 words), 17.28 % (200
> words), 21.34 % (500 words). Results of the preliminary evaluation -
> of what had been done before Google Summer of Code started - were
> better: 3.17 % WER; but here a very simple story of 300 words was
> used. The evaluation texts can be found in the dev/story/
> subdirectory.
>
> We compared the results with Google: the evaluation texts were
> translated with Google and postedited. WER figures obtained this way
> are higher than the respective apertium-mt-ar results (20.94 % Google
> against 8.70 % apertium-mt-ar, 33.53 % against 23.11 %, 35.89 %
> against 17.28 %, 39.00 % against 21.34 %, and 47.44 % against 3.17 %
> for the simple story). It should be noted though that all the
> evaluation texts were actually used in the development of
> apertium-mt-ar.
> Of course Google deals better with the elegant translation of whole
> phrases, but also with issues such as definiteness/indefiniteness in
> the output - which is a big problem in apertium-mt-ar translations.
> The most striking problems in the Google translations are of a
> grammatical nature: for example, impersonal constructions are often
> used when personal forms are expected, and incorrect personal verb
> forms are also frequent. This is where apertium-mt-ar performs better.
>
>
> Maltese->Arabic seemed a promising pair, because Maltese is a dialect
> of Arabic - although greatly influenced by Italian and English. But
> the two languages are not as similar as I first thought: I
> underestimated the differences between Arabic dialects and Standard
> Arabic, especially when it comes to syntax. Much work on transfer
> rules is still needed - that is why I asked my mentors to move the
> pair to staging/ rather than to trunk. The current release is an early
> one.
>
> Hopefully one day the Arabic->Maltese direction will be available as
> well. The foundations for this are laid: basic transfer rules are
> written; at the moment both Maltese->Arabic and Arabic->Maltese are
> testvoc clean. The main issues for Arabic->Maltese are: Arabic
> disambiguation, further development of the Arabic analyser (which was
> written from scratch) and of the transfer rules.
>
> Thank you for reading.
>
> Best regards,
> Maria Fronczak
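
For anyone who wants to check the coverage figures quoted above, here is a minimal sketch. It assumes coverage is simply known words divided by tokenised words, which is consistent with the numbers in the announcement. The function name and the dictionary layout are only illustrative; they are not part of apertium-mt-ar or any Apertium tool.

```python
# Coverage as a percentage: known words / tokenised words.
# The counts are copied from the statistics in the announcement; the names
# `coverage` and `CORPORA` are hypothetical, chosen for this example only.

def coverage(known: int, tokenised: int) -> float:
    """Return the share of tokenised words that are known, in percent."""
    if tokenised <= 0:
        raise ValueError("tokenised must be a positive word count")
    return 100.0 * known / tokenised


CORPORA = {
    "news corpus": (999722, 1182521),
    "Wikipedia": (780288, 946197),
    "Scannell corpus": (8587965, 10191487),
}

for name, (known, tokenised) in CORPORA.items():
    print(f"{name}: {coverage(known, tokenised):.2f} %")
```

The same function can be pointed at any other corpus once the known and tokenised word counts are available.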

Great work, Maria! :)

A quick extra note: the file can be found here:

http://sourceforge.net/projects/apertium/files/apertium-mt-ar/apertium-mt-ar-0.1.0.tar.gz/download

Víctor, would it be possible to include it in the webservice when you
get a moment?

Fran

_______________________________________________
Apertium-stuff mailing list
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
