I built a few systems using the toolkits you mentioned, and my language pairs are quite "unusual". My first system was a Slovenian-to-English translator based on 30,000 parallel sentences.
I produced several systems for this language pair, using different corpora or selected parts of the main corpus.

My current area of interest is the translation of similar languages. I produced three new systems using Orwell's book 1984; the language pairs are:

Slovenian - Czech
Slovenian - Serbian
Slovenian - English

All three systems are based on one multilingual corpus of 6000+ aligned sentences, so I can compare the results directly.

Most of the systems are available online at:
http://www.pef.upr.si/~jernej/
Click on each "Menola" word to open the interface of the corresponding system.

In my opinion, your corpus is somewhat too small, but one can always try...

On Tue, 30 Mar 2004, paul johnston wrote:

> Just wondering how many people have built SMT systems using the CMU-Toolkit,
> Giza and the ISI Decoder and what were the sizes of their language and
> translation models.
> I've put together an Estonian to English system using the BNC as the
> language model and so far 1500 pairs of parallel sentences.
> I would be especially interested to hear from people using more unusual
> language pairs.
> Thanks in advance
> Paul Johnston (UMIST)
>
>
> _______________________________________________
> MT-List mailing list
> [EMAIL PROTECTED]
> http://www.computing.dcu.ie/mailman/listinfo/mt-list
