In my opinion, that depends on how different the source and target languages are, and also on the domain of the test set.
1. If the two languages are quite different, e.g. Chinese-English, where the words are totally different and the grammars also differ, more training data is needed.
2. If the test set contains texts from many different domains, the training data also needs to cover those domains in order to get good performance.

Best wishes!
Pidong

On 11 May 2012 00:02, tharaka weheragoda <[email protected]> wrote:
> Hi All,
> If anybody knows about the minimum amount of parallel data required for
> SMT to perform well please let me know.
>
> Thanks in advance!
> Tharaka
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>

--
Wang Pidong
Department of Computer Science
School of Computing
National University of Singapore
