Hi,

I do not fully understand your tokenization issues, but you should look into writing a tokenizer that suits your needs. The Moses decoder is agnostic about tokenization.
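As a rough illustration of what such a custom tokenizer might look like, here is a minimal Python sketch. It is not the tokenizer shipped with Moses; the function name `tokenize` and the splitting rules (split after an apostrophe, split hyphens and dots into separate tokens) are assumptions chosen for the example. Accented letters such as é stay inside the word because Python 3 regular expressions are Unicode-aware.

```python
import re
import unicodedata

# Hypothetical rules for this sketch (not the Moses tokenizer):
#   1. a word prefix ending in an apostrophe stays together: l', d', qu'
#   2. a run of word characters is one token (accented letters included)
#   3. every other non-space symbol becomes its own token
_TOKEN_RE = re.compile(r"\w+['’]|\w+|[^\w\s]")


def tokenize(line: str) -> str:
    """Return the line with tokens separated by single spaces."""
    # Normalize to NFC so that "e" + combining accent becomes a single "é",
    # which the \w class then treats as part of the word.
    line = unicodedata.normalize("NFC", line)
    return " ".join(_TOKEN_RE.findall(line))


if __name__ == "__main__":
    print(tokenize("L'école est fermée."))
```

Whatever rules you settle on, run the same tokenizer over the training corpus and over every sentence you send to the decoder, so that you never have to insert the spaces by hand. You may prefer to keep hyphenated words or forms like "aujourd'hui" as single tokens; that would need extra rules beyond this sketch.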
Regarding the demo: see the documentation on the Moses web site about the Moses server implementation. You can also speed up decoding by reducing the beam sizes (stack size, cube pruning pop limit, ...).

-phi

On Tue, Dec 1, 2009 at 12:57 PM, <[email protected]> wrote:
> Hello All,
>
> I am preparing the phrase table for French-English conversion. When
> inputting the French corpus, I have the following queries:
> 1. How can we make the training system recognize the strokes (like é) in
> French words?
> 2. And also special characters? For example, when I try to build the phrase
> table inputting a corpus that includes ' (apostrophe), - (hyphen),
> . (dot), the tokenizer fails. But if I put a space before and after these
> special characters, the tokenizer works and the phrase table is properly
> built. But then we also have to include the spaces when we pass any input
> French word that contains these special characters to the decoder.
> 3. We would like to develop a web translator tool that can translate any
> French files to English using Moses internally. For this, we initially tried
> to build an EXE that takes French text files (not web pages) and converts
> them into English, and we see that it takes 2 minutes for a 6 KB file. When
> we eventually go for the web, what is the best approach to make the
> translation tool faster, like your Moses Online Demo?
>
> Thanks & Regards,
> Abhinandan
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
