You can skip tokenization, but the translation quality is likely to be bad. If this is the first time you are working with Moses, please follow the tutorial: http://www.statmt.org/moses/?n=Moses.Tutorial
Once you understand a little more about Moses, you can apply it to your own data and language pair.

On 18 October 2014 14:51, Arezoo Arjomand <[email protected]> wrote:

> Hello
> Thank you for your answer. Is it possible for Moses to process data without
> any data preparation? Alternatively, is it possible to skip the
> tokenization step?
>
>
> On Saturday, October 18, 2014 3:55 PM, Hieu Hoang <[email protected]>
> wrote:
>
>
> Moses has no problems dealing with Persian or any other language
> with a non-Latin script.
>
> Your data should be encoded as UTF-8, which allows any language script.
> Text direction (right-to-left or left-to-right) is just a display issue,
> not a data issue.
>
> However, a problem you may have with Persian is that Moses doesn't know
> how to tokenize it, so it will use the English tokenizer to
> tokenize Persian instead. You should consider searching for a good
> tokenizer for Persian, or writing one yourself. You can do it in Moses;
> please look in the subdirectory
> scripts/share/nonbreaking_prefixes
>
> Another problem is that you need parallel data for your language pair.
>
> On 18 October 2014 09:41, Arezoo Arjomand <[email protected]>
> wrote:
>
> Hi
> I want to use Moses to translate between languages where neither side is
> English and the scripts differ (like Persian, which is written
> right to left). Is it appropriate to use Moses to translate
> such languages?
> Thank you
>
>
> --
> Hieu Hoang
> Research Associate
> University of Edinburgh
> http://www.hoang.co.uk/hieu
>

--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu
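P.S. To make the UTF-8 and tokenization points in this thread concrete, here is a minimal Python sketch. It is not part of Moses: the function names `is_valid_utf8` and `naive_tokenize` and the punctuation set are assumptions chosen only for illustration. It checks that raw bytes decode as UTF-8, then separates a small set of ASCII and Arabic-script punctuation marks from the adjacent words.

```python
import re


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Punctuation to split off as separate tokens: a few ASCII marks plus the
# Arabic-script comma (U+060C), semicolon (U+061B) and question mark (U+061F).
# This set is an illustrative assumption, not a complete list for Persian.
_PUNCT = re.compile(r'([.,!?;:()"\u060C\u061B\u061F])')


def naive_tokenize(line: str) -> str:
    """Put spaces around punctuation and collapse runs of whitespace."""
    spaced = _PUNCT.sub(r" \1 ", line)
    # str.split() splits only on whitespace, so a zero-width non-joiner
    # (U+200C) inside a Persian word is left untouched.
    return " ".join(spaced.split())


if __name__ == "__main__":
    sample = "سلام\u060C دنیا!"
    print(is_valid_utf8(sample.encode("utf-8")))
    print(naive_tokenize(sample))
```

This sketch is deliberately crude: it would also split decimal numbers such as 3.5 and abbreviations that end in a period. Handling those cases is what the nonbreaking-prefix files under scripts/share/nonbreaking_prefixes are for, so a proper Persian tokenizer is still the better choice for real training data.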
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
