You can skip tokenization, but the translation quality is likely to be poor.
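
To illustrate why skipping tokenization tends to hurt, here is a minimal Python sketch. It is not the Moses tokenizer: simple_tokenize, PUNCT, and the five-line corpus are hypothetical stand-ins made up for this example. It only shows that, without tokenization, punctuation stays glued to words, so the same word is counted as several distinct vocabulary entries.

```python
import re

# Toy stand-in for a real tokenizer (this is NOT the Moses tokenizer):
# it only puts spaces around a few Latin and Persian/Arabic punctuation marks.
PUNCT = re.compile(r'([.,!?;:()"،؛؟«»])')


def simple_tokenize(line):
    """Separate punctuation from words, then normalise whitespace."""
    return " ".join(PUNCT.sub(r" \1 ", line).split())


def vocabulary(lines):
    """Return the set of distinct whitespace-separated tokens."""
    return {tok for line in lines for tok in line.split()}


# Hypothetical mini-corpus, made up for illustration only.
corpus = [
    "این کتاب است.",
    "این خوب است؟",
    "کتاب خوب است",
    "این کتاب.",
    "این کتاب؟",
]

raw_vocab = vocabulary(corpus)
tok_vocab = vocabulary(simple_tokenize(line) for line in corpus)

print(len(raw_vocab))  # 8 distinct entries when punctuation stays attached
print(len(tok_vocab))  # 6 distinct entries after splitting punctuation off
```

On real data the effect is much larger, because every word can appear with every punctuation mark, and each such combination is treated as a separate, rarely seen word.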

If this is your first time working with Moses, please follow the
tutorial:
   http://www.statmt.org/moses/?n=Moses.Tutorial

Once you understand a little more about Moses, you can apply it to your
own data and language pair.


On 18 October 2014 14:51, Arezoo Arjomand <[email protected]> wrote:

> Hello
> Thank you for your answer. Is it possible for Moses to process data without
> any data preparation? In other words, is it possible to skip the
> tokenization step?
>
>
>   On Saturday, October 18, 2014 3:55 PM, Hieu Hoang <[email protected]>
> wrote:
>
>
> Moses has no problems dealing with Persian or any other language
> with a non-Latin script.
>
> Your data should be encoded as UTF-8, which allows any language script.
> Text direction (right-to-left or left-to-right) is just a display issue,
> not a data issue.
>
> However, a problem you may have with Persian is that Moses doesn't know
> how to tokenize it, and it will use the English tokenizer instead to
> tokenize Persian. You should consider searching for a good tokenizer for
> Persian, or writing one yourself. You can do this within Moses; please look
> in the subdirectory
>    scripts/share/nonbreaking_prefixes
>
> Another problem is you need parallel data for your language pair.
>
> On 18 October 2014 09:41, Arezoo Arjomand <[email protected]>
> wrote:
>
> Hi
> I want to use Moses to translate between two languages, neither of which is
> English, that use different scripts (such as Persian, which is written
> right to left). Is it appropriate to use Moses to translate
> such languages?
> Thank You
>
>
>
>
> --
> Hieu Hoang
> Research Associate
> University of Edinburgh
> http://www.hoang.co.uk/hieu
>
>
>
>


-- 
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
