Hi Ihab,

At its most basic, tokenization separates punctuation from words.
However, it can also be used to separate a word into its morphemes to make
it easier to process.
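
As a rough illustration, here is a hypothetical minimal sketch in Python, not the Moses tokenizer. The regex splits runs of word characters from individual punctuation marks. Note that punctuation is kept as separate tokens rather than deleted, so translated.text becomes "translated . text", which is still different from "translated text".

```python
import re

# Hypothetical minimal tokenizer (NOT Moses' tokenizer.perl): a token is
# either a run of word characters or a single non-space, non-word character.
# In Python 3, \w is Unicode-aware, so Arabic letters count as word characters.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

def tokenize(line: str) -> str:
    """Return the line with punctuation separated from words by single spaces."""
    return " ".join(TOKEN_RE.findall(line))

if __name__ == "__main__":
    for s in ["translated text", "“translated” text", "translated.text", "translated text ."]:
        print(tokenize(s))
    # translated text
    # “ translated ” text
    # translated . text
    # translated text .
```

Tokenization alone does not make these inputs identical. It only makes the punctuation visible as separate tokens. Removing or normalising punctuation would be a separate step.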

Moses doesn't include a very good Arabic tokenizer. Each language needs a
nonbreaking_prefix file, located in
   scripts/share/nonbreaking_prefixes
This file doesn't exist for Arabic, so the tokenizer falls back to the English
file instead.
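
The nonbreaking prefix file lists abbreviations (such as "Mr") whose trailing period should stay attached rather than be split off as punctuation. The simplified Python sketch below shows why a missing Arabic file matters: Arabic abbreviations will not match any English prefix, so their periods get split. The function names and sample prefixes are hypothetical. The file parsing assumes one prefix per line with '#' comments. The real tokenizer.perl also treats #NUMERIC_ONLY# entries specially and applies extra context checks that this sketch omits.

```python
def load_prefixes(lines):
    """Parse nonbreaking-prefix lines into a set.

    Assumption: one prefix per line; text after '#' is ignored. This means a
    '#NUMERIC_ONLY#' marker is simply dropped, unlike the real script.
    """
    prefixes = set()
    for line in lines:
        entry = line.split("#", 1)[0].strip()
        if entry:
            prefixes.add(entry)
    return prefixes

def split_final_periods(tokens, prefixes):
    """Split a trailing '.' off each token unless its stem is a known prefix."""
    out = []
    for tok in tokens:
        if tok.endswith(".") and len(tok) > 1:
            stem = tok[:-1]
            if stem in prefixes:
                out.append(tok)          # e.g. "Mr." stays a single token
            else:
                out.extend([stem, "."])  # ordinary sentence-final period
        else:
            out.append(tok)
    return out

if __name__ == "__main__":
    english = load_prefixes(["# sample entries (hypothetical)", "Mr", "Dr"])
    print(split_final_periods("Mr. Smith arrived.".split(), english))
    # ['Mr.', 'Smith', 'arrived', '.']
    print(split_final_periods("Mr. Smith arrived.".split(), set()))
    # ['Mr', '.', 'Smith', 'arrived', '.']
```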

If you create a nonbreaking_prefix file for Arabic, please share it with us.
Alternatively, use a tool like MADA to tokenize your Arabic data.

On 28 October 2014 14:40, Ihab Ramadan <[email protected]> wrote:

> Dears,
>
> I have a misunderstanding about what tokenization really does.
>
> What I thought is that it makes the translation of text like translated text
> give the same output as “translated” text, translated.text or translated
> text ., i.e. that it ignores any punctuation in the translated text.
>
> Am I right ?
>
> I ran the tokenizer on my data, but this is not happening.
>
> Note: the tokenizer script has to be given the language, and it could not
> recognize Arabic (ar), which is my target language.
>
>
>
> Best Regards
>
> *Ihab Ramadan* | Senior Developer | Saudisoft <http://www.saudisoft.com/> -
> Egypt | Tel +2 02 330 320 37 Ext- 0 | Mob +201007570826 | Fax
> +20233032036
>
>
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>


-- 
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu
