Hi Gebregziabher,

You'll need to write your own script to "tokenize" Ge'ez-script languages,
since Ge'ez punctuation is different from English punctuation.

For example:
sed 's/።/ ።/g' < corpus.tg > corpus.tok.tg
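
That one rule only splits off the Ethiopic full stop ። . A slightly fuller
sketch (assuming GNU sed and a UTF-8 corpus; extend or trim the list of
marks to match your data) handles the other Ethiopic punctuation marks the
same way:

# Put spaces around each Ethiopic punctuation mark so it becomes its own
# token, then squeeze runs of spaces back down to a single space.
sed -e 's/።/ ። /g' \
    -e 's/፣/ ፣ /g' \
    -e 's/፤/ ፤ /g' \
    -e 's/፥/ ፥ /g' \
    -e 's/፧/ ፧ /g' \
    -e 's/፨/ ፨ /g' \
    -e 's/  */ /g' \
    < corpus.tg > corpus.tok.tg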


On Fri, Nov 25, 2016 at 4:25 PM, G/her G/libanos <[email protected]> wrote:

> hello there
>
> I am doing my research on the local Ethiopian languages Amharic and
> Tigrigna,
>
> but when I try to tokenize the Tigrigna corpus, the tokenizer doesn't
> recognize the language Tigrigna.
> Thanks for your help
>


-- 
Asteway Negash
IVS Technology Consulting
Manager, Lead Technical Consultant
mobile: +251911369618

skype: asteway_neg
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
