Hi Gebregziabher,

You'll need to write a script of your own to "tokenize" Ge'ez-script languages, since Ge'ez punctuation differs from English punctuation.
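A minimal sketch of such a script, assuming a UTF-8 corpus and the file names used in this thread, is below. It separates the common Ethiopic punctuation marks (U+1361–U+1367) from the surrounding words with one sed substitution per mark; the mark list is a common choice, not exhaustive, so extend it for whatever appears in your corpus:

```shell
# Sketch: split the common Ethiopic punctuation marks off the words around
# them, then collapse any double spaces. Assumes UTF-8 input; corpus.tg and
# corpus.tok.tg are the example file names from this thread.
sed -e 's/፡/ ፡ /g' -e 's/።/ ። /g' -e 's/፣/ ፣ /g' \
    -e 's/፤/ ፤ /g' -e 's/፥/ ፥ /g' -e 's/፦/ ፦ /g' \
    -e 's/፧/ ፧ /g' -e 's/  */ /g' < corpus.tg > corpus.tok.tg
```

Because each substitution matches the mark as a literal byte string, this works regardless of whether your sed understands multibyte character classes.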
For example:

sed 's/።/ ።/g' < corpus.tg > corpus.tok.tg

On Fri, Nov 25, 2016 at 4:25 PM, G/her G/libanos <[email protected]> wrote:
> hello there
>
> I am doing my research on the local Ethiopian languages Amharic and
> Tigrigna, but when I try to tokenize the Tigrigna corpus, it doesn't
> know the language Tigrigna.
> Thanks for your help.
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support

--
Asteway Negash
IVS Technology Consulting
Manager, Lead Technical Consultant
mobile: +251911369618
skype: asteway_neg
