Hi All,

I am trying to build an Urdu SMT system using Moses. I have an Urdu parallel 
corpus, and the first step in the manual is to tokenize it, but when I run the 
following command:

~/SMT/mosesdecoder/scripts/tokenizer/tokenizer.perl -l ur < ~/SMT/corpus/training/mycorpus.ur-en.ur > ~/SMT/corpus/mycorpus.ur-en.tok.ur


it gives me this warning:

WARNING: No known abbreviations for language 'ur', attempting fall-back to 
English version...

It also generates the output file, but I don't know whether this output is 
actually tokenized or not.
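
One rough way to check would be to compare the input and output files line by 
line. The sketch below is not part of Moses, and the function names are 
hypothetical. It assumes that "tokenized" means punctuation has been split off 
into separate whitespace-delimited tokens, so the token count should go up and 
at least some lines should differ from the raw corpus.

```python
# Rough sanity check (a sketch, not part of Moses): compare the raw corpus
# with the tokenizer output line by line.
import os
import sys
from itertools import zip_longest


def tokenization_summary(raw_lines, tok_lines):
    """Count lines and whitespace-delimited tokens before and after tokenization."""
    total = changed = raw_tokens = tok_tokens = 0
    for raw, tok in zip_longest(raw_lines, tok_lines, fillvalue=""):
        total += 1
        raw_split = raw.split()
        tok_split = tok.split()
        raw_tokens += len(raw_split)
        tok_tokens += len(tok_split)
        # A line counts as changed if its token sequence is different.
        if raw_split != tok_split:
            changed += 1
    return {
        "lines": total,
        "changed_lines": changed,
        "raw_tokens": raw_tokens,
        "tok_tokens": tok_tokens,
    }


def summarize_files(raw_path, tok_path):
    """Same summary, reading two UTF-8 text files (a leading ~ is expanded)."""
    raw_path = os.path.expanduser(raw_path)
    tok_path = os.path.expanduser(tok_path)
    with open(raw_path, encoding="utf-8") as raw_f, \
         open(tok_path, encoding="utf-8") as tok_f:
        return tokenization_summary(raw_f, tok_f)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python check_tok.py <raw file> <tokenized file>
    print(summarize_files(sys.argv[1], sys.argv[2]))
```

If `changed_lines` is 0 and both token counts are equal, the tokenizer most 
likely left the text untouched. A higher `tok_tokens` suggests punctuation was 
separated, though this says nothing about whether the splits are correct for 
Urdu.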


Regards


Asad A.Malik
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
