Hi All,
I am trying to build an Urdu SMT system using Moses. I have an Urdu parallel
corpus, and the first step in the manual is to tokenize the corpus, but when I
run the following command:
~/SMT/mosesdecoder/scripts/tokenizer/tokenizer.perl -l ur <
~/SMT/corpus/training/mycorpus.ur-en.ur > ~/SMT/corpus/mycorpus.ur-en.tok.ur
it gives me the following warning:
WARNING: No known abbreviations for language 'ur', attempting fall-back to
English version...
It also generates the output file, but I don't know whether this output is
actually tokenized or not.
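
A rough way to check would be to compare the raw and output files directly. The
sketch below assumes that a tokenized file has punctuation split off by spaces,
so it should contain more whitespace-separated tokens than the raw file. The
helper name count_tokens and the RAW/TOK variables are made up for illustration
and are not part of Moses; the default paths are the ones from the command above.

```shell
#!/bin/sh
# Sketch only: compare token counts before and after tokenization.
# count_tokens is a hypothetical helper, not a Moses script.
count_tokens() {
    # wc -w counts whitespace-delimited words; tr strips any padding spaces
    wc -w < "$1" | tr -d ' '
}

# Default to the paths from the command above; override via the environment.
RAW="${RAW:-$HOME/SMT/corpus/training/mycorpus.ur-en.ur}"
TOK="${TOK:-$HOME/SMT/corpus/mycorpus.ur-en.tok.ur}"

if [ -f "$RAW" ] && [ -f "$TOK" ]; then
    echo "raw tokens:       $(count_tokens "$RAW")"
    echo "tokenized tokens: $(count_tokens "$TOK")"
    # Show the first few lines that differ between the two files
    diff "$RAW" "$TOK" | head -n 10
fi
```

If the two counts are identical and diff prints nothing, the file was probably
passed through unchanged; if the second count is larger and the differing lines
show spaces around punctuation, some tokenization took place.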
Regards
Asad A.Malik
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support