Are you familiar with the KhmerOS project on Sourceforge.net?
http://sourceforge.net/projects/khmer/?source=directory
At one time, it included an implementation of Moses through our DoMY
distribution. There was a parallel corpus and -- if I'm not mistaken --
there was a tokenizer. A quick look at the project shows it's changed.
So, you might have to dig deeper. Let me know if you can't find
anything, and I'll try again.
Tom
On 11/09/2014 08:08 PM, Hieu Hoang wrote:
There is no specific Khmer tokenizer in Moses so the tokenizer uses
the English scheme.
Each language tokenizer needs a file in
scripts/share/nonbreaking_prefixes
You should create your own for Khmer. If you do, please share it with us.
If this is still not good enough, you should write your own program to
tokenize Khmer.
On 9 November 2014 02:29, Sovath-MITE-319 <[email protected]> wrote:
Dear Mr. Hieu Hoang,
Thank you very much for your quick reply. I got it working with
your tips.
However, I have been working with Khmer Unicode (UTF-8), and I seem to
have a problem with the tokenizer: the text no longer renders properly.
Do you have any tips on how to get Moses to work with Unicode (UTF-8; I
mean Khmer Unicode)?
My Best Regards,
Sovath Chhinh
On Tue, Nov 4, 2014 at 1:10 AM, Hieu Hoang <[email protected]> wrote:
> I think there are differences between versions of IRSTLM. Maybe try
> --text yes
> --text
> -text yes
> -text
> Also, Moses comes with the script
> scripts/generic/trainlm-irst2.perl
> which runs IRSTLM for you. You just need to give it the text file.
>
> Also, you might want to look at KenLM's lmplz command, which also
> creates an LM.
>
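
The trial and error suggested above can be scripted. The sketch below is one assumed way to automate it, not an IRSTLM or Moses tool: `try_compile_lm` and the `COMPILE_LM` override are hypothetical names, and the flag spellings are only the four listed above. It treats a zero exit status plus a non-empty output file as success, which may not hold for every IRSTLM version, so the resulting file should still be inspected.

```shell
# Hypothetical helper: try each spelling of the text flag until one works.
# Usage: try_compile_lm INPUT.lm.gz OUTPUT.arpa
try_compile_lm() {
  input=$1
  output=$2
  # Assumed install location; override with COMPILE_LM=/path/to/compile-lm.
  compile_lm=${COMPILE_LM:-$HOME/irstlm/bin/compile-lm}
  for flag in "--text yes" "--text" "-text yes" "-text"; do
    # $flag is left unquoted on purpose so "--text yes" splits into two arguments.
    if "$compile_lm" $flag "$input" "$output" 2>/dev/null && [ -s "$output" ]; then
      echo "compile-lm succeeded with: $flag"
      return 0
    fi
    # Remove any partial output before the next attempt.
    rm -f "$output"
  done
  echo "no flag variant worked" >&2
  return 1
}
```

If none of the variants works, KenLM's lmplz avoids compile-lm altogether: it reads text on standard input and writes an ARPA file on standard output, for example `lmplz -o 3 < corpus.txt > lm.arpa` (file names are placeholders).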
> On 30 October 2014 15:19, Sovath-MITE-319 <[email protected]> wrote:
>>
>> Dear Sir,
>>
>> I am a student at the Royal University of Phnom Penh, Cambodia.
>>
>> I am undertaking a Master's degree in Computer Science, and my thesis
>> is on a parallel corpus from Khmer to English.
>>
>> I had no problem installing Moses or the other tools.
>>
>> At step 5, however, I am stuck and cannot find any resource to fix
>> the problem.
>> I found one post describing the same problem
>> (http://comments.gmane.org/gmane.comp.nlp.moses.user/9924),
>> but it does not seem to have a solution. I am not sure whether
>> something needs to be configured before running step 5.
>>
>> PS: The step I have an issue with:
>>
>> mkdir ~/lm
>> cd ~/lm
>> ~/irstlm/bin/add-start-end.sh \
>> < ~/corpus/news-commentary-v8.fr-en.true.en \
>> > news-commentary-v8.fr-en.sb.en
>> export IRSTLM=$HOME/irstlm; ~/irstlm/bin/build-lm.sh \
>> -i news-commentary-v8.fr-en.sb.en \
>> -t ./tmp -p -s improved-kneser-ney -o news-commentary-v8.fr-en.lm.en
>> ~/irstlm/bin/compile-lm \
>> --text yes \
>> news-commentary-v8.fr-en.lm.en.gz \
>> news-commentary-v8.fr-en.arpa.en
>>
>> I look forward to your support.
>>
>> Best Regards,
>> Sovath
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
>
>
> --
> Hieu Hoang
> Research Associate
> University of Edinburgh
> http://www.hoang.co.uk/hieu
>
--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support