Good catch, Zeeshan.
We use Moses for a good variety of functions like this, including letter-to-sound conversion as you're doing. If casing is not the problem, double-check that you didn't accidentally swap the source and target sides during training.

Tom

On 2012-12-10 02:16, Zeeshan Ahmed wrote:
> Hi,
>
> It seems the problem is a lower/upper-case issue: you are using lower case while decoding but upper case during training.
>
> Best Regards,
> Zeeshan Ahmed
>
> On 9 December 2012 23:52, swapnil jadhav <[email protected]> wrote:
>
>> Hello,
>>
>> I am trying to build a grapheme-to-phoneme translation system.
>> I made a source file and a target file using the CMU 7 dictionary.
>>
>> corpus.ph (phonemes as target):
>> ...
>> K AA1 N V ER0 S AH0 Z
>> K AH0 N V ER1 S IH0 NG
>> K AH0 N V ER1 ZH AH0 N
>> K AH0 N V ER1 ZH AH0 N Z
>> K AH0 N V ER1 ZH AH0 N Z
>> ...
>>
>> and
>>
>> corpus.gr (graphemes as source):
>> ...
>> C O N V E R S E S
>> C O N V E R S I N G
>> C O N V E R S I O N
>> C O N V E R S I O N ' S
>> C O N V E R S I O N S
>> ...
>>
>> In total there are 133247 lines.
>> I followed the method described on the following pages without any errors:
>>
>> http://www.statmt.org/moses/?n=Development.GetStarted
>> http://www.statmt.org/moses/?n=Moses.Baseline
>>
>> I used 100% of the dictionary for training and 10% for tuning.
>>
>> But I am not getting correct answers. For example:
>> saj@Jadhavs:~$ echo 'h e l l' | ~/g2p/mosesdecoder-master/bin/moses -f ~/g2p/working/binarised-model/moses.ini
>> Defined parameters (per moses.ini or switch):
>>     config: /home/saj/g2p/working/binarised-model/moses.ini
>>     distortion-file: 0-0 wbe-msd-bidirectional-fe-allff 6 /home/saj/g2p/working/train/model/reordering-table.wbe-msd-bidirectional-fe.gz
>>     distortion-limit: 6
>>     input-factors: 0
>>     lmodel-file: 8 0 3 /home/saj/g2p/lm/corpus.blm.ph
>>     mapping: 0 T 0
>>     threads: 2
>>     ttable-file: 0 0 0 5 /home/saj/g2p/working/train/model/phrase-table.gz
>>     ttable-limit: 20
>>     weight-d: 0.0155888 -0.198259 0.0255069 0.0189083 0.00337951 0.00813773 0.0176252
>>     weight-l: 0.0393243
>>     weight-t: 0.0455618 0.0121683 0.453074 -0.033838 0.00337398
>>     weight-w: 0.125254
>> /home/saj/g2p/mosesdecoder-master/bin
>> ScoreProducer: Distortion start: 0 end: 1
>> ScoreProducer: WordPenalty start: 1 end: 2
>> ScoreProducer: !UnknownWordPenalty start: 2 end: 3
>> Loading lexical distortion models...have 1 models
>> ScoreProducer: LexicalReordering_wbe-msd-bidirectional-fe-allff start: 3 end: 9
>> Creating lexical reordering...
>> weights: -0.198 0.026 0.019 0.003 0.008 0.018
>> Loading table into memory...done.
>> Start loading LanguageModel /home/saj/g2p/lm/corpus.blm.ph : [12.396] seconds
>> ScoreProducer: LM start: 9 end: 10
>> Finished loading LanguageModels : [12.397] seconds
>> Start loading PhraseTable /home/saj/g2p/working/train/model/phrase-table.gz : [12.397] seconds
>> filePath: /home/saj/g2p/working/train/model/phrase-table.gz
>> ScoreProducer: PhraseModel start: 10 end: 15
>> Finished loading phrase tables : [12.397] seconds
>> Start loading phrase table from /home/saj/g2p/working/train/model/phrase-table.gz : [12.397] seconds
>> Reading /home/saj/g2p/working/train/model/phrase-table.gz
>> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
>> ****************************************************************************************************
>> Finished loading phrase tables : [21.864] seconds
>> IO from STDOUT/STDIN
>> Created input-output object : [21.864] seconds
>> Translating line 0 in thread id 3066051392
>> Translating: h e l l
>> Line 0: Collecting options took 0.000 seconds
>> Line 0: Search took 0.000 seconds
>> h e l l
>> BEST TRANSLATION: h|UNK|UNK|UNK e|UNK|UNK|UNK l|UNK|UNK|UNK l|UNK|UNK|UNK [1111] [total=-402.355] core=(0.000,-4.000,-400.000,0.000,0.000,0.000,0.000,0.000,0.000,-47.137,0.000,0.000,0.000,0.000,0.000)
>> Line 0: Translation took 0.001 seconds total
>> user 21.529
>> sys 0.316
>> VmPeak: 424040 kB
>> VmRSS: 400732 kB
>> saj@Jadhavs:~$
>>
>> For each input I am getting UNK.
>> I used the full dictionary for training, but the parallel corpora are small (~2.5 MB).
>> The only change I made is that I did not use the -l language option and used English as the default for both corpora.
>> Could that be the cause of the wrong answers?
>> If it is, what parallel data and language should I use?
>> I did not skip any step described on the pages mentioned above.
>> Is grapheme-to-phoneme conversion possible with Moses?
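[Tom, inline] When every token comes out as UNK, the decoder found none of your input tokens in the phrase table, which is exactly what a training/decoding case mismatch would produce. A quick check is to count phrase-table source entries under each casing. A self-contained sketch (the tiny gzipped table below is made up for illustration; run the same greps against your real ~/g2p/working/train/model/phrase-table.gz):

```shell
# Build a toy gzipped "phrase table" with uppercase source tokens,
# standing in for the real phrase-table.gz from the log above.
printf 'H ||| HH ||| 1\nE ||| EH ||| 1\n' | gzip > /tmp/toy-phrase-table.gz

# Count source entries under each casing; with uppercase training data,
# a lowercase query finds nothing.
gzip -dc /tmp/toy-phrase-table.gz | grep -c '^H '            # prints 1
gzip -dc /tmp/toy-phrase-table.gz | grep -c '^h ' || true    # prints 0
```

If the uppercase count is large and the lowercase count is zero (or vice versa), the casing diagnosis is confirmed.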
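If casing is indeed the culprit, the simplest fix is to lowercase both sides of the training data and everything you pipe into the decoder (Moses also ships a lowercase.perl script under scripts/tokenizer/ for this). A minimal sketch with tr, using a throwaway two-line stand-in for corpus.gr (the real files are the CMU-derived ones from the thread):

```shell
# Throwaway stand-in for corpus.gr; the real file comes from the CMU
# dictionary split described in the original message.
printf 'C O N V E R S E S\nH E L L\n' > /tmp/corpus.gr

# Lowercase the source side; repeat for corpus.ph, then retrain.
tr '[:upper:]' '[:lower:]' < /tmp/corpus.gr > /tmp/corpus.lc.gr
cat /tmp/corpus.lc.gr
# prints:
# c o n v e r s e s
# h e l l
```

At decode time, lowercase the input the same way, e.g. `echo 'H E L L' | tr '[:upper:]' '[:lower:]' | moses -f moses.ini`; otherwise the input tokens will again miss the (now lowercase) phrase table.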
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
