Hello,

I am trying to model a grapheme-to-phoneme translation system. I made the source and target files from the CMU 7 dictionary.
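For reference, here is a minimal sketch of how a line-aligned pair of files like the ones below can be produced from the dictionary. It is not necessarily how these files were made. It assumes the usual cmudict entry format (`WORD  PH1 PH2 ...`, comment lines starting with `;;;`, alternate pronunciations marked `WORD(1)`). The function name `cmudict_to_parallel` is a hypothetical helper, not part of Moses.

```python
import re
from typing import Iterable, List, Tuple

# Assumption: alternate pronunciations carry a numeric suffix, e.g. "CONVERSION(1)".
VARIANT_SUFFIX = re.compile(r"\(\d+\)$")


def cmudict_to_parallel(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: split cmudict entries into source and target lines.

    Assumes each entry looks like "WORD  PH1 PH2 ..." and comments start with ";;;".
    Returns two line-aligned lists: space-separated letters and phonemes.
    """
    graphemes: List[str] = []
    phonemes: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(";;;"):
            continue  # skip blank lines and comments
        word, _, pron = line.partition(" ")
        word = VARIANT_SUFFIX.sub("", word)  # "CONVERSION(1)" -> "CONVERSION"
        if not pron.strip():
            continue  # skip malformed entries that have no pronunciation
        graphemes.append(" ".join(word))         # "HELL" -> "H E L L"
        phonemes.append(" ".join(pron.split()))  # collapse the double space
    return graphemes, phonemes


if __name__ == "__main__":
    sample = [
        ";;; sample comment",
        "CONVERSION'S  K AH0 N V ER1 ZH AH0 N Z",
        "CONVERSIONS  K AH0 N V ER1 ZH AH0 N Z",
    ]
    src, tgt = cmudict_to_parallel(sample)
    for g, p in zip(src, tgt):
        print(g, "|||", p)
```

Writing the two returned lists out one entry per line to corpus.gr and corpus.ph keeps the files line-aligned, which Moses training requires of a parallel corpus.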
corpus.ph (phonemes, target side):

    ...
    K AA1 N V ER0 S AH0 Z
    K AH0 N V ER1 S IH0 NG
    K AH0 N V ER1 ZH AH0 N
    K AH0 N V ER1 ZH AH0 N Z
    K AH0 N V ER1 ZH AH0 N Z
    ...

corpus.gr (graphemes, source side):

    ...
    C O N V E R S E S
    C O N V E R S I N G
    C O N V E R S I O N
    C O N V E R S I O N ' S
    C O N V E R S I O N S
    ...

In total there are 133247 lines. I followed the methods described on the following pages without any errors:

http://www.statmt.org/moses/?n=Development.GetStarted
http://www.statmt.org/moses/?n=Moses.Baseline

I used 100% of the dictionary for training and 10% of it for tuning, but I am not getting correct answers. For example:

    saj@Jadhavs:~$ echo 'h e l l' | ~/g2p/mosesdecoder-master/bin/moses -f ~/g2p/working/binarised-model/moses.ini
    Defined parameters (per moses.ini or switch):
    config: /home/saj/g2p/working/binarised-model/moses.ini
    distortion-file: 0-0 wbe-msd-bidirectional-fe-allff 6 /home/saj/g2p/working/train/model/reordering-table.wbe-msd-bidirectional-fe.gz
    distortion-limit: 6
    input-factors: 0
    lmodel-file: 8 0 3 /home/saj/g2p/lm/corpus.blm.ph
    mapping: 0 T 0
    threads: 2
    ttable-file: 0 0 0 5 /home/saj/g2p/working/train/model/phrase-table.gz
    ttable-limit: 20
    weight-d: 0.0155888 -0.198259 0.0255069 0.0189083 0.00337951 0.00813773 0.0176252
    weight-l: 0.0393243
    weight-t: 0.0455618 0.0121683 0.453074 -0.033838 0.00337398
    weight-w: 0.125254
    /home/saj/g2p/mosesdecoder-master/bin
    ScoreProducer: Distortion start: 0 end: 1
    ScoreProducer: WordPenalty start: 1 end: 2
    ScoreProducer: !UnknownWordPenalty start: 2 end: 3
    Loading lexical distortion models...have 1 models
    ScoreProducer: LexicalReordering_wbe-msd-bidirectional-fe-allff start: 3 end: 9
    Creating lexical reordering...
    weights: -0.198 0.026 0.019 0.003 0.008 0.018
    Loading table into memory...done.
    Start loading LanguageModel /home/saj/g2p/lm/corpus.blm.ph : [12.396] seconds
    ScoreProducer: LM start: 9 end: 10
    Finished loading LanguageModels : [12.397] seconds
    Start loading PhraseTable /home/saj/g2p/working/train/model/phrase-table.gz : [12.397] seconds
    filePath: /home/saj/g2p/working/train/model/phrase-table.gz
    ScoreProducer: PhraseModel start: 10 end: 15
    Finished loading phrase tables : [12.397] seconds
    Start loading phrase table from /home/saj/g2p/working/train/model/phrase-table.gz : [12.397] seconds
    Reading /home/saj/g2p/working/train/model/phrase-table.gz
    ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
    ****************************************************************************************************
    Finished loading phrase tables : [21.864] seconds
    IO from STDOUT/STDIN
    Created input-output object : [21.864] seconds
    Translating line 0 in thread id 3066051392
    Translating: h e l l
    Line 0: Collecting options took 0.000 seconds
    Line 0: Search took 0.000 seconds
    h e l l
    BEST TRANSLATION: h|UNK|UNK|UNK e|UNK|UNK|UNK l|UNK|UNK|UNK l|UNK|UNK|UNK [1111] [total=-402.355] core=(0.000,-4.000,-400.000,0.000,0.000,0.000,0.000,0.000,0.000,-47.137,0.000,0.000,0.000,0.000,0.000)
    Line 0: Translation took 0.001 seconds total
    user 21.529
    sys 0.316
    VmPeak: 424040 kB
    VmRSS: 400732 kB
    saj@Jadhavs:~$

For every input I get UNK. I used the full dictionary for training, but the parallel corpora are small (~2.5 MB). The only change I made is that I did not use the -L option for the language and left English as the default for both corpora. Could that be the reason for the wrong output? If so, what parallel data and language setting should I use? I did not skip any step described on the pages mentioned above. Is grapheme-to-phoneme conversion possible with Moses at all?
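One detail in the log above may be worth checking: the decoder input `h e l l` is lowercase, while corpus.gr as excerpted is uppercase, and Moses marks every input token `|UNK`. If the phrase table kept the uppercase letters (it would not if the corpus was lowercased or truecased during preparation), each lowercase letter would be unknown to it. The sketch below is a hedged way to test that, assuming the standard `src ||| tgt ||| scores` phrase-table line layout. It converts a word into the space-separated uppercase form used in corpus.gr and lists input tokens that never occur on the source side of the table. All function names are hypothetical helpers, not part of Moses.

```python
import gzip
from typing import Iterable, List, Set


def to_grapheme_input(word: str) -> str:
    """Hypothetical helper: 'hell' or 'h e l l' -> 'H E L L' (uppercase, like corpus.gr)."""
    return " ".join(ch.upper() for ch in word if not ch.isspace())


def source_vocabulary(phrase_table_lines: Iterable[str]) -> Set[str]:
    """Collect every source-side token from Moses phrase-table lines.

    Each line is assumed to look like "src phrase ||| tgt phrase ||| scores ...".
    """
    vocab: Set[str] = set()
    for line in phrase_table_lines:
        source = line.split(" ||| ", 1)[0]
        vocab.update(source.split())
    return vocab


def load_source_vocabulary(path: str) -> Set[str]:
    """Read a gzipped phrase table (e.g. phrase-table.gz) and return its source vocabulary."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return source_vocabulary(handle)


def unknown_tokens(sentence: str, vocab: Set[str]) -> List[str]:
    """Input tokens absent from the phrase table's source side (candidates for UNK)."""
    return [tok for tok in sentence.split() if tok not in vocab]
```

For example, with `vocab = load_source_vocabulary("/home/saj/g2p/working/train/model/phrase-table.gz")` (the path from the log), comparing `unknown_tokens("h e l l", vocab)` with `unknown_tokens(to_grapheme_input("hell"), vocab)` would show whether casing alone explains the UNKs. If the table turns out to be lowercase or truecased, the decoder input should be normalised the same way as the training corpus instead.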
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
