Hello,

I am trying to build a grapheme-to-phoneme translation system.
I made a source file and a target file from the CMU 0.7 dictionary.

corpus.ph (phonemes, as target):
...
K AA1 N V ER0 S AH0 Z 
K AH0 N V ER1 S IH0 NG 
K AH0 N V ER1 ZH AH0 N 
K AH0 N V ER1 ZH AH0 N Z 
K AH0 N V ER1 ZH AH0 N Z 
...
and

corpus.gr (graphemes, as source):
...
C O N V E R S E S 
C O N V E R S I N G 
C O N V E R S I O N 
C O N V E R S I O N ' S 
C O N V E R S I O N S 
...

In total there are 133,247 lines.
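
For reference, the conversion from the dictionary to the two files was
roughly like the sketch below (not my exact script; it assumes a
plain-text cmudict file named cmudict.0.7a, where comment lines start
with ";;;" and each entry is a word followed by its space-separated
phonemes):

with open("cmudict.0.7a", encoding="latin-1") as dict_file, \
     open("corpus.gr", "w") as gr_file, \
     open("corpus.ph", "w") as ph_file:
    for line in dict_file:
        # Skip comments and blank lines.
        if line.startswith(";;;") or not line.strip():
            continue
        word, phones = line.split(None, 1)
        # Source side: the word split into space-separated graphemes.
        # (Alternate-pronunciation markers such as "(1)" would need
        # stripping here.)
        gr_file.write(" ".join(word) + "\n")
        # Target side: the phonemes, already space-separated.
        ph_file.write(" ".join(phones.split()) + "\n")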
I followed the method described on the following pages without any errors:

http://www.statmt.org/moses/?n=Development.GetStarted
http://www.statmt.org/moses/?n=Moses.Baseline

I used 100% of the dictionary for training and 10% of it for tuning.
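
The 10% tuning set was taken from the same corpus, roughly as in this
sketch (the real split may differ; the file names are just examples):

# Rough sketch: take every 10th sentence pair as the tuning set.
with open("corpus.gr") as gr, open("corpus.ph") as ph, \
     open("tune.gr", "w") as tune_gr, open("tune.ph", "w") as tune_ph:
    for i, (g, p) in enumerate(zip(gr, ph)):
        if i % 10 == 0:
            tune_gr.write(g)
            tune_ph.write(p)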

But I am not getting correct answers.

For example:
saj@Jadhavs:~$ echo 'h e l l' | ~/g2p/mosesdecoder-master/bin/moses -f 
~/g2p/working/binarised-model/moses.ini 
Defined parameters (per moses.ini or switch):
    config: /home/saj/g2p/working/binarised-model/moses.ini 
    distortion-file: 0-0 wbe-msd-bidirectional-fe-allff 6 
/home/saj/g2p/working/train/model/reordering-table.wbe-msd-bidirectional-fe.gz 
    distortion-limit: 6 
    input-factors: 0 
    lmodel-file: 8 0 3 /home/saj/g2p/lm/corpus.blm.ph 
    mapping: 0 T 0 
    threads: 2 
    ttable-file: 0 0 0 5 /home/saj/g2p/working/train/model/phrase-table.gz 
    ttable-limit: 20 
    weight-d: 0.0155888 -0.198259 0.0255069 0.0189083 0.00337951 0.00813773 
0.0176252 
    weight-l: 0.0393243 
    weight-t: 0.0455618 0.0121683 0.453074 -0.033838 0.00337398 
    weight-w: 0.125254 
/home/saj/g2p/mosesdecoder-master/bin
ScoreProducer: Distortion start: 0 end: 1
ScoreProducer: WordPenalty start: 1 end: 2
ScoreProducer: !UnknownWordPenalty start: 2 end: 3
Loading lexical distortion models...have 1 models
ScoreProducer: LexicalReordering_wbe-msd-bidirectional-fe-allff start: 3 end: 9
Creating lexical reordering...
weights: -0.198 0.026 0.019 0.003 0.008 0.018 
Loading table into memory...done.
Start loading LanguageModel /home/saj/g2p/lm/corpus.blm.ph : [12.396] seconds
ScoreProducer: LM start: 9 end: 10
Finished loading LanguageModels : [12.397] seconds
Start loading PhraseTable /home/saj/g2p/working/train/model/phrase-table.gz : 
[12.397] seconds
filePath: /home/saj/g2p/working/train/model/phrase-table.gz
ScoreProducer: PhraseModel start: 10 end: 15
Finished loading phrase tables : [12.397] seconds
Start loading phrase table from 
/home/saj/g2p/working/train/model/phrase-table.gz : [12.397] seconds
Reading /home/saj/g2p/working/train/model/phrase-table.gz
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
Finished loading phrase tables : [21.864] seconds
IO from STDOUT/STDIN
Created input-output object : [21.864] seconds
Translating line 0  in thread id 3066051392
Translating: h e l l 
Line 0: Collecting options took 0.000 seconds
Line 0: Search took 0.000 seconds
h e l l 
BEST TRANSLATION: h|UNK|UNK|UNK e|UNK|UNK|UNK l|UNK|UNK|UNK l|UNK|UNK|UNK 
[1111]  [total=-402.355] 
core=(0.000,-4.000,-400.000,0.000,0.000,0.000,0.000,0.000,0.000,-47.137,0.000,0.000,0.000,0.000,0.000)
  
Line 0: Translation took 0.001 seconds total
user    21.529
sys    0.316
VmPeak:   424040 kB
VmRSS:    400732 kB
saj@Jadhavs:~$ 


For each input I am getting UNK.
I have used the full dictionary for training, but the parallel corpora are 
small (~2.5 MB).
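
As a quick sanity check, the sketch below (using the phrase-table path
from the decoder log above; "|||" is the standard Moses phrase-table
field separator) looks for entries whose source side is the single
grapheme "h":

import gzip

# Does the phrase table contain any entry whose source side is the
# single grapheme "h"?  Path taken from the decoder output above.
path = "/home/saj/g2p/working/train/model/phrase-table.gz"
with gzip.open(path, "rt", encoding="utf-8") as table:
    hits = [line for line in table if line.startswith("h |||")]

print(len(hits), "phrase-table entries with source side 'h'")
for line in hits[:5]:
    print(line.rstrip())

If that prints 0, the decoder can only mark "h" as UNK.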
The only change I made is that I did not use the -L option for the language 
and used English as the default for both corpora.
Could that be the reason for the wrong answers?
If so, what parallel data and language should I use?
I did not skip any of the steps described on the pages mentioned above.
Is grapheme-to-phoneme conversion possible with Moses?
                                          