Re: [Moses-support] Getting Bad Results with moses

Zeeshan Ahmed Sun, 09 Dec 2012 11:18:34 -0800

Hi,

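The problem appears to be a case mismatch. You are decoding lower-case
input ('h e l l'), but the model was trained on upper-case data
(corpus.gr). Every input token is therefore unknown to the phrase table,
which is why each one comes back as UNK.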
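
As a minimal sketch, assuming the model really was trained on the
upper-case corpus.gr quoted below, the input could be upper-cased before
it reaches the decoder. The helper name to_upper is only illustrative and
is not part of Moses.

```shell
#!/bin/sh
# Convert decoder input to upper case so that it matches the training
# data (corpus.gr in the original post looks like "C O N V E R S E S").
# to_upper is a hypothetical helper name, not a Moses tool.
to_upper() {
  tr '[:lower:]' '[:upper:]'
}

# Prints: H E L L
echo 'h e l l' | to_upper

# With the paths from the original post, the decoder call would become:
#   echo 'h e l l' | to_upper | ~/g2p/mosesdecoder-master/bin/moses -f ~/g2p/working/binarised-model/moses.ini
```

The alternative is to lower-case both corpora and retrain. Either way,
the case used for training and for decoding has to agree.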


Best Regards,
Zeeshan Ahmed

On 9 December 2012 23:52, swapnil jadhav <[email protected]> wrote:

> Hello,
>
> I am trying to build a translation system from graphemes to phonemes.
> I made the source and target files from the CMU 7 dictionary.
>
> *corpus.ph ( phonemes as target)*
> ...
> K AA1 N V ER0 S AH0 Z
> K AH0 N V ER1 S IH0 NG
> K AH0 N V ER1 ZH AH0 N
> K AH0 N V ER1 ZH AH0 N Z
> K AH0 N V ER1 ZH AH0 N Z
> ...
> *and*
>
> *corpus.gr ( graphemes as source)*
> ...
> C O N V E R S E S
> C O N V E R S I N G
> C O N V E R S I O N
> C O N V E R S I O N ' S
> C O N V E R S I O N S
> ...
>
> In total there are 133247 lines.
> I followed the methods described on the following pages without any errors.
>
> http://www.statmt.org/moses/?n=Development.GetStarted
> http://www.statmt.org/moses/?n=Moses.Baseline
>
> I used 100% of the dictionary for training and 10% for tuning.
>
> But I am not getting correct answers.
>
> For example.
> saj@Jadhavs:~$ echo 'h e l l' | ~/g2p/mosesdecoder-master/bin/moses -f
> ~/g2p/working/binarised-model/moses.ini
> Defined parameters (per moses.ini or switch):
>     config: /home/saj/g2p/working/binarised-model/moses.ini
>     distortion-file: 0-0 wbe-msd-bidirectional-fe-allff 6
> /home/saj/g2p/working/train/model/reordering-table.wbe-msd-bidirectional-fe.gz
>
>     distortion-limit: 6
>     input-factors: 0
>     lmodel-file: 8 0 3 /home/saj/g2p/lm/corpus.blm.ph
>     mapping: 0 T 0
>     threads: 2
>     ttable-file: 0 0 0 5 /home/saj/g2p/working/train/model/phrase-table.gz
>     ttable-limit: 20
>     weight-d: 0.0155888 -0.198259 0.0255069 0.0189083 0.00337951
> 0.00813773 0.0176252
>     weight-l: 0.0393243
>     weight-t: 0.0455618 0.0121683 0.453074 -0.033838 0.00337398
>     weight-w: 0.125254
> /home/saj/g2p/mosesdecoder-master/bin
> ScoreProducer: Distortion start: 0 end: 1
> ScoreProducer: WordPenalty start: 1 end: 2
> ScoreProducer: !UnknownWordPenalty start: 2 end: 3
> Loading lexical distortion models...have 1 models
> ScoreProducer: LexicalReordering_wbe-msd-bidirectional-fe-allff start: 3
> end: 9
> Creating lexical reordering...
> weights: -0.198 0.026 0.019 0.003 0.008 0.018
> Loading table into memory...done.
> Start loading LanguageModel /home/saj/g2p/lm/corpus.blm.ph : [12.396]
> seconds
> ScoreProducer: LM start: 9 end: 10
> Finished loading LanguageModels : [12.397] seconds
> Start loading PhraseTable
> /home/saj/g2p/working/train/model/phrase-table.gz : [12.397] seconds
> filePath: /home/saj/g2p/working/train/model/phrase-table.gz
> ScoreProducer: PhraseModel start: 10 end: 15
> Finished loading phrase tables : [12.397] seconds
> Start loading phrase table from
> /home/saj/g2p/working/train/model/phrase-table.gz : [12.397] seconds
> Reading /home/saj/g2p/working/train/model/phrase-table.gz
>
> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
>
> ****************************************************************************************************
> Finished loading phrase tables : [21.864] seconds
> IO from STDOUT/STDIN
> Created input-output object : [21.864] seconds
> Translating line 0  in thread id 3066051392
> Translating: h e l l
> Line 0: Collecting options took 0.000 seconds
> Line 0: Search took 0.000 seconds
> h e l l
> BEST TRANSLATION: h|UNK|UNK|UNK e|UNK|UNK|UNK l|UNK|UNK|UNK
> l|UNK|UNK|UNK [1111]
> [total=-402.355] core=(0.000,-4.000,-400.000,0.000,0.000,0.000,0.000,0.000,0.000,-47.137,0.000,0.000,0.000,0.000,0.000)
>
> Line 0: Translation took 0.001 seconds total
> user    21.529
> sys    0.316
> VmPeak:   424040 kB
> VmRSS:    400732 kB
> saj@Jadhavs:~$
>
>
> For every input I am getting UNK.
> I have used the full dictionary for training, but the parallel corpora
> are small (~2.5 MB).
> The only change I made is that I did not use the -L option for the
> language and used English as the default for both corpora.
> Is that a factor in the wrong answers?
> If it is, then what parallel data and language should I use?
> I did not skip any step described on the pages mentioned above.
> Is grapheme-to-phoneme conversion possible with Moses?
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
