Good catch, Zeeshan.

We use Moses for a variety of tasks like this, including letter-to-sound
conversion as you're doing. If casing is not the problem, double-check that
you didn't accidentally train src and tgt in reverse.
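
Two quick checks, roughly (untested; the paths are taken from your own log):

  # 1) feed the decoder input in the same case as the training data
  echo 'H E L L' | ~/g2p/mosesdecoder-master/bin/moses \
      -f ~/g2p/working/binarised-model/moses.ini

  # 2) each phrase-table line is "source ||| target ||| ...", so graphemes
  #    should appear before the first ||| and phonemes after it
  zcat ~/g2p/working/train/model/phrase-table.gz | head -5

If check 1 gives you phonemes instead of UNKs, it was only the casing; if
check 2 shows phonemes before the first |||, the training direction got
swapped.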

Tom 
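
P.S. If you would rather work in lower case throughout, normalise both
training corpora before training (and rebuild the LM on them), and lowercase
the input the same way at decoding time. A rough sketch with plain tr, with
the .lc filenames only as placeholders (I believe Moses also ships
scripts/tokenizer/lowercase.perl for the same job):

  # lowercase both sides of the training data, then retrain on the .lc files
  tr '[:upper:]' '[:lower:]' < corpus.gr > corpus.lc.gr
  tr '[:upper:]' '[:lower:]' < corpus.ph > corpus.lc.ph

  # lowercase the input the same way when decoding
  echo 'H E L L' | tr '[:upper:]' '[:lower:]' \
      | ~/g2p/mosesdecoder-master/bin/moses -f ~/g2p/working/binarised-model/moses.ini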

On 2012-12-10 02:16, Zeeshan Ahmed wrote: 

> Hi,
>
> It seems the problem is a lower/upper case issue. You are using lower case
> while decoding and upper case during training.
>
> Best regards,
> Zeeshan Ahmed
> 
> On 9 December 2012 23:52, swapnil jadhav <[email protected]> wrote:
> 
>> Hello,
>> 
>> I am trying to build a translation system from graphemes to phonemes.
>> I made the source and target files using the CMU 0.7 dictionary.
>>
>> corpus.ph (phonemes as target)
>> ...
>> K AA1 N V ER0 S AH0 Z
>> K AH0 N V ER1 S IH0 NG
>> K AH0 N V ER1 ZH AH0 N
>> K AH0 N V ER1 ZH AH0 N Z
>> K AH0 N V ER1 ZH AH0 N Z
>> ...
>> AND
>>
>> corpus.gr (graphemes as source)
>> ...
>> C O N V E R S E S
>> C O N V E R S I N G
>> C O N V E R S I O N
>> C O N V E R S I O N ' S
>> C O N V E R S I O N S
>> ...
>> 
>> In total there are 133247 lines.
>> I followed the method described at the following pages without any errors:
>>
>> http://www.statmt.org/moses/?n=Development.GetStarted
>> http://www.statmt.org/moses/?n=Moses.Baseline
>> 
>> I used 100% of the dictionary for training and 10% for tuning.
>> 
>> But I am not getting correct answers.
>>
>> For example:
>> saj@Jadhavs:~$ echo 'h e l l' | ~/g2p/mosesdecoder-master/bin/moses -f ~/g2p/working/binarised-model/moses.ini
>> Defined parameters (per moses.ini or switch):
>> config: /home/saj/g2p/working/binarised-model/moses.ini
>> distortion-file: 0-0 wbe-msd-bidirectional-fe-allff 6 /home/saj/g2p/working/train/model/reordering-table.wbe-msd-bidirectional-fe.gz
>> distortion-limit: 6
>> input-factors: 0
>> lmodel-file: 8 0 3 /home/saj/g2p/lm/corpus.blm.ph
>> mapping: 0 T 0
>> threads: 2
>> ttable-file: 0 0 0 5 /home/saj/g2p/working/train/model/phrase-table.gz
>> ttable-limit: 20
>> weight-d: 0.0155888 -0.198259 0.0255069 0.0189083 0.00337951 0.00813773 0.0176252
>> weight-l: 0.0393243
>> weight-t: 0.0455618 0.0121683 0.453074 -0.033838 0.00337398
>> weight-w: 0.125254
>> /home/saj/g2p/mosesdecoder-master/bin
>> ScoreProducer: Distortion start: 0 end: 1
>> ScoreProducer: WordPenalty start: 1 end: 2
>> ScoreProducer: !UnknownWordPenalty start: 2 end: 3
>> Loading lexical distortion models...have 1 models
>> ScoreProducer: LexicalReordering_wbe-msd-bidirectional-fe-allff start: 3 end: 9
>> Creating lexical reordering...
>> weights: -0.198 0.026 0.019 0.003 0.008 0.018
>> Loading table into memory...done.
>> Start loading LanguageModel /home/saj/g2p/lm/corpus.blm.ph : [12.396] seconds
>> ScoreProducer: LM start: 9 end: 10
>> Finished loading LanguageModels : [12.397] seconds
>> Start loading PhraseTable /home/saj/g2p/working/train/model/phrase-table.gz : [12.397] seconds
>> filePath: /home/saj/g2p/working/train/model/phrase-table.gz
>> ScoreProducer: PhraseModel start: 10 end: 15
>> Finished loading phrase tables : [12.397] seconds
>> Start loading phrase table from /home/saj/g2p/working/train/model/phrase-table.gz : [12.397] seconds
>> Reading /home/saj/g2p/working/train/model/phrase-table.gz
>> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
>> ****************************************************************************************************
>> Finished loading phrase tables : [21.864] seconds
>> IO from STDOUT/STDIN
>> Created input-output object : [21.864] seconds
>> Translating line 0 in thread id 3066051392
>> Translating: h e l l
>> Line 0: Collecting options took 0.000 seconds
>> Line 0: Search took 0.000 seconds
>> h e l l
>> BEST TRANSLATION: h|UNK|UNK|UNK e|UNK|UNK|UNK l|UNK|UNK|UNK l|UNK|UNK|UNK [1111]  [TOTAL=-402.355] core=(0.000,-4.000,-400.000,0.000,0.000,0.000,0.000,0.000,0.000,-47.137,0.000,0.000,0.000,0.000,0.000)
>> Line 0: Translation took 0.001 seconds total
>> user 21.529
>> sys 0.316
>> VmPeak: 424040 kB
>> VmRSS: 400732 kB
>> saj@Jadhavs:~$ 
>> 
>> For each input I am getting UNK.
>> I have used the full dictionary for training, but the parallel corpora are small (~2.5 MB).
>> The only change I made is that I did not use the -L option for the language and used English as the default for both corpora.
>> Is that a factor in the wrong answers?
>> If it is, then what parallel data and language should I use?
>> I did not skip any step described in the pages mentioned above.
>> Is grapheme-to-phoneme conversion possible with Moses?
> 
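
One more thought, in case you end up rebuilding the corpora: a rough, untested
sketch that produces the two space-separated files from a plain-text cmudict
file (called cmudict.txt here only as a placeholder; comment lines in cmudict
start with ";;;", and alternate entries like WORD(2) would need extra
filtering):

  # graphemes: the head-word, one letter per token
  grep -v '^;;;' cmudict.txt | awk '{ gsub(/./, "& ", $1); sub(/ $/, "", $1); print $1 }' > corpus.gr

  # phonemes: everything after the head-word (already space-separated)
  grep -v '^;;;' cmudict.txt | awk '{ $1 = ""; sub(/^ +/, ""); print }' > corpus.ph

The two files stay line-aligned because both passes read the dictionary in the
same order.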



