Hi,

Quoting Philipp Koehn <[email protected]>:

> Hi,
>
> yes, it is correct that step 1 is doing just the data preparation for GIZA++.
> The most time-consuming step is running mkcls to create the classes
> for the relative distortion models.
>

Do you mean the *.vcb files that are created in Step 1? These just  
look like dictionary files with three fields: a) a numeric ID, b) the  
word entry, c) the frequency of the string. My make_dictionary  
function does this in about 20 seconds. Why is mkcls taking so long?  
Is it doing something complicated that I have missed here?
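For what it's worth, the three-field *.vcb layout you describe can be sketched in a few lines. This is only an illustrative reconstruction of the format, not the actual Moses/GIZA++ code: the function name make_vcb and the toy corpus are mine, and the convention that word IDs start at 2 (0 reserved for the empty word, 1 for unknown words) is my assumption about GIZA++'s numbering:

```python
# Illustrative sketch of a GIZA++-style *.vcb vocabulary: one entry
# per word type, holding a numeric ID, the word, and its corpus
# frequency. Names and the ID offset are assumptions, not Moses code.
from collections import Counter

def make_vcb(sentences):
    """Build (id, word, frequency) triples from whitespace-tokenised
    sentences, assigning IDs in sorted word order starting at 2."""
    counts = Counter(tok for sent in sentences for tok in sent.split())
    return [(wid, word, freq)
            for wid, (word, freq)
            in enumerate(sorted(counts.items()), start=2)]

corpus = ["the cat sat", "the dog sat"]
for wid, word, freq in make_vcb(corpus):
    print(wid, word, freq)
```

As you say, a single counting pass like this is fast; it is a separate job from the word clustering that mkcls performs.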

James

> -phi
>
> On Mon, Aug 31, 2009 at 4:39 PM, James Read<[email protected]> wrote:
>> Hi,
>>
>> does anyone know what step 1 of the moses training script does other
>> than produce the dictionaries and the numerical sentences that enable
>> GIZA++ to do its job. The reason I ask is that on my machine step 1
>> takes just over 70 mins for en-fr Europarl corpus.
>>
>> My optimised version of data preparation and EM IBM Model 1 completes
>> in 121 seconds for five iterations of EM, that's just over 2 minutes.
>> Before publishing these results I just wanted to make sure there's
>> nothing I've missed about step 1 of the training process. Does it do
>> anything at all that influences GIZA++ other than preparing the
>> digital sentences?
>>
>> James
>>
>> --
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.
>>
>>
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>
>






