Re: [Moses-support] Language modelling

Arththika Paramanathan Tue, 21 Jan 2014 19:19:19 -0800

Moses assumes that English is the target language and that the other
language is the source (foreign) language, so we can translate a foreign
language into English (in my case, Tamil-English). I want to translate
English-Tamil instead. What do I need to change (in the train-model.perl
file)?



On Wed, Jan 22, 2014 at 8:37 AM, Arththika Paramanathan <
[email protected]> wrote:

> Hi Nicola,
> Thank you for your response.
>
> I think building an LM with IRSTLM takes 4 or 5 steps.
> In step 1, it splits the corpus into 1-grams with their frequency counts
> (there is no sorting here).
> In step 2, it splits this dictionary into 3 dictionaries (balanced n-gram
> lists). Here, the threshold is approximately the total number of words
> divided by 3. Is that correct?
> In step 3, it collects n-grams for each dictionary, i.e. for each word in
> each split dictionary, it searches for 3-grams and puts them in a separate
> file.
> Then I don't understand the next step (the ARPA file).
> How are these values calculated?
> -3.72202    <s>    -0.598275
> -3.17795    illegal    -0.60206
> -2.42099    folder    -0.500602
> -2.53169    name    -0.723104
>
> Can you please explain how to calculate this?
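
For reading the lines quoted above: in the ARPA format each n-gram line
holds a base-10 log probability, the n-gram itself, and an optional base-10
log back-off weight. The following is a minimal sketch under that reading,
not IRSTLM code. The names parse_arpa_line and backed_off_log_prob are
hypothetical, and the smoothing that produces the probabilities in the first
place is not shown.

```python
def parse_arpa_line(line, order=1):
    """Split an ARPA n-gram line into (log10 prob, n-gram, log10 back-off or None)."""
    fields = line.split()
    log_prob = float(fields[0])
    ngram = tuple(fields[1:1 + order])
    # The back-off weight is optional (highest-order n-grams have none).
    backoff = float(fields[1 + order]) if len(fields) > 1 + order else None
    return log_prob, ngram, backoff


def backed_off_log_prob(unigrams, history, word):
    """log10 P(word | history) when the bigram 'history word' is not listed:
    back-off weight of the history plus the unigram log probability of the word."""
    _, history_backoff = unigrams[history]
    word_log_prob, _ = unigrams[word]
    return (history_backoff or 0.0) + word_log_prob


# The four 1-gram lines from the message above.
lines = [
    "-3.72202    <s>    -0.598275",
    "-3.17795    illegal    -0.60206",
    "-2.42099    folder    -0.500602",
    "-2.53169    name    -0.723104",
]

unigrams = {}
for line in lines:
    log_prob, ngram, backoff = parse_arpa_line(line)
    unigrams[ngram[0]] = (log_prob, backoff)
    # Convert the log10 value back to a plain probability.
    print(ngram[0], "P =", 10 ** log_prob, "log10 back-off =", backoff)

# Assumption for illustration: the bigram "folder name" is not in the model.
print(backed_off_log_prob(unigrams, "folder", "name"))
```

So the first column is log10 of the smoothed probability of the word, and
the last column is only used when a longer n-gram starting with that word is
missing from the model.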
>
>
>
>
>
>
>
> On Tue, Jan 21, 2014 at 10:46 PM, Nicola Bertoldi <[email protected]> wrote:
>
>> Hi Arththika,
>>
>>
>> (1) In language modelling,
>>    how does IRSTLM split the dictionary extracted from the corpus into 3
>> dictionaries?
>>    how are n-gram counts calculated?
>>
>>
>>
>> I would like to answer your first question,
>> as the person responsible for the IRSTLM toolkit.
>>
>> If anything is unclear, please reply privately to me only.
>>
>>
>> I suppose you are using the build-lm.sh script from IRSTLM.
>>
>> The script splits the dictionary, sorted according to 1-gram frequency,
>> in such a way that the global frequency of each part is balanced.
>>
>> In this way the corresponding partitions of the n-grams are balanced as
>> well.
>> The n-gram partition is built by taking the first token into
>> consideration.
>>
>> I am not sure what you mean by the second part of the question.
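
The balanced split described above can be sketched roughly as follows. This
is an illustration only, not the build-lm.sh implementation: the function
names are hypothetical, descending frequency order is an assumption, and the
cut points are simply the cumulative frequency thresholds total/parts.

```python
from collections import Counter


def count_unigrams(sentences):
    """Count how often each whitespace-separated token occurs."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence.split())
    return counts


def balanced_split(counts, parts=3):
    """Cut the frequency-sorted dictionary into contiguous chunks whose
    total frequencies are roughly equal."""
    target = sum(counts.values()) / parts
    chunks = [[]]
    running = 0
    for word, freq in counts.most_common():
        # Open a new chunk once the chunks so far have covered their share.
        if running >= target * len(chunks) and len(chunks) < parts:
            chunks.append([])
        chunks[-1].append(word)
        running += freq
    return chunks


def ngrams_by_partition(sentences, chunks, n=3):
    """Count n-grams, filing each one under the chunk that owns its first token."""
    owner = {word: i for i, chunk in enumerate(chunks) for word in chunk}
    partitions = [Counter() for _ in chunks]
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            partitions[owner[gram[0]]][gram] += 1
    return partitions


# Toy corpus, made up for illustration.
corpus = ["a b a c", "a b d"]
chunks = balanced_split(count_unigrams(corpus), parts=3)
for i, part in enumerate(ngrams_by_partition(corpus, chunks, n=2)):
    print("partition", i, chunks[i], dict(part))
```

Because every n-gram is assigned by its first token, each partition can be
counted independently and the per-partition counts never overlap.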
>>
>> best regards,
>> Nicola
>>
>>
>>
>>
>> On Jan 20, 2014, at 7:34 PM, Arththika Paramanathan wrote:
>>
>> Hi,
>>
>> (2) And, if English is the foreign language, what do I need to change (in
>> the train-model.perl file)?
>>
>> (3) Can anyone tell me how to use a Perl module? I want to use the module
>> named Locale-Maketext-Lexicon-0.97 to extract translatable strings from
>> po files.
>>
>>
>>
>> --
>> regards,
>> P.Arththika
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>>
>
>
> --
> regards,
> P.Arththika
>



-- 
regards,
P.Arththika

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
