Re: [Moses-support] Language modelling

Hieu Hoang Thu, 23 Jan 2014 14:53:27 -0800

There's no assumption that English is the target language.

In some Moses programs, -f referred to French and -e referred to English. However, this is purely historical.


It should be
   -f = source
   -e = target
but no-one has had the time to change the variable names.
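
As an illustration for the English-Tamil case asked about below, here is a minimal sketch that only assembles a train-model.perl command line and prints it. The corpus stem, the language-model path and the helper name train_model_args are hypothetical placeholders; the only point is that -f takes the source extension, -e takes the target extension, and the language model is the one built on the target side.

```python
import shlex


def train_model_args(corpus_stem, source, target, lm_path, lm_order=3):
    """Assemble an argument list for train-model.perl (hypothetical helper).

    The parallel corpus is expected as <corpus_stem>.<source> and
    <corpus_stem>.<target>; -f names the source side, -e the target side.
    """
    return [
        "train-model.perl",
        "-root-dir", "train",
        "-corpus", corpus_stem,
        "-f", source,   # source language extension (historically "foreign")
        "-e", target,   # target language extension (historically "English")
        # -lm takes factor:order:filename; the LM must be trained on target text
        "-lm", f"0:{lm_order}:{lm_path}",
    ]


# English -> Tamil: English is the source, Tamil is the target.
# Both paths below are placeholders, not real files.
args = train_model_args("corpus/train.clean", "en", "ta", "/path/to/lm/tamil.lm")
print(shlex.join(args))
```

For the opposite direction (Tamil-English) the two extensions are swapped and the language model is replaced by one trained on English text.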


On 22/01/2014 03:15, Arththika Paramanathan wrote:

Moses seems to assume that English is the target language and the other (foreign) language is the source language, so that we can translate a foreign language into English (in my case, Tamil-English). I want to translate English-Tamil. So what do I need to change (in the train-model.perl file)?

On Wed, Jan 22, 2014 at 8:37 AM, Arththika Paramanathan wrote:


    Hi Nicola,
    Thank you for your response.

    I think building an LM with IRSTLM takes 4 or 5 steps.
    In step 1, it splits the corpus into 1-grams with their frequency
    counts (there is no sorting here).
    In step 2, it splits this dictionary into 3 dictionaries (balanced
    n-gram lists). Here, the threshold is approximately the total
    word count divided by 3. Is that correct?
    In step 3, it collects n-grams for each dictionary, i.e. for each word
    in each split dictionary, it searches for 3-grams and puts them in a
    separate file.
    Then I don't understand the next step (the ARPA file).
    How are these values calculated?
    -3.72202    <s>    -0.598275
    -3.17795    illegal    -0.60206
    -2.42099    folder    -0.500602
    -2.53169    name    -0.723104

    Can you please explain how to calculate this?
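
For reference, in the ARPA format each n-gram line holds three fields: the base-10 logarithm of the smoothed probability, the n-gram itself, and (for all but the highest order) the base-10 logarithm of the back-off weight. The exact values depend on the smoothing method used to estimate the model, so they cannot be recomputed from raw counts alone. The sketch below only decodes the four lines quoted above; parse_arpa_line is a hypothetical helper, not part of IRSTLM.

```python
def parse_arpa_line(line, order):
    """Split one ARPA n-gram line into (log10 prob, words, log10 back-off).

    `order` is the n-gram order of the section the line comes from; the
    back-off field is absent for the highest-order n-grams.
    """
    fields = line.split()
    log_prob = float(fields[0])
    words = fields[1:1 + order]
    backoff = float(fields[1 + order]) if len(fields) > 1 + order else None
    return log_prob, words, backoff


# The 1-gram lines from the message above.
lines = [
    "-3.72202    <s>    -0.598275",
    "-3.17795    illegal    -0.60206",
    "-2.42099    folder    -0.500602",
    "-2.53169    name    -0.723104",
]

for line in lines:
    log_prob, words, backoff = parse_arpa_line(line, order=1)
    prob = 10 ** log_prob  # undo the log10 to get the probability
    weight = 10 ** backoff if backoff is not None else None
    print(words[0], prob, weight)
```

When a longer n-gram is not listed in the model, its probability is obtained by multiplying the back-off weight of its history by the probability of the shorter n-gram, which in log10 space is a simple addition of the two stored values.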







    On Tue, Jan 21, 2014 at 10:46 PM, Nicola Bertoldi wrote:

        Hi Arththika,


        (1) In language modelling,
           how does IRSTLM split the dictionary extracted from the
        corpus into 3 dictionaries?
           how are the n-gram counts calculated?



        I would like to answer your first question,
        as the person responsible for the IRSTLM toolkit.

        If not clear, please reply privately to me only.


        I suppose you are using the build-lm.sh script from IRSTLM.

        The script splits the dictionary, sorted by 1-gram
        frequency,
        in such a way that the total frequency of each part is balanced.

        In this way the corresponding partitions of the n-grams are
        balanced as well.
        Each n-gram is assigned to a partition according to its
        first token.
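
The idea described above can be sketched as follows. This is an illustration under simple assumptions (a greedy cut at equal shares of the total frequency), not the actual code used by build-lm.sh; the function names and the toy corpus are made up for the example.

```python
from collections import Counter


def split_dictionary(freqs, parts):
    """Split a word -> count mapping into `parts` sub-dictionaries whose
    total frequencies are roughly equal (greedy cut over the sorted list)."""
    total = sum(freqs.values())
    share = total / parts
    # Sort by decreasing frequency; ties are broken alphabetically.
    ordered = sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))
    result = [dict() for _ in range(parts)]
    index, running = 0, 0
    for word, count in ordered:
        # Move to the next part once the cumulative count has passed
        # the share allotted to the parts filled so far.
        if index < parts - 1 and running >= share * (index + 1):
            index += 1
        result[index][word] = count
        running += count
    return result


def partition_ngrams(tokens, sub_dicts, n):
    """Count n-grams, storing each one in the partition of its first token."""
    word_to_part = {w: i for i, d in enumerate(sub_dicts) for w in d}
    counts = [Counter() for _ in sub_dicts]
    for i in range(len(tokens) - n + 1):
        ngram = tuple(tokens[i:i + n])
        counts[word_to_part[ngram[0]]][ngram] += 1
    return counts


tokens = "a a a a b b c c d e".split()  # toy corpus
sub_dicts = split_dictionary(Counter(tokens), parts=2)
bigram_counts = partition_ngrams(tokens, sub_dicts, n=2)
for part, counts in zip(sub_dicts, bigram_counts):
    print(sorted(part), dict(counts))
```

Because every n-gram starting with a given word lands in the same partition, each partition can be counted and processed independently before the partial results are merged.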

        Not sure what you mean by the second part of the question.

        best regards,
        Nicola




        On Jan 20, 2014, at 7:34 PM, Arththika Paramanathan wrote:

        Hi,

        (2) And, if English is the foreign language, what do I need to
        change (in the train-model.perl file)?

        (3) Can anyone tell me how to use a Perl module? I want
        to use the module named Locale-Maketext-Lexicon-0.97 to
        extract translatable strings from po files.



        --
        regards,
        P.Arththika
        _______________________________________________
        Moses-support mailing list
        [email protected]
        http://mailman.mit.edu/mailman/listinfo/moses-support


