There's no assumption that English is the target language.
In some Moses programs, -f originally stood for French and -e for English.
However, this is purely historical.
The options should be read as
-f = source
-e = target
but no one has had the time to change the variable names.
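
For the English-to-Tamil case, a minimal sketch of how the arguments might be assembled is shown here. The corpus stem, language suffixes, output directory and LM path are all hypothetical; only the option names (-root-dir, -corpus, -f, -e, -lm) are taken from train-model.perl itself. The command is only built and printed, not run.

```python
import shlex


def train_model_args(corpus_stem, source, target, lm_spec, root_dir="train"):
    """Build an argument list for train-model.perl (nothing is executed)."""
    return [
        "train-model.perl",
        "-root-dir", root_dir,
        # The script looks for <corpus_stem>.<source> and <corpus_stem>.<target>.
        "-corpus", corpus_stem,
        "-f", source,   # source language suffix, whatever the language is
        "-e", target,   # target language suffix
        # factor:order:filename; the LM is trained on target-language text.
        "-lm", lm_spec,
    ]


# Hypothetical paths: corpus/train.clean.en and corpus/train.clean.ta
args = train_model_args(
    "corpus/train.clean", "en", "ta", "0:3:/home/user/lm/train.ta.arpa"
)
print(shlex.join(args))
```

Swapping the two suffixes (and pointing -lm at an English language model) would give the Tamil-to-English direction, so nothing inside train-model.perl needs to be edited.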
On 22/01/2014 03:15, Arththika Paramanathan wrote:
In Moses, it is assumed that English is the target language and the other
language is the source (foreign) language, so we can translate a foreign
language into English (in my case, Tamil-English). I want to translate
English-Tamil. So, what do I need to change
(in the train-model.perl file)?
On Wed, Jan 22, 2014 at 8:37 AM, Arththika Paramanathan wrote:
Hi Nicola,
Thank you for your response.
I think building an LM with IRSTLM takes 4 or 5 steps.
In step 1, it splits the corpus into 1-grams with their frequency
counts (there is no sorting here).
In step 2, it splits this dictionary into 3 dictionaries (balanced
n-gram lists). Here, the threshold is approximately the total
number of words divided by 3. Is that correct?
In step 3, it collects n-grams for each dictionary, i.e., for each word
in each split dictionary, it searches for 3-grams and puts them in a
separate file.
Then I don't understand the next step (ARPA file).
How are these values calculated?
-3.72202 <s> -0.598275
-3.17795 illegal -0.60206
-2.42099 folder -0.500602
-2.53169 name -0.723104
Can you please explain how to calculate this?
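
As a reading aid for the four lines quoted above, here is a minimal sketch, assuming they follow the standard ARPA layout: the first column is the log10 of the smoothed probability, then the n-gram, then (optionally) the log10 of the back-off weight. It does not reproduce how IRSTLM arrives at the values, since that depends on the smoothing method chosen when the LM is built. The function names are hypothetical.

```python
def parse_arpa_entry(line, order=1):
    """Split one ARPA n-gram line into (log10 prob, words, log10 back-off)."""
    fields = line.split()
    log_prob = float(fields[0])
    words = fields[1:1 + order]
    # The back-off weight is optional (absent for the highest order).
    backoff = float(fields[1 + order]) if len(fields) > 1 + order else None
    return log_prob, words, backoff


def backed_off_bigram_logprob(first, second):
    """log10 P(second | first) when the bigram itself is not listed:
    back-off weight of `first` plus the unigram log-probability of `second`."""
    return first[2] + second[0]


start = parse_arpa_entry("-3.72202 <s> -0.598275")
illegal = parse_arpa_entry("-3.17795 illegal -0.60206")
folder = parse_arpa_entry("-2.42099 folder -0.500602")

# Unigram probability of "<s>" on a linear scale.
print(10 ** start[0])
# Score of "illegal folder" if that bigram is missing from the file.
print(backed_off_bigram_logprob(illegal, folder))
```

The back-off rule used here is the generic ARPA one, so it applies only when the longer n-gram is absent from the file; listed n-grams use their own first-column value directly.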
On Tue, Jan 21, 2014 at 10:46 PM, Nicola Bertoldi wrote:
Hi Arththika,
(1) In language modelling,
how does IRSTLM split the dictionary extracted from the
corpus into 3 dictionaries?
How are the n-gram counts calculated?
I would like to answer your first question,
as the person responsible for the IRSTLM toolkit.
If it is not clear, please reply privately to me only.
I suppose you are using the build-lm.sh script from IRSTLM.
The script splits the dictionary, sorted by 1-gram
frequency,
in such a way that the total frequency of each part is balanced.
In this way the corresponding partitions of the n-grams are
balanced as well.
The n-gram partitions are built by taking into consideration the
first token of each n-gram.
I am not sure what you mean by the second part of the question.
best regards,
Nicola
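
The splitting described above can be sketched roughly as follows. This is not the code of build-lm.sh, only an illustration under simple assumptions: the dictionary is sorted by decreasing 1-gram frequency, cut into contiguous parts of similar total frequency, and each n-gram goes to the part that contains its first token. All function names are hypothetical.

```python
from collections import Counter


def split_dictionary(counts, parts=3):
    """Cut a frequency-sorted word list into parts of similar total frequency."""
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    share = sum(counts.values()) / parts
    result, current, running = [], [], 0
    for word, freq in ordered:
        current.append(word)
        running += freq
        # Close a part once the cumulative frequency reaches its share.
        if len(result) < parts - 1 and running >= share * (len(result) + 1):
            result.append(current)
            current = []
    result.append(current)
    while len(result) < parts:   # pad if the dictionary is very small
        result.append([])
    return result


def partition_ngrams(tokens, dictionary_parts, n=3):
    """Count n-grams, assigning each one to the part holding its first token."""
    part_of = {w: i for i, part in enumerate(dictionary_parts) for w in part}
    buckets = [Counter() for _ in dictionary_parts]
    for i in range(len(tokens) - n + 1):
        ngram = tuple(tokens[i:i + n])
        buckets[part_of[ngram[0]]][ngram] += 1
    return buckets


tokens = "the cat sat on the mat the cat ran".split()
counts = Counter(tokens)                 # 1-gram frequencies
parts = split_dictionary(counts, parts=3)
buckets = partition_ngrams(tokens, parts, n=3)
print(parts)
print([sum(b.values()) for b in buckets])
```

In this toy corpus each of the three parts carries a total frequency of 3, which is the sense in which the parts are balanced: by total frequency, not by number of distinct words.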
On Jan 20, 2014, at 7:34 PM, Arththika Paramanathan wrote:
Hi,
(2) And if English is the foreign language, what do I need to
change (in the train-model.perl file)?
(3) Can anyone tell me how to use a Perl module? I want
to use the module named Locale-Maketext-Lexicon-0.97 to
extract translatable strings from po files.
--
regards,
P.Arththika
_______________________________________________
Moses-support mailing list
http://mailman.mit.edu/mailman/listinfo/moses-support