Dear all,

As in the title, I'd like to ask about the parameters related to the
phrase table in Moses, for example "my $MAX_LENGTH = 10;" in
"filter-model-given-input.pl", or "[ttable-limit] 20" in "moses.ini". This
became a concern while I was working on the following task:
* Initially, I have two phrase tables: one trained at *word level* and the
other at *morpheme level* (the training data is the same, except that at the
morpheme level each word is segmented into morphemes). For example:
** word phrase table: A1A2A3   B1B2B3 ||| C1C2C3   D1D2D3
** morph phrase table: A1 A2 A3 B1 B2 B3 ||| C1 C2 C3 D1 D2 D3
(each Ai, Bi, Ci, Di is a morpheme)

* After that, I want to concatenate these two phrase tables into one, with
entries such as:
A1 A2 A3 B1 B2 B3 ||| C1C2C3   D1D2D3
A1 A2 A3 B1 B2 B3 ||| C1 C2 C3 D1 D2 D3
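To make the merge concrete, here is a minimal sketch in Python. It assumes
the usual " ||| " field separator, and it assumes a word-to-morpheme lexicon
is available; the SEGMENTATION dictionary and the score values below are
made-up placeholders for illustration, not data from my tables.

```python
# Hypothetical segmentation lexicon: word -> list of morphemes.
SEGMENTATION = {
    "A1A2A3": ["A1", "A2", "A3"],
    "B1B2B3": ["B1", "B2", "B3"],
}

def segment_source(line, segmentation):
    """Rewrite the source side of one phrase-table line at morpheme level."""
    fields = line.rstrip("\n").split(" ||| ")
    morphemes = []
    for word in fields[0].split():
        # Words missing from the lexicon are kept unsegmented.
        morphemes.extend(segmentation.get(word, [word]))
    fields[0] = " ".join(morphemes)
    return " ||| ".join(fields)

def merge_tables(word_lines, morph_lines, segmentation):
    """Segment word-table sources, then append the morpheme-table lines."""
    merged = [segment_source(line, segmentation) for line in word_lines]
    merged.extend(line.rstrip("\n") for line in morph_lines)
    return merged

# Placeholder entries; the four scores per line are invented.
word_table = ["A1A2A3 B1B2B3 ||| C1C2C3 D1D2D3 ||| 0.5 0.4 0.5 0.4"]
morph_table = ["A1 A2 A3 B1 B2 B3 ||| C1 C2 C3 D1 D2 D3 ||| 0.3 0.2 0.3 0.2"]

merged_table = merge_tables(word_table, morph_table, SEGMENTATION)
for entry in merged_table:
    print(entry)
```

Only the source side is rewritten; target sides and scores are copied
unchanged, which is exactly the situation the questions below are about.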

Notice that all the *word-level source phrases* are now at the morpheme
level. The purpose of doing this is to add the option of translating from
morpheme sequences to words when translating at the morpheme level. I have
the following questions:
* Since source phrases previously at the word level are now tokenized into long
sequences of morphemes, I was wondering whether this will hurt translation
quality, and which parameters limit the phrase length or the number of
entries used during decoding, so that I can watch out for them.
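As a quick way to see how exposed the merged table is, the sketch below
counts source tokens and entries per source phrase and compares them with
the two values quoted above (10 and 20). How Moses actually applies these
limits is part of my question, so the comparison is only an assumption; the
function names and the sample lines are hypothetical.

```python
from collections import Counter

def source_stats(lines):
    """Return token count and number of entries for each source phrase."""
    entries_per_source = Counter()
    for line in lines:
        source = line.split(" ||| ")[0].strip()
        entries_per_source[source] += 1
    lengths = {src: len(src.split()) for src in entries_per_source}
    return lengths, entries_per_source

def report(lines, max_length, ttable_limit):
    """List sources exceeding the assumed length and entry-count limits."""
    lengths, entries = source_stats(lines)
    too_long = sorted(s for s, n in lengths.items() if n > max_length)
    too_many = sorted(s for s, n in entries.items() if n > ttable_limit)
    return too_long, too_many

# Made-up sample; the third source phrase has 12 morpheme tokens.
sample = [
    "A1 A2 A3 B1 B2 B3 ||| C1C2C3 D1D2D3",
    "A1 A2 A3 B1 B2 B3 ||| C1 C2 C3 D1 D2 D3",
    "A1 A2 A3 A4 B1 B2 B3 B4 E1 E2 E3 E4 ||| C1C2C3 D1D2D3 F1F2F3",
]

too_long, too_many = report(sample, max_length=10, ttable_limit=20)
print("sources longer than the length limit:", too_long)
print("sources with more entries than the entry limit:", too_many)
```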

* The second question is about the scores. I simply kept the scores from the
two tables as they are, without adjusting them (e.g. the translation
probabilities for the same source phrase should sum to 1). Do you think
this would affect the translation quality significantly?
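For reference, the adjustment I skipped could look like the sketch below:
one score column is rescaled so that it sums to 1 over all entries sharing
a source phrase. Which column holds the probability conditioned on the
source is an assumption to check against the actual table, so it is passed
as score_index; the sample lines and scores are invented. This is only an
illustration, not a claim that plain renormalization is the right fix.

```python
from collections import defaultdict

def renormalize(lines, score_index):
    """Rescale one score column to sum to 1 per source phrase."""
    parsed = []
    totals = defaultdict(float)
    for line in lines:
        fields = line.rstrip("\n").split(" ||| ")
        scores = [float(s) for s in fields[2].split()]
        parsed.append((fields, scores))
        totals[fields[0]] += scores[score_index]

    output = []
    for fields, scores in parsed:
        total = totals[fields[0]]
        if total > 0:
            scores[score_index] /= total
        score_text = " ".join(f"{s:.6g}" for s in scores)
        # Keep any extra fields after the scores untouched.
        output.append(" ||| ".join(fields[:2] + [score_text] + fields[3:]))
    return output

# Invented entries: same source phrase, one from each original table.
merged = [
    "A1 A2 A3 B1 B2 B3 ||| C1C2C3 D1D2D3 ||| 0.6 0.4",
    "A1 A2 A3 B1 B2 B3 ||| C1 C2 C3 D1 D2 D3 ||| 0.6 0.2",
]

normalized = renormalize(merged, score_index=0)
for entry in normalized:
    print(entry)
```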

All answers and comments are very much appreciated! Thanks!

Regards,
Thang

-- 
Luong Minh Thang
WING group, School of Computing, National University of Singapore
http://wing.comp.nus.edu.sg/~lmthang
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
