Dear all,

As the title says, I'd like to ask about the different parameters related to the phrase table in Moses, for example "my $MAX_LENGTH = 10;" in "filter-model-given-input.pl", or "[ttable-limit] 20" in "moses.ini".

This concern came up when I considered the following task:

* Initially, I have two phrase tables: one trained at *word level*, and the other at *morpheme level* (the training data is the same, except that at the morpheme level each word is segmented into morphemes). For example:
** word phrase table: A1A2A3 B1B2B3 ||| C1C2C3 D1D2D3
** morph phrase table: A1 A2 A3 B1 B2 B3 ||| C1 C2 C3 D1 D2 D3
(each Ai, Bi, Ci, Di is a morpheme)
* After that, I want to concatenate these two phrase tables into one, with entries such as:
A1 A2 A3 B1 B2 B3 ||| C1C2C3 D1D2D3
A1 A2 A3 B1 B2 B3 ||| C1 C2 C3 D1 D2 D3

Note that all the source phrases that came from the word-level table are now at the morpheme level. The purpose is to add the option of translating morpheme sequences into words when translating at the morpheme level.

I have the following questions:

* Since source phrases that were previously at word level are now split into long sequences of morphemes, I was wondering whether this will hurt translation quality, and which parameters control the phrase length or the number of entries used during decoding, so that I can watch out for them.

* The second question is about the scores. I simply concatenated the scores from the two tables without adjusting them (e.g., the translation probabilities for the same source phrase should sum to 1). Do you think this will affect translation quality significantly?

All answers and comments are very much appreciated! Thanks!

Regards,
Thang
--
Luong Minh Thang
WING group, School of Computing, National University of Singapore
http://wing.comp.nus.edu.sg/~lmthang
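The merge described in the message can be sketched in Python as follows. This is a minimal sketch under several assumptions that are not from the original post: each phrase-table line has the form "source ||| target ||| scores" (any extra fields are dropped), the third score is the direct phrase probability phi(e|f), and a word-to-morphemes dictionary is available from the segmenter. The names `segment_source`, `merge_tables`, `max_len`, and `DIRECT_PROB_IDX` are hypothetical. The `max_len` check only flags segmented word-level source phrases longer than a limit such as $MAX_LENGTH, assuming the filter script only matches input phrases up to that many tokens. Renormalization is optional, since whether it matters is exactly the second question.

```python
"""Sketch: merge a word-level and a morpheme-level phrase table.

Assumptions (hypothetical, not from the original post): lines look like
"source ||| target ||| scores", extra fields are dropped, and the third
score is the direct phrase translation probability phi(e|f).
"""
from collections import defaultdict

DIRECT_PROB_IDX = 2  # assumed position of phi(e|f) in the score list


def parse_entry(line):
    """Split a phrase-table line into (source, target, scores)."""
    fields = [f.strip() for f in line.split("|||")]
    return fields[0], fields[1], [float(x) for x in fields[2].split()]


def segment_source(src, segmentation):
    """Replace each word by its morphemes; unknown words are kept whole."""
    morphemes = []
    for word in src.split():
        morphemes.extend(segmentation.get(word, [word]))
    return " ".join(morphemes)


def merge_tables(word_lines, morph_lines, segmentation,
                 max_len=10, renormalize=False):
    """Return (merged_lines, too_long).

    too_long lists segmented word-level source phrases with more than
    max_len tokens (a hypothetical stand-in for a filter length limit).
    """
    entries = {}  # (source, target) -> scores; first occurrence wins
    too_long = []
    for line in word_lines:
        src, tgt, scores = parse_entry(line)
        src = segment_source(src, segmentation)  # word source -> morphemes
        if len(src.split()) > max_len:
            too_long.append(src)
        entries.setdefault((src, tgt), scores)
    for line in morph_lines:
        src, tgt, scores = parse_entry(line)
        entries.setdefault((src, tgt), scores)

    if renormalize:
        # Rescale phi(e|f) so it sums to 1 over all targets of each source.
        totals = defaultdict(float)
        for (src, _), scores in entries.items():
            totals[src] += scores[DIRECT_PROB_IDX]
        for (src, _), scores in entries.items():
            if totals[src] > 0:
                scores[DIRECT_PROB_IDX] /= totals[src]

    merged = [
        f"{src} ||| {tgt} ||| {' '.join(f'{x:g}' for x in scores)}"
        for (src, tgt), scores in sorted(entries.items())
    ]
    return merged, too_long


if __name__ == "__main__":
    seg = {"A1A2A3": ["A1", "A2", "A3"], "B1B2B3": ["B1", "B2", "B3"]}
    word = ["A1A2A3 B1B2B3 ||| C1C2C3 D1D2D3 ||| 0.5 0.4 0.6 0.3 2.718"]
    morph = ["A1 A2 A3 B1 B2 B3 ||| C1 C2 C3 D1 D2 D3 ||| 0.7 0.5 0.9 0.4 2.718"]
    merged, too_long = merge_tables(word, morph, seg, renormalize=True)
    print("\n".join(merged))
    print("source phrases over max_len:", too_long)
```

Duplicate (source, target) pairs, which can arise when a word consists of a single morpheme, are resolved here by keeping the word-table entry; averaging the two score vectors would be another reasonable choice.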
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
