Hi all, I'm running experiments with Moses and different limits on the size of the phrases extracted (parameter 'max-phrase-length' during training). As expected, I get larger phrase tables as I increase the maximum size allowed for the phrases.
When I filter these phrase tables for decoding a given test set (using the script 'filter-model-given-input'), I would expect the filtered phrase tables to have more entries for larger maximum phrase sizes, but this is not what is happening. For example, given three phrase tables where the limits on phrase size are 2, 3, and 4, I get the following numbers of entries in the filtered versions (all with the same training, dev, and test sets):

--max-phrase-length = 2 --> 3,348,416 entries
--max-phrase-length = 3 --> 2,549,971 entries
--max-phrase-length = 4 --> 3,176,313 entries

As far as I know, the filtering script simply checks all the possible adjacent n-grams in the input sentences (with maximum n = 10) and extracts from the phrase table only the phrases matching these n-grams. Does it do anything else?

Thanks,
Lucia

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
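
P.S. To make my understanding of the filtering concrete, here is a minimal Python sketch of the behaviour described above. It is not the actual 'filter-model-given-input' code (which is a Perl script) and may well miss things the real script does. The function names and the toy data are invented for illustration; the only things taken from Moses are the ' ||| ' field separator of the phrase table and the n-gram limit of 10 mentioned above.

```python
def input_ngrams(sentences, max_n=10):
    """Collect every contiguous n-gram (n <= max_n) from the tokenized input."""
    ngrams = set()
    for sentence in sentences:
        tokens = sentence.split()
        for start in range(len(tokens)):
            last_end = min(start + max_n, len(tokens))
            for end in range(start + 1, last_end + 1):
                ngrams.add(" ".join(tokens[start:end]))
    return ngrams


def filter_phrase_table(table_lines, sentences, max_n=10):
    """Keep only the entries whose source side is an n-gram of the input."""
    wanted = input_ngrams(sentences, max_n)
    kept = []
    for line in table_lines:
        # The source phrase is the first '|||'-separated field.
        source = line.split("|||", 1)[0].strip()
        if source in wanted:
            kept.append(line)
    return kept


# Toy data, invented for illustration only.
test_sentences = ["the house is small"]
phrase_table = [
    "the ||| das ||| 0.5",
    "the house ||| das haus ||| 0.8",
    "the house is ||| das haus ist ||| 0.7",
    "a house ||| ein haus ||| 0.6",
]

filtered = filter_phrase_table(phrase_table, test_sentences)
print(len(filtered))
```

If the filtering really works like this, and if the table trained with a larger limit contains every entry of the table trained with a smaller limit, then the filtered table could only stay the same size or grow as the limit increases. That is why the numbers above surprise me.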
