Hi Francis, thank you for your question - the documentation in this respect has not caught up yet.
The target word insertion feature is a sparse feature, so it behaves a bit differently from the features that you are used to. It is intended to learn lexical features, one feature for each word, each of which indicates whether inserting that word in the output is a good thing or not.

Whether a word has been inserted is detected from the word alignment within each phrase pair. If the target word is not aligned to any source word, then it is deemed to be inserted.

The options you have to specify are a factor number (typically 0 for the surface form of the word) and, optionally, a file that contains a restricted list of words. If such a file is present, then only target words that are in the file (one word per line) are considered for the feature. In other words, words that are not in the file may be inserted or not; no feature is computed for them.

Sparse lexical features require a special weight file that contains the weight for each instantiation of a feature. So this may look like the following:

twi_I -0.00529196301346302
twi_had -4.16585913937328e-05
twi_was -0.00612071371830685
[...]

Of course, you want to learn these feature weights during tuning, which requires the use of either PRO or kbMIRA - it does not work with plain MERT.

The moses.ini that is used to run tuning must contain:

[report-sparse-features]
twi

in addition to the

[target-word-insertion-feature]
0 /path/to/word/list

Let me know if this description helps you.

-phi

On Fri, Apr 5, 2013 at 2:23 PM, Francis Tyers <[email protected]> wrote:
> Hello everyone!
>
> I'm a bit interested in the -target-word-insertion-feature to Moses. The
> help output is as follows:
>
> -target-word-insertion-feature: Count feature for each unaligned target
> word
>
> I tried calling it without any options and it didn't seem to do
> anything, so I checked out the code and found a couple of hints:
>
> 1) in build-sparse-lexical-features.perl:
>
> [target-word-insertion-feature]
> 0 $file
>
> 2) in moses/StaticData.cpp:
>
> UserMessage::Add("Format of target word insertion feature parameter is:
> --target-word-insertion-feature <factor> [filename]");
>
> So, this would suggest that it requires a factor, and a filename is
> optional. The code instantiates a class TargetWordInsertionFeature.
>
> If we look at the TargetWordInsertionFeature, it seems to:
>
> * Load a file with a list of words if it exists
> * Make a boolean array of size 16 (I guess this is because of the limit
> on feature score length in ScoreComponentCollection)
> * For each word in the phrase it sets if it is aligned or not
> * If the word is unaligned it adds 1 to the score for that word
> feature.(?)
>
> ... this is where I get lost.
>
> Can anyone give a better description of what this option does, and how
> it affects the translation (if at all).
>
> My initial interest was in getting statistics on unaligned words that
> appeared in the output. Can this option give that ?
>
> Thanks in advance for any help!
>
> Fran
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
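
For reference, the unaligned-word check described in the reply above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Moses implementation: the function names twi_features and score, the (source_index, target_index) alignment format and the example phrase pair are all assumptions made for this sketch. Only the twi_ feature prefix and the example weight are taken from the message.

```python
from collections import Counter


def twi_features(target_words, alignment, vocab=None):
    """Count unaligned target words in one phrase pair (hypothetical helper).

    target_words: list of target-side tokens of the phrase pair
    alignment:    iterable of (source_index, target_index) pairs (assumed format)
    vocab:        optional set of words; if given, only these words fire a feature
    """
    # Target positions that are linked to at least one source word.
    aligned = {tgt for _, tgt in alignment}
    feats = Counter()
    for i, word in enumerate(target_words):
        if i in aligned:
            continue  # aligned words are not considered inserted
        if vocab is not None and word not in vocab:
            continue  # restricted word list: no feature for this word
        feats["twi_" + word] += 1
    return feats


def score(feats, weights):
    """Weighted sum of sparse feature counts; unknown features get weight 0."""
    return sum(weights.get(name, 0.0) * count for name, count in feats.items())


# Example: "had" (target position 1) has no alignment point.
feats = twi_features(["I", "had", "seen", "it"], [(0, 0), (1, 2), (2, 3)])
weights = {"twi_had": -4.16585913937328e-05}
print(dict(feats), score(feats, weights))
```

Summing such counters over the phrase pairs used in each translation would also give simple statistics on unaligned words in the output, which is close to what the original question asked for.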
