Hi, I have written somewhat longer documentation of the sparse features: http://www.statmt.org/moses/?n=Moses.SparseFeatures
-phi

On Fri, Apr 12, 2013 at 6:12 PM, Philipp Koehn <[email protected]> wrote:
> Hi Francis,
>
> thank you for your question - the documentation in this respect has
> not caught up yet.
>
> The target word insertion feature is a sparse feature, so it behaves a
> bit differently from the features that you are used to.
>
> What it is intended to do is to learn lexical features, one feature
> for each word, which indicates if inserting the word in the output is
> a good thing or not. Whether a word has been inserted is
> detected from the word alignment within each phrase pair. If the
> target word is not aligned to any source word, then it is deemed to be
> inserted.
>
> The options you have to specify are a factor number (typically 0 for
> the surface form of the word), and optionally a file that contains a
> restricted list of words. If such a file is present, then only target
> words that are in the file (one word per line) are considered for the
> feature. In other words: words that are not in the file may be
> inserted or not; no feature calculation takes place for them.
>
> Sparse lexical features require a special weight file that contains
> the weight for each instantiation of a feature. So this may look like
> the following:
> twi_I -0.00529196301346302
> twi_had -4.16585913937328e-05
> twi_was -0.00612071371830685
> [...]
>
> Of course, you want to learn these feature weights during tuning,
> which requires the use of either PRO or kbMIRA - it does not work with
> plain MERT.
>
> The moses.ini that is used to run tuning must contain:
>
> [report-sparse-features]
> twi
>
> in addition to the
>
> [target-word-insertion-feature]
> 0 /path/to/word/list
>
> Let me know if this description helps you.
>
> -phi
>
> On Fri, Apr 5, 2013 at 2:23 PM, Francis Tyers <[email protected]> wrote:
>> Hello everyone!
>>
>> I'm a bit interested in the -target-word-insertion-feature to Moses. The
>> help output is as follows:
>>
>> -target-word-insertion-feature: Count feature for each unaligned target
>> word
>>
>> I tried calling it without any options and it didn't seem to do
>> anything, so I checked out the code and found a couple of hints:
>>
>> 1) in build-sparse-lexical-features.perl:
>>
>> [target-word-insertion-feature]
>> 0 $file
>>
>> 2) in moses/StaticData.cpp:
>>
>> UserMessage::Add("Format of target word insertion feature parameter is:
>> --target-word-insertion-feature <factor> [filename]");
>>
>> So, this would suggest that it requires a factor, and a filename is
>> optional. The code instantiates a class TargetWordInsertionFeature.
>>
>> If we look at the TargetWordInsertionFeature, it seems to:
>>
>> * Load a file with a list of words if it exists
>> * Make a boolean array of size 16 (I guess this is because of the limit
>> on feature score length in ScoreComponentCollection)
>> * For each word in the phrase it sets if it is aligned or not
>> * If the word is unaligned it adds 1 to the score for that word
>> feature.(?)
>>
>> ... this is where I get lost.
>>
>> Can anyone give a better description of what this option does, and how
>> it affects the translation (if at all)?
>>
>> My initial interest was in getting statistics on unaligned words that
>> appeared in the output. Can this option give that?
>>
>> Thanks in advance for any help!
>>
>> Fran
>>
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
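
For readers of this thread, the behaviour described above (an unaligned target word counts as inserted, an optional word list restricts which words fire, and each firing feature is looked up in a sparse weight file) can be sketched roughly as follows. This is only an illustration in Python, not the Moses implementation: the function names are hypothetical, and the "source-target" index format for the alignment string is an assumption. Only the twi_ prefix and the example weights are taken from the message above.

```python
from collections import Counter


def aligned_target_positions(alignment):
    """Return the set of target positions that appear in the alignment.

    Assumes a string of 'source-target' index pairs, e.g. '0-2 1-3'.
    """
    return {int(pair.split("-")[1]) for pair in alignment.split()}


def twi_features(target_words, alignment, word_list=None):
    """Count one sparse feature per unaligned (inserted) target word.

    If word_list is given, words outside it never fire a feature.
    """
    aligned = aligned_target_positions(alignment)
    feats = Counter()
    for pos, word in enumerate(target_words):
        if pos in aligned:
            continue  # aligned to some source word, so not an insertion
        if word_list is not None and word not in word_list:
            continue  # restricted list present and word not in it
        feats["twi_" + word] += 1
    return feats


def sparse_score(feats, weights):
    """Weighted sum of feature counts; features without a weight contribute 0."""
    return sum(weights.get(name, 0.0) * count for name, count in feats.items())


# Hypothetical phrase pair: only target position 2 ("eaten") is aligned.
feats = twi_features(["I", "had", "eaten"], "0-2")
weights = {"twi_I": -0.00529196301346302, "twi_had": -4.16585913937328e-05}
print(dict(feats))
print(sparse_score(feats, weights))
```

Note that the feature counts alone are per-hypothesis insertion counts, so a sketch like this also hints at how one might collect statistics on unaligned words, although whether the decoder exposes them in that form is not confirmed in this thread.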
