Hi Francis,

Thank you for your question. The documentation has not yet caught up
with this feature.

The target word insertion feature is a sparse feature, so it behaves a
bit differently from the features that you are used to.

It is intended to learn lexical features, one feature for each word,
that indicate whether inserting that word into the output is a good
thing or not. Whether a word has been inserted is determined from the
word alignment within each phrase pair: if a target word is not
aligned to any source word, it is deemed to be inserted.
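To make the mechanics concrete, here is a minimal Python sketch (a
simplified illustration, not the Moses C++ implementation) of how
insertion counts could be derived for a single phrase pair. It assumes
the phrase-internal word alignment is given as (source_index,
target_index) pairs and that features are named with the "twi_" prefix
seen in the weight file; the function name insertion_features is
hypothetical.

```python
from collections import Counter


def insertion_features(target_words, alignment):
    """Count unaligned ("inserted") target words in one phrase pair.

    target_words: target-side tokens (one factor, e.g. the surface form)
    alignment:    iterable of (source_index, target_index) pairs
    Returns a Counter mapping sparse feature names to counts.
    """
    # Collect every target position that has at least one alignment point.
    aligned = {t for _, t in alignment}
    features = Counter()
    for i, word in enumerate(target_words):
        if i not in aligned:
            # No source word points here: the word counts as inserted.
            features["twi_" + word] += 1
    return features


# "es regnete" -> "it was raining": "was" has no alignment point.
print(insertion_features(["it", "was", "raining"], [(0, 0), (1, 2)]))
# Counter({'twi_was': 1})
```

Each unaligned word fires its own feature, which is what makes this a
sparse feature: the set of possible feature names is as large as the
target vocabulary.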

The options you have to specify are a factor number (typically 0 for
the surface form of the word) and, optionally, a file that contains a
restricted list of words. If such a file is given, only target words
that appear in the file (one word per line) are considered for the
feature. In other words: words that are not in the file may still be
inserted, but no feature calculation takes place for them.
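The restricted-list behaviour can be sketched in Python as follows,
under assumptions: the word list is read one word per line with blank
lines ignored, the target words are already reduced to the chosen
factor, and, following the description above, an unaligned word that is
not in the list produces no feature at all. The names load_word_list and
restricted_insertion_features are hypothetical.

```python
from collections import Counter


def load_word_list(path):
    """Read a restricted word list: one word per line, blank lines skipped."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def restricted_insertion_features(target_words, alignment, word_list=None):
    """Count inserted target words, optionally only those in word_list.

    word_list=None models the case where no file is given: every
    unaligned target word fires a feature.
    """
    aligned = {t for _, t in alignment}
    features = Counter()
    for i, word in enumerate(target_words):
        if i in aligned:
            continue  # aligned words are never counted as insertions
        if word_list is not None and word not in word_list:
            continue  # not in the list: no feature calculation takes place
        features["twi_" + word] += 1
    return features


# "was" and "very" are unaligned, but only "was" is in the list.
print(restricted_insertion_features(
    ["it", "was", "very", "cold"], [(0, 0), (1, 3)], {"was"}))
# Counter({'twi_was': 1})
```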

Sparse lexical features require a special weight file that contains a
weight for each instantiation of the feature. It may look like the
following:
twi_I -0.00529196301346302
twi_had -4.16585913937328e-05
twi_was -0.00612071371830685
[...]
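How such a weight file combines with the feature counts can be sketched
like this, assuming a whitespace-separated "name value" format as in the
lines above and that a feature without an entry in the file contributes
nothing (weight 0). This is the usual linear-model dot product, score =
sum of weight times count; the helper names are hypothetical and not
Moses functions.

```python
def parse_sparse_weights(lines):
    """Parse lines of the form '<feature-name> <weight>' into a dict."""
    weights = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        name, value = fields
        weights[name] = float(value)
    return weights


def sparse_score(feature_counts, weights):
    """Weighted sum of sparse feature counts; unlisted features weigh 0."""
    return sum(weights.get(name, 0.0) * count
               for name, count in feature_counts.items())


weights = parse_sparse_weights([
    "twi_I -0.00529196301346302",
    "twi_had -4.16585913937328e-05",
    "twi_was -0.00612071371830685",
])
# Two insertions of "was" plus one of "the", which has no weight entry.
print(sparse_score({"twi_was": 2, "twi_the": 1}, weights))
```

With these (negative) weights, every insertion of "was" lowers the score
of a hypothesis slightly, so tuning can learn to discourage or tolerate
specific insertions word by word.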

Of course, you want to learn these feature weights during tuning,
which requires either PRO or kbMIRA; it does not work with plain MERT.

The moses.ini that is used to run tuning must contain:

[report-sparse-features]
twi

in addition to the feature's own section:

[target-word-insertion-feature]
0 /path/to/word/list

Let me know if this description helps you.

-phi

On Fri, Apr 5, 2013 at 2:23 PM, Francis Tyers wrote:
> Hello everyone!
>
> I'm a bit interested in the -target-word-insertion-feature option in
> Moses. The help output is as follows:
>
> -target-word-insertion-feature: Count feature for each unaligned target
> word
>
> I tried calling it without any options and it didn't seem to do
> anything, so I checked out the code and found a couple of hints:
>
> 1) in build-sparse-lexical-features.perl:
>
> [target-word-insertion-feature]
> 0 $file
>
> 2) in moses/StaticData.cpp:
>
> UserMessage::Add("Format of target word insertion feature parameter is:
> --target-word-insertion-feature <factor> [filename]");
>
> So, this would suggest that it requires a factor, and a filename is
> optional. The code instantiates a class TargetWordInsertionFeature.
>
> If we look at the TargetWordInsertionFeature, it seems to:
>
> * Load a file with a list of words if it exists
> * Make a boolean array of size 16 (I guess this is because of the limit
> on feature score length in ScoreComponentCollection)
> * For each word in the phrase it sets if it is aligned or not
> * If the word is unaligned it adds 1 to the score for that word
> feature.(?)
>
> ... this is where I get lost.
>
> Can anyone give a better description of what this option does, and how
> it affects the translation (if at all)?
>
> My initial interest was in getting statistics on unaligned words that
> appeared in the output. Can this option give that?
>
> Thanks in advance for any help!
>
> Fran
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support