Hi,

I wrote a little longer documentation of the sparse features:
http://www.statmt.org/moses/?n=Moses.SparseFeatures

-phi

On Fri, Apr 12, 2013 at 6:12 PM, Philipp Koehn <[email protected]> wrote:
> Hi Francis,
>
> thank you for your question - the documentation in this respect has
> not caught up yet.
>
> The target word insertion feature is a sparse feature, so it behaves a
> bit differently from the features that you are used to.
>
> It is intended to learn lexical features, one feature for each word,
> each indicating whether inserting that word in the output is a good
> thing or not. Whether a word has been inserted is detected from the
> word alignment within each phrase pair: if a target word is not
> aligned to any source word, it is deemed to be inserted.
>
> The options you have to specify are a factor number (typically 0 for
> the surface form of the word) and, optionally, a file that contains a
> restricted list of words. If such a file is present, then only target
> words that are in the file (one word per line) are considered for the
> feature. In other words, for words that are not in the file, no
> feature is computed, whether they are inserted or not.
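
To make the description above concrete, here is a minimal Python sketch of
the idea, not the actual Moses implementation. The function name
twi_features, its arguments, and the (source, target) alignment
representation are assumptions made for illustration; only the "twi_"
feature prefix is taken from the weight file shown in this thread.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


def twi_features(
    target_words: Iterable[str],
    alignment: Iterable[Tuple[int, int]],
    allowed: Optional[Set[str]] = None,
) -> Dict[str, int]:
    """Count one sparse feature per unaligned (inserted) target word.

    alignment holds (source_index, target_index) pairs for one phrase pair.
    allowed, if given, plays the role of the optional word-list file.
    (Hypothetical helper; names and data layout are illustrative only.)
    """
    aligned_targets = {t for _, t in alignment}
    features: Dict[str, int] = {}
    for i, word in enumerate(target_words):
        if i in aligned_targets:
            continue  # aligned to some source word, so not an insertion
        if allowed is not None and word not in allowed:
            continue  # outside the restricted list: no feature is computed
        name = "twi_" + word
        features[name] = features.get(name, 0) + 1
    return features
```

For a phrase pair with target words ["the", "book", "was"] and alignment
[(0, 0), (1, 1)], the word "was" is unaligned, so the sketch returns
{"twi_was": 1}. With allowed={"had"} it returns an empty dict, since "was"
is not in the restricted list.
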
>
> Sparse lexical features require a special weight file that contains
> the weight for each instantiation of a feature. So this may look like
> the following:
> twi_I -0.00529196301346302
> twi_had -4.16585913937328e-05
> twi_was -0.00612071371830685
> [...]
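
As a rough illustration of how such a weight file could be consumed, the
Python sketch below reads "name weight" lines and combines them with
feature counts as a dot product. This is an assumption about usage, not
Moses code: parse_sparse_weights and sparse_score are hypothetical helpers,
and treating features without a listed weight as 0 is also an assumption.

```python
from typing import Dict, Iterable


def parse_sparse_weights(lines: Iterable[str]) -> Dict[str, float]:
    """Read whitespace-separated 'name weight' lines into a dict."""
    weights: Dict[str, float] = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # blank line or an elision marker such as "[...]"
        name, value = parts
        weights[name] = float(value)
    return weights


def sparse_score(features: Dict[str, int], weights: Dict[str, float]) -> float:
    """Dot product of feature counts and weights.

    Assumption: a feature with no entry in the weight file contributes 0.
    """
    return sum(count * weights.get(name, 0.0) for name, count in features.items())
```

With the three weights listed above, a hypothesis that inserts "I" twice
and "was" once would be scored as 2 * weights["twi_I"] + weights["twi_was"].
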
>
> Of course, you want to learn these feature weights during tuning,
> which requires the use of either PRO or kbMIRA - it does not work with
> plain MERT.
>
> The moses.ini that is used to run tuning must contain:
>
> [report-sparse-features]
> twi
>
> in addition to the
>
> [target-word-insertion-feature]
> 0 /path/to/word/list
>
> Let me know if this description helps you.
>
> -phi
>
> On Fri, Apr 5, 2013 at 2:23 PM, Francis Tyers <[email protected]> wrote:
>> Hello everyone!
>>
>> I'm interested in the -target-word-insertion-feature option of Moses. The
>> help output is as follows:
>>
>> -target-word-insertion-feature: Count feature for each unaligned target
>> word
>>
>> I tried calling it without any options and it didn't seem to do
>> anything, so I checked out the code and found a couple of hints:
>>
>> 1) in build-sparse-lexical-features.perl:
>>
>> [target-word-insertion-feature]
>> 0 $file
>>
>> 2) in moses/StaticData.cpp:
>>
>> UserMessage::Add("Format of target word insertion feature parameter is:
>> --target-word-insertion-feature <factor> [filename]");
>>
>> So, this would suggest that it requires a factor, and a filename is
>> optional. The code instantiates a class TargetWordInsertionFeature.
>>
>> If we look at the TargetWordInsertionFeature, it seems to:
>>
>> * Load a file with a list of words if it exists
>> * Make a boolean array of size 16 (I guess this is because of the limit
>> on feature score length in ScoreComponentCollection)
>> * For each word in the phrase it sets if it is aligned or not
>> * If the word is unaligned it adds 1 to the score for that word
>> feature.(?)
>>
>> ... this is where I get lost.
>>
>> Can anyone give a better description of what this option does, and how
>> it affects the translation (if at all)?
>>
>> My initial interest was in getting statistics on unaligned words that
>> appeared in the output. Can this option give that?
>>
>> Thanks in advance for any help!
>>
>> Fran
>>
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
