Hi Alex,

It would be possible to incorporate the change into kbmira, but I'm not eager 
to do so. As you and Barry have mentioned, this makes more sense at the wrapper 
level. There's no need for any tuner to know about the merging and splitting of 
features.

-- Colin

On 2013-02-07, at 9:33 AM, Barry Haddow wrote:

> Hi Alex
> 
> There is already some provision for grouping features, so it should be 
> possible to implement what you need at the wrapper level.
> 
> At the moment, you can train a sparse feature model with mert by 
> omitting the -report-sparse-features flag from Moses, which causes the 
> sparse features to be summed before being written into the n-best list. 
> There is also provision for a hybrid "pro-mert" training, where at each 
> step all features are optimised with pro, then the dense ones are 
> re-optimised with mert.
> 
> cheers - Barry
> 
> On 07/02/13 11:07, Alexander Fraser wrote:
>> Hi Colin,
>> 
>> Yes, I totally agree, grouping the fixed features together is the
>> right way to go. It would ideally go in the wrapper (mert-moses.pl) so
>> it could also be used with line-search-MERT and PRO, but as I recall,
>> it is hard practically to make stuff like that work in there.
>> 
>> How hard would it be to do in kbmira instead?
>> 
>> Cheers, Alex
>> 
>> 
>> On Wed, Feb 6, 2013 at 10:49 PM, Cherry, Colin
>> <[email protected]>  wrote:
>>> Hi Alex,
>>> 
>>> I'm afraid it does not, but I could certainly hack something in.
>>> 
>>> I would be a little nervous about what this would do to MIRA. During MIRA 
>>> training, the scale of the features can change dramatically - I always 
>>> start by normalizing the weight vector to squared norm=1, and by the time 
>>> I'm done passing through the n-best lists 60 times, the squared norm may 
>>> have gotten much larger. If I keep a feature fixed, it may quickly fall out 
>>> of scale and become irrelevant. Or maybe MIRA will mathmagically work to 
>>> keep the other features in scale. It's not clear to me without checking the 
>>> literature. I think Brian Roark held a single feature fixed in some of his 
>>> perceptron work for speech recognition, so that would be a place to start.
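The scaling worry can be made concrete with a toy calculation (illustrative numbers only, not kbmira internals):

```python
import math

def normalize_sq(w):
    """Scale w so its squared norm is 1, as described above."""
    s = math.sqrt(sum(x * x for x in w))
    return [x / s for x in w]

w = normalize_sq([2.0, 1.0, 1.0, 0.5])
fixed = w[0]                          # the weight we would like to hold constant
# if training later scales the free weights up by, say, 10x ...
grown = [fixed] + [10.0 * x for x in w[1:]]
share_before = abs(fixed) / sum(abs(x) for x in w)
share_after = abs(fixed) / sum(abs(x) for x in grown)
# ... the fixed feature's share of the total weight mass collapses
```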
>>> 
>>> Is there an alternative to holding specific weights constant? If there is a 
>>>  group of features to be fixed (say the decoder's dense features), then I 
>>> would suggest presenting their weighted sum to MIRA as a single feature, 
>>> which MIRA can continue to scale appropriately using the meta-feature's 
>>> single weight. After training, the "fixed" features' weights would be the 
>>> product of the single meta-weight and the original fixed weight, which can 
>>> go back in the decoder.
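The meta-feature trick can be sketched roughly as follows (illustrative Python; the names and numbers are made up, not kbmira code):

```python
def collapse(dense_feats, fixed_weights):
    """Present the fixed group to the tuner as one weighted-sum feature."""
    return sum(w * f for w, f in zip(fixed_weights, dense_feats))

def expand(meta_weight, fixed_weights):
    """After tuning, recover per-feature weights for the decoder."""
    return [meta_weight * w for w in fixed_weights]

fixed_weights = [0.5, 0.2, 0.3]      # original (fixed) dense weights
dense_feats = [-10.0, -3.5, -7.2]    # feature values from one hypothesis

meta = collapse(dense_feats, fixed_weights)   # the single value MIRA sees
# suppose MIRA settles on a meta-weight of 1.8:
decoder_weights = expand(1.8, fixed_weights)  # roughly [0.9, 0.36, 0.54]
```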
>>> 
>>> I hope that makes sense! I'm willing to add the weight-fixing feature, it's 
>>> easy enough to do, but I thought it would be worth having this conversation 
>>> first.
>>> 
>>> -- Colin
>>> 
>>> On 2013-02-06, at 11:43 AM, Alexander Fraser wrote:
>>> 
>>>> Another batch MIRA question, perhaps for Colin this time: does kbmira
>>>> support only optimizing some feature weights (i.e., holding the other
>>>> weights constant)?
>>>> 
>>>> Cheers, Alex
>>>> 
>>>> 
>>>> On Mon, Feb 4, 2013 at 3:06 PM, Alexander Fraser
>>>> <[email protected]>  wrote:
>>>>> That's great - thanks!
>>>>> 
>>>>> On Mon, Feb 4, 2013 at 2:29 PM, Barry Haddow<[email protected]>  
>>>>> wrote:
>>>>>> Hi Alex
>>>>>> 
>>>>>> Yes, you can use batch mira for training sparse features, it works the 
>>>>>> same
>>>>>> way as PRO does in Moses.
>>>>>> 
>>>>>> Unfortunately documentation on sparse features is, well, sparse... But 
>>>>>> the
>>>>>> n-best format is much the same as for dense features, i.e.
>>>>>> 
>>>>>> name_1: value_1 name_2: value_2 ...
>>>>>> 
>>>>>> Sparse features only get reported in the nbest if they are named in the
>>>>>> -report-sparse-features argument, otherwise their weighted sum will be
>>>>>> reported.
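A minimal parser for the feature field described above might look like this (a sketch only; the " ||| " field layout follows the usual Moses n-best convention, and the feature names here are invented):

```python
def parse_features(feature_field):
    """Turn 'name_1: value_1 name_2: value_2 ...' into a dict."""
    feats = {}
    name = None
    for tok in feature_field.split():
        if tok.endswith(":"):
            name = tok[:-1]          # a feature name announces itself
        elif name is not None:
            feats[name] = float(tok)  # the value that follows it
    return feats

line = "0 ||| the cat ||| tm_phrase: -1.5 lm_score: -10.2 ||| -11.7"
fields = [f.strip() for f in line.split("|||")]
features = parse_features(fields[2])
# features == {'tm_phrase': -1.5, 'lm_score': -10.2}
```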
>>>>>> 
>>>>>> cheers - Barry
>>>>>> 
>>>>>> 
>>>>>> On 04/02/13 13:13, Alexander Fraser wrote:
>>>>>>> Hi Folks,
>>>>>>> 
>>>>>>> Can sparse features be used together with batch mira?
>>>>>>> 
>>>>>>> Is there documentation for the n-best format of sparse features 
>>>>>>> somewhere?
>>>>>>> 
>>>>>>> Thanks!
>>>>>>> 
>>>>>>> Cheers, Alex
>>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> The University of Edinburgh is a charitable body, registered in
>>>>>> Scotland, with registration number SC005336.
>>>>>> 
> 
> 


_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
