Re: [Moses-support] Sparse phrase table?

Philipp Koehn Wed, 07 Sep 2011 03:03:59 -0700

Hi,

as you may have noticed, there is a lot of activity
on this right now, and the web page is not accurate
in this respect.


There are two ways to report sparse features.
By default, a single aggregate weighted feature score
for all features of a feature function is reported, but
if you use the switch
  -report-sparse-features FEATURE_NAME1 FEATURE_NAME2 ...
then individual features are reported in the n-best list.
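
As a rough illustration of the two reporting modes, here is a small Python
sketch. It is not Moses code: the feature names, the weights and the helper
function names are invented for this example. It only shows the difference
between one aggregate weighted score and a per-feature listing.

```python
from typing import Dict, List, Tuple


def aggregate_score(values: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum over the sparse features that fired.

    Features without a weight are treated as having weight 0 (an assumption
    made for this sketch).
    """
    return sum(weights.get(name, 0.0) * value for name, value in values.items())


def individual_scores(values: Dict[str, float]) -> List[Tuple[str, float]]:
    """Per-feature view: every fired feature with its own value, sorted by name."""
    return sorted(values.items())


# Hypothetical sparse features fired by one feature function on one hypothesis.
fired = {"bigram_the_house": 1.0, "bigram_house_is": 1.0, "bigram_is_small": 2.0}
# Hypothetical weights; "bigram_house_is" has no weight yet.
weights = {"bigram_the_house": 0.5, "bigram_is_small": 0.25}

total = aggregate_score(fired, weights)   # default mode: one number
details = individual_scores(fired)        # switch mode: one entry per feature
```

An optimizer that tunes each sparse weight separately needs the per-feature
view; the aggregate is enough when the feature function is treated as a
single score.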

-phi

On Wed, Sep 7, 2011 at 10:54 AM, Anne Schuth <[email protected]> wrote:
> Hi Philipp, Barry,
>
> Thanks for those pointers, that was helpful. I found the page
> http://www.statmt.org/moses/?n=Moses.SparseFeatureFunctions. There it says:
> "sparse features are not reported in n-best lists and search graphs". Isn't
> that a problem? Or is that no longer the case? Indeed, I want to do
> optimization with PRO, MIRA and my own optimizer (more on that hopefully
> soon!).
>
> Best,
> Anne
>
> --
> Anne Schuth
> ILPS - ISLA - FNWI
> University of Amsterdam
> Science Park 904, C3.230
> 1098 XH AMSTERDAM
> The Netherlands
> 0031 (0) 20 525 5357
>
>
>
> On Wed, Sep 7, 2011 at 11:47, Barry Haddow <[email protected]>
> wrote:
>>
>> Hi Anne
>>
>> There's not much explanation available of how the sparse features work,
>> but implementing a new one should be fairly straightforward. They're just
>> like normal features, except that GetNumScoreComponents() returns the
>> special value 'unlimited'. There are a few examples in there, such as
>> TargetBigramFeature.
>>
>> You can optimise the feature weights using PRO (Hopkins & May, EMNLP 2011)
>> or MIRA (Hasler et al, MTM 2011). Optimisation with samplerank (Haddow et
>> al, WMT 2011) is also possible, although the code exists in a different
>> moses branch (samplerank in svn), and you have to write a wrapper for the
>> feature function.
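
The difference between a normal and a sparse feature can be sketched in
Python as follows. This is only an analogy, not the Moses C++ interface: the
class names, the `UNLIMITED` stand-in and the `evaluate` method are
assumptions made for illustration, and the bigram class does not claim to
mirror how TargetBigramFeature is actually implemented.

```python
from collections import Counter
from typing import Dict, List, Optional

# Stand-in for the special 'unlimited' value; not the actual Moses constant.
UNLIMITED: Optional[int] = None


class WordCountSketch:
    """A normal (dense) feature: a fixed number of score components."""

    def get_num_score_components(self) -> Optional[int]:
        return 1

    def evaluate(self, target_words: List[str]) -> List[float]:
        # One score, always present, in a fixed position.
        return [float(len(target_words))]


class TargetBigramSketch:
    """A sparse feature: an open-ended set of named scores."""

    def get_num_score_components(self) -> Optional[int]:
        return UNLIMITED

    def evaluate(self, target_words: List[str]) -> Dict[str, float]:
        # Only bigrams that occur get an entry; everything else is implicitly 0.
        padded = ["<s>"] + target_words + ["</s>"]
        counts = Counter(f"{a}_{b}" for a, b in zip(padded, padded[1:]))
        return {name: float(count) for name, count in counts.items()}


sparse_scores = TargetBigramSketch().evaluate(["the", "house"])
```

The scores are computed on the fly from the hypothesis, so nothing has to be
stored per phrase pair in the phrase table.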
>>
>> cheers - Barry
>>
>> Quoting Anne Schuth <[email protected]> on Wed, 7 Sep 2011 11:23:14
>> +0200:
>>
>>> Thank you Barry, Philipp,
>>>
>>> The responsiveness of this list remains impressive!
>>>
>>> I will take a look at the miramerge branch. Is there anywhere I can read up
>>> on what happened in that branch (besides, of course, the code)?
>>>
>>> Best,
>>> Anne
>>>
>>> --
>>> Anne Schuth
>>> ILPS - ISLA - FNWI
>>> University of Amsterdam
>>> Science Park 904, C3.230
>>> 1098 XH AMSTERDAM
>>> The Netherlands
>>> 0031 (0) 20 525 5357
>>>
>>>
>>>
>>> On Wed, Sep 7, 2011 at 09:49, Philipp Koehn <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> a much better solution is the use of sparse
>>>> feature functions that compute the feature values
>>>> on the fly and store them efficiently in the decoder.
>>>>
>>>> We have already created some such sparse feature functions
>>>> in the MIRA branch of the decoder. I am currently not
>>>> sure in which repository a version of this can
>>>> be found - maybe Barry Haddow or Eva Hasler have
>>>> a better answer.
>>>>
>>>> -phi
>>>>
>>>> On Wed, Sep 7, 2011 at 8:34 AM, Anne Schuth <[email protected]>
>>>> wrote:
>>>> > Hi all,
>>>> >
>>>> > We are in the process of reimplementing some of the 11,001 new features of
>>>> > the Chiang et al. 2009 paper. We are adding a few thousand features to our
>>>> > phrase table, causing it to blow up significantly. For tuning purposes we
>>>> > filter the table to only include phrases used by our tuning dataset which
>>>> > brings the size on disk down to about 200MB (gzipped). However, as soon as
>>>> > we load this table into memory with Moses, it takes more than 60GB. This is
>>>> > not really a surprise I guess since Moses will represent all our 0's as
>>>> > floating points, but it is a problem since not all machines I would like to
>>>> > run this on have that much memory.
>>>> > This leads to my question: does Moses support some form of sparse
>>>> > representation of phrase tables? Or, how is this issue generally solved, as
>>>> > I am quite sure we are not the first to try this.
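
A back-of-the-envelope Python sketch of why storing every zero as a float is
costly. All numbers below are hypothetical and are not taken from this
thread; the byte sizes (4 bytes per float, 4 bytes per feature id) are
assumptions, and real in-memory overhead would be higher.

```python
def dense_bytes(num_phrase_pairs: int, num_features: int,
                bytes_per_float: int = 4) -> int:
    """Every phrase pair stores a float for every feature, zeros included."""
    return num_phrase_pairs * num_features * bytes_per_float


def sparse_bytes(num_phrase_pairs: int, avg_nonzero: int,
                 bytes_per_id: int = 4, bytes_per_float: int = 4) -> int:
    """Only non-zero features are stored, each as a (feature id, value) pair."""
    return num_phrase_pairs * avg_nonzero * (bytes_per_id + bytes_per_float)


# Hypothetical table: 1,000,000 phrase pairs, 2,000 features,
# of which on average 10 are non-zero per phrase pair.
dense = dense_bytes(1_000_000, 2_000)    # 8,000,000,000 bytes
sparse = sparse_bytes(1_000_000, 10)     # 80,000,000 bytes
ratio = dense // sparse                  # 100
```

Under these assumptions the sparse layout is smaller by the ratio of total
features to non-zero features, halved because each entry also carries an id.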
>>>> >
>>>> > Any comments or pointers to documentation are very much appreciated!
>>>> >
>>>> > Best,
>>>> > Anne
>>>> >
>>>> > --
>>>> > Anne Schuth
>>>> > ILPS - ISLA - FNWI
>>>> > University of Amsterdam
>>>> > Science Park 904, C3.230
>>>> > 1098 XH AMSTERDAM
>>>> > The Netherlands
>>>> > 0031 (0) 20 525 5357
>>>> >
>>>> >
>>>> > _______________________________________________
>>>> > Moses-support mailing list
>>>> > [email protected]
>>>> > http://mailman.mit.edu/mailman/listinfo/moses-support
>>>> >
>>>> >
>>>>
>>>
>>
>>
>>
>> --
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.
>>
>>
>
>

