Hi, as you may have noticed, there is a lot of activity on this right now, and the web page is not accurate in this respect.
There are two ways to report sparse features: by default, a single
aggregate weighted feature score for all features of a feature function
is reported, but if you use the switch

  -report-sparse-features FEATURE_NAME1 FEATURE_NAME2 ...

then individual features are reported in the n-best list.

-phi

On Wed, Sep 7, 2011 at 10:54 AM, Anne Schuth <[email protected]> wrote:
> Hi Philipp, Barry,
>
> Thanks for those pointers, that was helpful. I found the page
> http://www.statmt.org/moses/?n=Moses.SparseFeatureFunctions. There it says:
> "sparse features are not reported in n-best lists and search graphs". Isn't
> that a problem? Or is that no longer the case? Indeed, I want to do
> optimization with PRO, MIRA and my own optimizer (more on that hopefully
> soon!).
>
> Best,
> Anne
>
> --
> Anne Schuth
> ILPS - ISLA - FNWI
> University of Amsterdam
> Science Park 904, C3.230
> 1098 XH AMSTERDAM
> The Netherlands
> 0031 (0) 20 525 5357
>
> On Wed, Sep 7, 2011 at 11:47, Barry Haddow <[email protected]> wrote:
>>
>> Hi Anne
>>
>> There's not much explanation available of how the sparse features work,
>> but implementing a new one should be fairly straightforward. They're just
>> like normal features, except that GetNumScoreComponents() returns the
>> special value 'unlimited'. There are a few examples in there, such as
>> TargetBigramFeature.
>>
>> You can optimise the feature weights using PRO (Hopkins & May, EMNLP 2011)
>> or MIRA (Hasler et al, MTM 2011). Optimisation with samplerank (Haddow et
>> al, WMT 2011) is also possible, although the code exists in a different
>> moses branch (samplerank in svn), and you have to write a wrapper for the
>> feature function.
>>
>> cheers - Barry
>>
>> Quoting Anne Schuth <[email protected]> on Wed, 7 Sep 2011 11:23:14 +0200:
>>
>>> Thank you Barry, Philipp,
>>>
>>> The responsiveness of this list remains impressive!
>>>
>>> I will take a look at the miramerge branch.
>>> Is there anywhere I can read up on what happened in that branch
>>> (besides, of course, the code)?
>>>
>>> Best,
>>> Anne
>>>
>>> On Wed, Sep 7, 2011 at 09:49, Philipp Koehn <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> a much better solution is the use of sparse
>>>> feature functions that compute the feature values
>>>> on the fly and store them efficiently in the decoder.
>>>>
>>>> We have already created some such sparse feature functions
>>>> in the MIRA branch of the decoder. I am currently not
>>>> sure in which repository a version of this can
>>>> be found - maybe Barry Haddow or Eva Hasler have
>>>> a better answer.
>>>>
>>>> -phi
>>>>
>>>> On Wed, Sep 7, 2011 at 8:34 AM, Anne Schuth <[email protected]> wrote:
>>>> > Hi all,
>>>> >
>>>> > We are in the process of reimplementing some of the 11,001 new
>>>> > features of the Chiang et al. 2009 paper. We are adding a few
>>>> > thousand features to our phrase table, causing it to blow up
>>>> > significantly. For tuning purposes we filter the table to only
>>>> > include phrases used by our tuning dataset, which brings the size
>>>> > on disk down to about 200MB (gzipped). However, as soon as we load
>>>> > this table into memory with Moses, it takes more than 60GB. This is
>>>> > not really a surprise I guess, since Moses will represent all our
>>>> > 0's as floating points, but it is a problem since not all machines
>>>> > I would like to run this on have that much memory.
>>>> > This leads to my question: does Moses support some form of sparse
>>>> > representation of phrase tables? Or, how is this issue generally
>>>> > solved, as I am quite sure we are not the first to try this.
>>>> >
>>>> > Any comments, pointers to documentation are very much appreciated!
>>>> >
>>>> > Best,
>>>> > Anne
>>>> >
>>>> > _______________________________________________
>>>> > Moses-support mailing list
>>>> > [email protected]
>>>> > http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>> --
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.
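
For anyone reading this thread later, here is a rough Python sketch of the
two reporting modes described at the top of this message. It is not Moses
code and does not reproduce the decoder's n-best format. The function name
report_scores, the "FunctionName_feature" labels, the feature function
MyFeature, and all values and weights are made up for illustration. Only
TargetBigramFeature is a name taken from the thread. The sketch also assumes
that individually reported features carry their raw (unweighted) values.

```python
# Illustrative only: sparse scores for one hypothesis, stored as
# feature function -> {feature name: value}. Zero-valued features are
# simply absent, which is what makes the representation sparse.
sparse_scores = {
    "TargetBigramFeature": {"the:house": 1.0, "house:is": 1.0},
    "MyFeature": {"x": 1.0},  # hypothetical feature function
}

# Hypothetical weights, keyed the same way as the scores.
weights = {
    "TargetBigramFeature": {"the:house": 0.5, "house:is": -0.25},
    "MyFeature": {"x": 2.0},
}


def report_scores(scores, weights, report_individually=()):
    """Return a list of (label, value) fields for one hypothesis.

    Feature functions named in report_individually get one field per
    sparse feature (raw value, an assumption of this sketch). All other
    feature functions get a single aggregate weighted score.
    """
    fields = []
    for ff_name in sorted(scores):
        feats = scores[ff_name]
        if ff_name in report_individually:
            for feat_name in sorted(feats):
                fields.append((f"{ff_name}_{feat_name}", feats[feat_name]))
        else:
            ff_weights = weights.get(ff_name, {})
            total = sum(ff_weights.get(name, 0.0) * value
                        for name, value in feats.items())
            fields.append((ff_name, total))
    return fields


# Default mode: one aggregate weighted score per feature function.
aggregate = report_scores(sparse_scores, weights)

# Individual mode for the feature functions named on the command line.
individual = report_scores(sparse_scores, weights, {"TargetBigramFeature"})
```

An optimizer that needs per-feature values (PRO or MIRA, say) would need
the second form, since the aggregate score alone does not say which sparse
features fired.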
