Hi, I still have to check in some code for the PRO optimization of sparse features. I will likely do this later today (testing now, after merging with other changes).
-phi

On Wed, Sep 7, 2011 at 10:47 AM, Barry Haddow <[email protected]> wrote:
> Hi Anne
>
> There's not much explanation available of how the sparse features work, but
> implementing a new one should be fairly straightforward. They're just like
> normal features, except that GetNumScoreComponents() returns the special
> value 'unlimited'. There are a few examples in there, such as
> TargetBigramFeature.
>
> You can optimise the feature weights using PRO (Hopkins & May, EMNLP 2011)
> or MIRA (Hasler et al, MTM 2011). Optimisation with SampleRank (Haddow et
> al, WMT 2011) is also possible, although that code lives in a different
> Moses branch (samplerank in svn), and you have to write a wrapper for the
> feature function.
>
> cheers - Barry
>
> Quoting Anne Schuth <[email protected]> on Wed, 7 Sep 2011 11:23:14 +0200:
>
>> Thank you Barry, Philipp,
>>
>> The responsiveness of this list remains impressive!
>>
>> I will take a look at the miramerge branch. Is there anywhere I can read
>> up on what happened in that branch (besides, of course, the code)?
>>
>> Best,
>> Anne
>>
>> --
>> Anne Schuth
>> ILPS - ISLA - FNWI
>> University of Amsterdam
>> Science Park 904, C3.230
>> 1098 XH AMSTERDAM
>> The Netherlands
>> 0031 (0) 20 525 5357
>>
>> On Wed, Sep 7, 2011 at 09:49, Philipp Koehn <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> a much better solution is to use sparse feature functions that compute
>>> the feature values on the fly and store them efficiently in the decoder.
>>>
>>> We have already created some such sparse feature functions in the MIRA
>>> branch of the decoder. I am currently not sure in which repository a
>>> version of this can be found - maybe Barry Haddow or Eva Hasler have a
>>> better answer.
>>>
>>> -phi
>>>
>>> On Wed, Sep 7, 2011 at 8:34 AM, Anne Schuth <[email protected]> wrote:
>>> > Hi all,
>>> >
>>> > We are in the process of reimplementing some of the 11,001 new
>>> > features of the Chiang et al. 2009 paper. We are adding a few thousand
>>> > features to our phrase table, causing it to blow up significantly. For
>>> > tuning purposes we filter the table to only include phrases used by
>>> > our tuning dataset, which brings the size on disk down to about 200MB
>>> > (gzipped). However, as soon as we load this table into memory with
>>> > Moses, it takes more than 60GB. This is not really a surprise, I
>>> > guess, since Moses will represent all our 0's as floating points, but
>>> > it is a problem since not all machines I would like to run this on
>>> > have that much memory.
>>> > This leads to my question: does Moses support some form of sparse
>>> > representation of phrase tables? Or how is this issue generally
>>> > solved, as I am quite sure we are not the first to try this.
>>> >
>>> > Any comments or pointers to documentation are very much appreciated!
>>> >
>>> > Best,
>>> > Anne
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
