The OnDisk phrase table can do everything: sparse features, phrase properties, hiero models. It's just slow and big.
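For reference, these back-ends are selected by the feature line in moses.ini; a minimal sketch, with placeholder paths and a placeholder num-features count (your values will differ):

```ini
[feature]
; on-disk phrase table: supports sparse features, properties, hiero models
PhraseDictionaryOnDisk name=TranslationModel0 num-features=4 path=/path/to/phrase-table.ondisk input-factor=0 output-factor=0
; compact phrase table: fast and small, but no sparse features or properties
; PhraseDictionaryCompact name=TranslationModel0 num-features=4 path=/path/to/phrase-table.minphr input-factor=0 output-factor=0
```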
I think the old Binary phrase table did sparse features but not properties; the Compact phrase table does neither.

Hieu Hoang
Researcher
New York University, Abu Dhabi
http://www.hoang.co.uk/hieu

On 17 July 2015 at 06:25, Philipp Koehn <[email protected]> wrote:
> Hi,
>
> I don't have a clear picture of which phrase table implementations support
> sparse features. Until recently PhraseTableBin did, but PhraseTableCompact
> did not. Not sure if things have changed either way.
>
> -phi
>
> On Thu, Jul 16, 2015 at 9:42 PM, Matthias Huck <[email protected]> wrote:
> > Hi,
> >
> > You're right, I claimed in the previous mail that "in order to produce
> > sparse features, you need to write a feature function anyway", and this
> > is of course not true if you get the sparse phrase table features to
> > work.
> >
> > When I tried those sparse domain indicators recently, they didn't work
> > out of the box, and I also don't know where to find the relevant code.
> > My guess is that this functionality was broken during the course of
> > Moses refactoring, but it may as well still be there, waiting to be
> > activated in the moses.ini. What I did was just switch to dense
> > domain indicators.
> >
> > Maybe Hieu can help?
> >
> > Cheers,
> > Matthias
> >
> > On Thu, 2015-07-16 at 10:03 +0100, jian zhang wrote:
> >> Hi Matthias,
> >>
> >> Thanks for the information.
> >>
> >> I tested on Moses 3.0, and adding sparse features to the phrase table
> >> seems to be working.
> >>
> >> However, I did not add any flag to the ini file, as suggested by "If a
> >> phrase table contains sparse features, then this needs to be flagged in
> >> the configuration file by adding the word sparse after the phrase table
> >> file name." Did I miss anything?
> >>
> >> Regards,
> >>
> >> Jian
> >>
> >> On Thu, Jul 16, 2015 at 3:23 AM, Matthias Huck <[email protected]> wrote:
> >> Hi Jian,
> >>
> >> That depends on the nature of the features you're planning to
> >> implement.
> >>
> >> In order to produce sparse features, you need to write a feature
> >> function anyway.
> >>
> >> But if it's only a handful of scores and they can be calculated at
> >> extraction time, then go for dense features and add the scores
> >> directly to the phrase table.
> >>
> >> If the scores cannot be precalculated, for instance because you need
> >> non-local information that is only available during decoding, then a
> >> feature function implementation becomes necessary.
> >>
> >> When you write a feature function that calculates scores at decoding
> >> time, it can produce dense scores, sparse scores, or both types.
> >> That's up to you.
> >>
> >> If there are many scores that fire rarely, then sparse is the right
> >> choice. And you certainly need a sparse feature function
> >> implementation in case you are not aware in advance of the overall
> >> number of feature scores it can produce.
> >>
> >> If you need information from phrase extraction in order to calculate
> >> scores at decoding time, then we have something called "phrase
> >> properties". Phrase properties give you a means of storing arbitrary
> >> additional information in the phrase table. You have to extend the
> >> extraction pipeline to retrieve and store the phrase properties you
> >> require. The decoder can later read this information from the phrase
> >> table, and your feature function can utilize it in some way.
> >>
> >> A large number of sparse feature scores can somewhat slow down
> >> decoding and tuning. Also, you have to use MIRA or PRO for tuning,
> >> not MERT.
> >>
> >> Cheers,
> >> Matthias
> >>
> >> On Thu, 2015-07-16 at 02:18 +0100, jian zhang wrote:
> >> > Hi Matthias,
> >> >
> >> > Not for domain features.
> >> >
> >> > I want to implement some sparse features, so there are two options:
> >> > 1. add them to the phrase table, if that is supported
> >> > 2. implement sparse feature functions
> >> >
> >> > I'd like to know whether there are any differences between these two
> >> > options, for example in tuning or in computing sentence translation
> >> > scores.
> >> >
> >> > Regards,
> >> >
> >> > Jian
> >> >
> >> > On Thu, Jul 16, 2015 at 2:06 AM, Matthias Huck <[email protected]> wrote:
> >> > Hi,
> >> >
> >> > Are you planning to use binary domain indicator features? I'm not
> >> > sure whether a sparse feature function for this is currently
> >> > implemented. If you're working with a small set of domains, you can
> >> > employ dense indicators instead (domain-features = "indicator" in
> >> > EMS). You'll have to re-extract the phrase table, though. Or process
> >> > it with a script to add dense indicator values to the scores field.
> >> >
> >> > I believe that there might also be some bug in the extraction
> >> > pipeline when both domain-features = "sparse indicator" and
> >> > score-settings = "--GoodTuring" are active in EMS. At least it
> >> > caused me trouble a couple of weeks ago. However, I must admit that
> >> > I didn't investigate it further at that point.
> >> >
> >> > Anyway, the bottom line is that I recommend re-extracting with dense
> >> > indicators.
> >> >
> >> > But let me know what you find regarding a sparse implementation.
> >> >
> >> > Cheers,
> >> > Matthias
> >> >
> >> > On Thu, 2015-07-16 at 00:48 +0100, jian zhang wrote:
> >> > > Hi,
> >> > >
> >> > > Are sparse features in the phrase table, like
> >> > >
> >> > > das Haus ||| the house ||| 0.8 0.5 0.8 0.5 2.718 ||| 0-0 1-1 |||
> >> > > 5000 5000 2500 ||| dom_europarl 1
> >> > >
> >> > > still supported? If yes, what should I set in the ini file based
> >> > > on the example above?
> >> > >
> >> > > Thanks,
> >> > >
> >> > > Jian
> >> > >
> >> > > --
> >> > > Jian Zhang
> >> > > Centre for Next Generation Localisation (CNGL)
> >> > > Dublin City University
> >> >
> >> > --
> >> > The University of Edinburgh is a charitable body, registered in
> >> > Scotland, with registration number SC005336.
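For reference, the example phrase-table line quoted above decomposes into |||-delimited fields, with the sparse features at the end given as alternating name/value tokens. A quick sketch (plain Python, not Moses code) of how such a line parses out:

```python
# One phrase-table line with a trailing sparse-feature field, taken from
# the example in the thread. Field layout assumed from that example:
# source ||| target ||| dense scores ||| alignment ||| counts ||| sparse pairs
line = ("das Haus ||| the house ||| 0.8 0.5 0.8 0.5 2.718 "
        "||| 0-0 1-1 ||| 5000 5000 2500 ||| dom_europarl 1")

source, target, dense, align, counts, sparse = (
    f.strip() for f in line.split("|||"))

# Dense scores are a fixed-length vector of floats.
dense_scores = [float(x) for x in dense.split()]

# Sparse features come as alternating "name value" tokens.
toks = sparse.split()
sparse_feats = {toks[i]: float(toks[i + 1]) for i in range(0, len(toks), 2)}

print(dense_scores)   # [0.8, 0.5, 0.8, 0.5, 2.718]
print(sparse_feats)   # {'dom_europarl': 1.0}
```

This is only meant to make the field layout concrete; the actual parsing inside the decoder depends on which phrase table implementation is loaded.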
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
