Hi,

I don't have a clear picture of which phrase table implementations
support sparse features. Until recently, PhraseTableBin did, but
PhraseTableCompact did not. I'm not sure whether that has changed
either way.
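For reference, here is a minimal sketch of the script-based workaround
mentioned further down in the quoted thread (turning sparse domain
indicators into dense scores). It is not part of Moses. It assumes the
field order of the example line quoted below, with the sparse features as
name/value pairs in the sixth field. The function names, the domain list,
and the 2.718/1 encoding for present/absent are assumptions for
illustration only.

```python
SEP = " ||| "


def parse_phrase_table_line(line):
    """Split one phrase-table line into its '|||'-separated fields."""
    fields = [f.strip() for f in line.rstrip("\n").split("|||")]
    if len(fields) < 3:
        raise ValueError("expected at least source, target and scores")
    entry = {
        "source": fields[0],
        "target": fields[1],
        "dense": [float(x) for x in fields[2].split()],
        "alignment": fields[3].split() if len(fields) > 3 else [],
        "counts": fields[4].split() if len(fields) > 4 else [],
        "sparse": {},
    }
    if len(fields) > 5:
        # Assumed layout: alternating feature name and value,
        # e.g. "dom_europarl 1".
        tokens = fields[5].split()
        for name, value in zip(tokens[0::2], tokens[1::2]):
            entry["sparse"][name] = float(value)
    return entry


def to_dense_indicators(line, domains, on="2.718", off="1"):
    """Append one dense indicator per domain and drop the sparse field.

    `domains` is a hypothetical, fixed list of sparse feature names.
    A missing or zero-valued sparse feature counts as absent.
    """
    entry = parse_phrase_table_line(line)
    fields = [f.strip() for f in line.rstrip("\n").split("|||")]
    extra = [on if entry["sparse"].get(d) else off for d in domains]
    fields[2] = " ".join(fields[2].split() + extra)
    # Keep source, target, scores, alignment and counts only.
    return SEP.join(fields[:5])


example = ("das Haus ||| the house ||| 0.8 0.5 0.8 0.5 2.718 "
           "||| 0-0 1-1 ||| 5000 5000 2500 ||| dom_europarl 1")
print(to_dense_indicators(example, ["dom_europarl", "dom_news"]))
```

The domain list has to be the same, in the same order, for every line,
since dense scores are identified by position only.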
-phi

On Thu, Jul 16, 2015 at 9:42 PM, Matthias Huck <[email protected]> wrote:
> Hi,
>
> You're right, I claimed in the previous mail that "in order to produce
> sparse features, you need to write a feature function anyway" and this
> is of course not true if you get the sparse phrase table features to
> work.
>
> When I tried those sparse domain indicators recently, they didn't work
> out of the box, and I also don't know where to find the relevant code.
> My guess is that this functionality was broken during the course of
> Moses refactoring, but it may as well still be there and waiting to be
> activated in the moses.ini. What I did was just switching to dense
> domain indicators.
>
> Maybe Hieu can help?
>
> Cheers,
> Matthias
>
> On Thu, 2015-07-16 at 10:03 +0100, jian zhang wrote:
>> Hi Matthias,
>>
>> Thanks for the information.
>>
>> I tested on moses 3.0, adding phrase table sparse feature is seems
>> working.
>>
>> However, I did not add any flag into ini, like suggested "If a phrase
>> table contains sparse features, then this needs to be flagged in the
>> configuration file by adding the word sparse after the phrase table
>> file name.". Did i miss anything?
>>
>> Regards,
>>
>> Jian
>>
>> On Thu, Jul 16, 2015 at 3:23 AM, Matthias Huck <[email protected]>
>> wrote:
>>     Hi Jian,
>>
>>     That depends on the nature of the features you're planning to
>>     implement.
>>
>>     In order to produce sparse features, you need to write a feature
>>     function anyway.
>>
>>     But if it's only a handful of scores and they can be calculated
>>     during extraction time, then go for dense features and add the
>>     scores directly to the phrase table.
>>
>>     If the scores cannot be precalculated, for instance because you
>>     need non-local information that is only available during decoding,
>>     then a feature function implementation becomes necessary.
>>
>>     When you write a feature function that calculates scores during
>>     decoding time, it can produce dense scores, sparse scores, or both
>>     types. That's up to you.
>>
>>     If it's plenty of scores which are fired rarely, then sparse is
>>     the right choice. And you certainly need a sparse feature function
>>     implementation in case you are not aware in advance of the overall
>>     amount of feature scores it can produce.
>>
>>     If you need information from phrase extraction in order to
>>     calculate scores during decoding time, then we have something
>>     denoted as "phrase properties". Phrase properties give you a means
>>     of storing arbitrary additional information in the phrase table.
>>     You have to extend the extraction pipeline to retrieve and store
>>     the phrase properties you require. The decoder can later read this
>>     information from the phrase table, and your feature function can
>>     utilize it in some way.
>>
>>     A large amount of sparse feature scores can somewhat slow down
>>     decoding and tuning. Also, you have to use MIRA or PRO for tuning,
>>     not MERT.
>>
>>     Cheers,
>>     Matthias
>>
>>     On Thu, 2015-07-16 at 02:18 +0100, jian zhang wrote:
>>     > Hi Matthias,
>>     >
>>     > Not for domain feature.
>>     >
>>     > I want to implement some sparse features, so there are two
>>     > options:
>>     > 1, add to phrase table, if it is supported
>>     > 2, implement sparse feature functions,
>>     >
>>     > I'd like to know are there any difference between these two
>>     > options, for example, tuning, compute sentence translation
>>     > scores ...
>>     >
>>     > Regards,
>>     >
>>     > Jian
>>     >
>>     > On Thu, Jul 16, 2015 at 2:06 AM, Matthias Huck
>>     > <[email protected]> wrote:
>>     >     Hi,
>>     >
>>     >     Are you planning to use binary domain indicator features?
>>     >     I'm not sure whether a sparse feature function for this is
>>     >     currently implemented.
>>     >     If you're working with a small set of domains, you can
>>     >     employ dense indicators instead
>>     >     (domain-features = "indicator" in EMS). You'll have to
>>     >     re-extract the phrase table, though. Or process it with a
>>     >     script to add dense indicator values to the scores field.
>>     >
>>     >     I believe that there might also be some bug in the
>>     >     extraction pipeline when both
>>     >     domain-features = "sparse indicator" and
>>     >     score-settings = "--GoodTuring" are active in EMS. At least
>>     >     it caused me trouble a couple of weeks ago. However, I must
>>     >     admit that I didn't investigate it further at that point.
>>     >
>>     >     Anyway, the bottom line is that I recommend re-extracting
>>     >     with dense indicators.
>>     >
>>     >     But let me know what you find regarding a sparse
>>     >     implementation.
>>     >
>>     >     Cheers,
>>     >     Matthias
>>     >
>>     >     On Thu, 2015-07-16 at 00:48 +0100, jian zhang wrote:
>>     >     > Hi,
>>     >     >
>>     >     > Is the sparse features at phrase table, like
>>     >     >
>>     >     > das Haus ||| the house ||| 0.8 0.5 0.8 0.5 2.718 ||| 0-0 1-1 ||| 5000 5000 2500 ||| dom_europarl 1
>>     >     >
>>     >     > still supported? If yes, what should I set to the ini file
>>     >     > based on the example above?
>>     >     >
>>     >     > Thank,
>>     >     >
>>     >     > Jian
>>     >     >
>>     >     > --
>>     >     > Jian Zhang
>>     >     > Centre for Next Generation Localisation (CNGL)
>>     >     > Dublin City University
>>     >
>>     >     > _______________________________________________
>>     >     > Moses-support mailing list
>>     >     > [email protected]
>>     >     > http://mailman.mit.edu/mailman/listinfo/moses-support
>>     >
>>     >     --
>>     >     The University of Edinburgh is a charitable body, registered
>>     >     in Scotland, with registration number SC005336.
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
