The OnDisk phrase table can do everything: sparse features, phrase properties, hiero models. It's just slow and big.
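For reference, these back-ends are selected by the feature line in moses.ini; a minimal sketch, with placeholder paths and a placeholder num-features count (your values will differ):

```ini
[feature]
; on-disk phrase table: supports sparse features, properties, hiero models
PhraseDictionaryOnDisk name=TranslationModel0 num-features=4 path=/path/to/phrase-table.ondisk input-factor=0 output-factor=0
; compact phrase table: fast and small, but no sparse features or properties
; PhraseDictionaryCompact name=TranslationModel0 num-features=4 path=/path/to/phrase-table.minphr input-factor=0 output-factor=0
```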
I think the old Binary phrase table did sparse features but not properties; the Compact phrase table does neither.

Hieu Hoang
Researcher
New York University, Abu Dhabi
http://www.hoang.co.uk/hieu

On 17 July 2015 at 06:25, Philipp Koehn <[email protected]> wrote:
> Hi,
>
> I don't have a clear picture of which phrase table implementations support
> sparse features. Until recently PhraseTableBin did, but PhraseTableCompact
> did not. Not sure if things have changed either way.
>
> -phi
>
> On Thu, Jul 16, 2015 at 9:42 PM, Matthias Huck <[email protected]> wrote:
> > Hi,
> >
> > You're right, I claimed in the previous mail that "in order to produce
> > sparse features, you need to write a feature function anyway", and this
> > is of course not true if you get the sparse phrase table features to
> > work.
> >
> > When I tried those sparse domain indicators recently, they didn't work
> > out of the box, and I also don't know where to find the relevant code.
> > My guess is that this functionality was broken during the course of
> > Moses refactoring, but it may as well still be there, waiting to be
> > activated in the moses.ini. What I did was just switch to dense
> > domain indicators.
> >
> > Maybe Hieu can help?
> >
> > Cheers,
> > Matthias
> >
> > On Thu, 2015-07-16 at 10:03 +0100, jian zhang wrote:
> >> Hi Matthias,
> >>
> >> Thanks for the information.
> >>
> >> I tested on Moses 3.0, and adding sparse features to the phrase table
> >> seems to be working.
> >>
> >> However, I did not add any flag to the ini file, as suggested by "If a
> >> phrase table contains sparse features, then this needs to be flagged in
> >> the configuration file by adding the word sparse after the phrase table
> >> file name." Did I miss anything?
> >>
> >> Regards,
> >>
> >> Jian
> >>
> >> On Thu, Jul 16, 2015 at 3:23 AM, Matthias Huck <[email protected]> wrote:
> >> Hi Jian,
> >>
> >> That depends on the nature of the features you're planning to
> >> implement.
> >>
> >> In order to produce sparse features, you need to write a feature
> >> function anyway.
> >>
> >> But if it's only a handful of scores and they can be calculated at
> >> extraction time, then go for dense features and add the scores
> >> directly to the phrase table.
> >>
> >> If the scores cannot be precalculated, for instance because you need
> >> non-local information that is only available during decoding, then a
> >> feature function implementation becomes necessary.
> >>
> >> When you write a feature function that calculates scores at decoding
> >> time, it can produce dense scores, sparse scores, or both types.
> >> That's up to you.
> >>
> >> If there are many scores that fire rarely, then sparse is the right
> >> choice. And you certainly need a sparse feature function
> >> implementation in case you are not aware in advance of the overall
> >> number of feature scores it can produce.
> >>
> >> If you need information from phrase extraction in order to calculate
> >> scores at decoding time, then we have something called "phrase
> >> properties". Phrase properties give you a means of storing arbitrary
> >> additional information in the phrase table. You have to extend the
> >> extraction pipeline to retrieve and store the phrase properties you
> >> require. The decoder can later read this information from the phrase
> >> table, and your feature function can utilize it in some way.
> >>
> >> A large number of sparse feature scores can somewhat slow down
> >> decoding and tuning. Also, you have to use MIRA or PRO for tuning,
> >> not MERT.
> >>
> >> Cheers,
> >> Matthias
> >>
> >> On Thu, 2015-07-16 at 02:18 +0100, jian zhang wrote:
> >> > Hi Matthias,
> >> >
> >> > Not for domain features.
> >> >
> >> > I want to implement some sparse features, so there are two options:
> >> > 1. add them to the phrase table, if that is supported
> >> > 2. implement sparse feature functions
> >> >
> >> > I'd like to know whether there are any differences between these two
> >> > options, for example in tuning or in computing sentence translation
> >> > scores.
> >> >
> >> > Regards,
> >> >
> >> > Jian
> >> >
> >> > On Thu, Jul 16, 2015 at 2:06 AM, Matthias Huck <[email protected]> wrote:
> >> > Hi,
> >> >
> >> > Are you planning to use binary domain indicator features? I'm not
> >> > sure whether a sparse feature function for this is currently
> >> > implemented. If you're working with a small set of domains, you can
> >> > employ dense indicators instead (domain-features = "indicator" in
> >> > EMS). You'll have to re-extract the phrase table, though. Or process
> >> > it with a script to add dense indicator values to the scores field.
> >> >
> >> > I believe that there might also be some bug in the extraction
> >> > pipeline when both domain-features = "sparse indicator" and
> >> > score-settings = "--GoodTuring" are active in EMS. At least it
> >> > caused me trouble a couple of weeks ago. However, I must admit that
> >> > I didn't investigate it further at that point.
> >> >
> >> > Anyway, the bottom line is that I recommend re-extracting with dense
> >> > indicators.
> >> >
> >> > But let me know what you find regarding a sparse implementation.
> >> >
> >> > Cheers,
> >> > Matthias
> >> >
> >> > On Thu, 2015-07-16 at 00:48 +0100, jian zhang wrote:
> >> > > Hi,
> >> > >
> >> > > Are sparse features in the phrase table, like
> >> > >
> >> > > das Haus ||| the house ||| 0.8 0.5 0.8 0.5 2.718 ||| 0-0 1-1 |||
> >> > > 5000 5000 2500 ||| dom_europarl 1
> >> > >
> >> > > still supported? If yes, what should I set in the ini file based
> >> > > on the example above?
> >> > >
> >> > > Thanks,
> >> > >
> >> > > Jian
> >> > >
> >> > > --
> >> > > Jian Zhang
> >> > > Centre for Next Generation Localisation (CNGL)
> >> > > Dublin City University
> >> >
> >> > --
> >> > The University of Edinburgh is a charitable body, registered in
> >> > Scotland, with registration number SC005336.
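For reference, the example phrase-table line quoted above decomposes into |||-delimited fields, with the sparse features at the end given as alternating name/value tokens. A quick sketch (plain Python, not Moses code) of how such a line parses out:

```python
# One phrase-table line with a trailing sparse-feature field, taken from
# the example in the thread. Field layout assumed from that example:
# source ||| target ||| dense scores ||| alignment ||| counts ||| sparse pairs
line = ("das Haus ||| the house ||| 0.8 0.5 0.8 0.5 2.718 "
        "||| 0-0 1-1 ||| 5000 5000 2500 ||| dom_europarl 1")

source, target, dense, align, counts, sparse = (
    f.strip() for f in line.split("|||"))

# Dense scores are a fixed-length vector of floats.
dense_scores = [float(x) for x in dense.split()]

# Sparse features come as alternating "name value" tokens.
toks = sparse.split()
sparse_feats = {toks[i]: float(toks[i + 1]) for i in range(0, len(toks), 2)}

print(dense_scores)   # [0.8, 0.5, 0.8, 0.5, 2.718]
print(sparse_feats)   # {'dom_europarl': 1.0}
```

This is only meant to make the field layout concrete; the actual parsing inside the decoder depends on which phrase table implementation is loaded.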
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
