Hi,

I don't have a clear picture of which phrase table implementations support
sparse features. Until recently PhraseTableBin did, but PhraseTableCompact
did not. I'm not sure whether things have changed either way.

-phi

On Thu, Jul 16, 2015 at 9:42 PM, Matthias Huck <[email protected]> wrote:
> Hi,
>
> You're right, I claimed in the previous mail that "in order to produce
> sparse features, you need to write a feature function anyway" and this
> is of course not true if you get the sparse phrase table features to
> work.
>
> When I tried those sparse domain indicators recently, they didn't work
> out of the box, and I also don't know where to find the relevant code.
> My guess is that this functionality was broken during the course of
> Moses refactoring, but it may just as well still be there, waiting to be
> activated in the moses.ini. What I did instead was simply switch to dense
> domain indicators.
>
> Maybe Hieu can help?
>
> Cheers,
> Matthias
>
>
> On Thu, 2015-07-16 at 10:03 +0100, jian zhang wrote:
>> Hi Matthias,
>>
>> Thanks for the information.
>>
>> I tested on Moses 3.0, and adding sparse features to the phrase table
>> seems to work.
>>
>> However, I did not add any flag to the ini file, as suggested: "If a
>> phrase table contains sparse features, then this needs to be flagged in
>> the configuration file by adding the word sparse after the phrase table
>> file name." Did I miss anything?
>>
>> Regards,
>>
>> Jian
>>
>> On Thu, Jul 16, 2015 at 3:23 AM, Matthias Huck <[email protected]> wrote:
>>         Hi Jian,
>>
>>         That depends on the nature of the features you're planning to
>>         implement.
>>
>>         In order to produce sparse features, you need to write a feature
>>         function anyway.
>>
>>         But if it's only a handful of scores and they can be calculated
>>         at extraction time, then go for dense features and add the scores
>>         directly to the phrase table.
>>
>>         If the scores cannot be precalculated, for instance because you
>>         need non-local information that is only available during decoding,
>>         then a feature function implementation becomes necessary.
>>
>>         When you write a feature function that calculates scores at
>>         decoding time, it can produce dense scores, sparse scores, or both
>>         types. That's up to you.
>>
>>         If there are many scores, each of which fires rarely, then sparse
>>         is the right choice. And you certainly need a sparse feature
>>         function implementation if you don't know in advance how many
>>         feature scores it can produce in total.
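To make the dense/sparse distinction above concrete, here is a minimal Python sketch (not Moses code) of how a linear model score can combine a fixed-length vector of dense scores with a name-keyed map of sparse scores. The function `model_score`, the feature names and the weights are hypothetical. What it illustrates is that a sparse feature function only emits the features that actually fire, and a feature without a tuned weight contributes nothing, so the full set of feature names need not be known in advance.

```python
from typing import Dict, List


def model_score(dense_values: List[float],
                dense_weights: List[float],
                sparse_values: Dict[str, float],
                sparse_weights: Dict[str, float]) -> float:
    """Linear model score: sum of weight * value over dense and sparse features."""
    if len(dense_values) != len(dense_weights):
        raise ValueError("dense feature count must match dense weight count")
    # Dense features: a fixed number of scores, one weight per position.
    score = sum(w * v for w, v in zip(dense_weights, dense_values))
    # Sparse features: only features that fired are stored, keyed by name;
    # a feature with no tuned weight is treated as having weight 0.
    score += sum(sparse_weights.get(name, 0.0) * value
                 for name, value in sparse_values.items())
    return score
```

For example, `model_score([-0.2, -0.7], [0.3, 0.2], {"dom_europarl": 1.0, "dom_news": 1.0}, {"dom_europarl": 0.5})` gives approximately 0.3; the untuned `dom_news` feature is simply ignored.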
>>
>>         If you need information from phrase extraction in order to
>>         calculate scores at decoding time, then we have something called
>>         "phrase properties". Phrase properties give you a means of storing
>>         arbitrary additional information in the phrase table. You have to
>>         extend the extraction pipeline to retrieve and store the phrase
>>         properties you require. The decoder can later read this
>>         information from the phrase table, and your feature function can
>>         use it in some way.
>>
>>         A large number of sparse feature scores can slow down decoding
>>         and tuning somewhat. Also, you have to use MIRA or PRO for tuning,
>>         not MERT.
>>
>>         Cheers,
>>         Matthias
>>
>>         On Thu, 2015-07-16 at 02:18 +0100, jian zhang wrote:
>>         > Hi Matthias,
>>         >
>>         > Not for domain features.
>>         >
>>         > I want to implement some sparse features, so there are two
>>         > options:
>>         > 1. add them to the phrase table, if that is supported;
>>         > 2. implement sparse feature functions.
>>         >
>>         > I'd like to know whether there are any differences between these
>>         > two options, for example in tuning or in computing sentence
>>         > translation scores.
>>         >
>>         > Regards,
>>         >
>>         > Jian
>>         >
>>         > On Thu, Jul 16, 2015 at 2:06 AM, Matthias Huck <[email protected]>
>>         > wrote:
>>         >         Hi,
>>         >
>>         >         Are you planning to use binary domain indicator features?
>>         >         I'm not sure whether a sparse feature function for this is
>>         >         currently implemented. If you're working with a small set
>>         >         of domains, you can employ dense indicators instead
>>         >         (domain-features = "indicator" in EMS). You'll have to
>>         >         re-extract the phrase table, though. Or process it with a
>>         >         script to add dense indicator values to the scores field.
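The script-based alternative mentioned above might look roughly like the following Python sketch. It is not part of Moses; the function name `densify_domain_indicators`, the `dom_` prefix and the list of domains are assumptions based on the example phrase table line quoted in this thread (`... ||| dom_europarl 1`). It also assumes that a present indicator is written as 2.718 (e^1) and an absent one as 1, mirroring the 2.718 phrase penalty column in that example, since the decoder takes the log of dense phrase table scores; verify this against your own extraction settings.

```python
from typing import List

SEP = " ||| "


def densify_domain_indicators(line: str, domains: List[str],
                              prefix: str = "dom_",
                              present: float = 2.718,
                              absent: float = 1.0) -> str:
    """Move sparse domain indicators (e.g. 'dom_europarl 1') into dense scores.

    Assumed layout (hypothetical, taken from the example line):
    source ||| target ||| scores ||| alignment ||| counts ||| sparse [||| ...]
    """
    fields = line.rstrip("\n").split(SEP)
    if len(fields) < 3:
        raise ValueError("not a phrase table line: " + line)
    fired = set()
    if len(fields) > 5:
        tokens = fields[5].split()
        kept: List[str] = []
        # Sparse features are whitespace-separated name/value pairs.
        for name, value in zip(tokens[0::2], tokens[1::2]):
            if name.startswith(prefix):
                if float(value) != 0.0:
                    fired.add(name[len(prefix):])
            else:
                kept.extend([name, value])  # keep non-domain sparse features
        fields[5] = " ".join(kept)
        # Drop an emptied sparse field only if nothing follows it.
        if not fields[5] and len(fields) == 6:
            fields.pop()
    # One dense column per known domain, in the order given by `domains`.
    extra = [f"{present:g}" if d in fired else f"{absent:g}" for d in domains]
    fields[2] = " ".join(fields[2].split() + extra)
    return SEP.join(fields)
```

Applied line by line to a decompressed text phrase table, each line gains one dense column per listed domain, so the number of features declared for that phrase table in moses.ini would have to grow accordingly.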
>>         >
>>         >         I believe that there might also be a bug in the extraction
>>         >         pipeline when both domain-features = "sparse indicator" and
>>         >         score-settings = "--GoodTuring" are active in EMS. At least
>>         >         it caused me trouble a couple of weeks ago. However, I must
>>         >         admit that I didn't investigate it further at that point.
>>         >
>>         >         Anyway, the bottom line is that I recommend re-extracting
>>         >         with dense indicators.
>>         >
>>         >         But let me know what you find regarding a sparse
>>         >         implementation.
>>         >
>>         >         Cheers,
>>         >         Matthias
>>         >
>>         >         On Thu, 2015-07-16 at 00:48 +0100, jian zhang wrote:
>>         >         > Hi,
>>         >         >
>>         >         > Are sparse features in the phrase table, like
>>         >         >
>>         >         > das Haus ||| the house ||| 0.8 0.5 0.8 0.5 2.718 ||| 0-0 1-1 ||| 5000 5000 2500 ||| dom_europarl 1
>>         >         >
>>         >         > still supported? If yes, what should I put in the ini
>>         >         > file for the example above?
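For reference, the example line splits into the fields below. This Python sketch, whose names (`PhraseEntry`, `parse_phrase_line`) are hypothetical, assumes the layout source ||| target ||| dense scores ||| word alignment ||| counts ||| sparse features, with the sparse field holding whitespace-separated name/value pairs as in `dom_europarl 1`; any further fields are ignored. It only illustrates the text format and says nothing about which Moses phrase table implementations actually read the sparse field.

```python
from typing import Dict, List, NamedTuple


class PhraseEntry(NamedTuple):
    source: str
    target: str
    scores: List[float]
    alignment: str
    counts: List[float]
    sparse: Dict[str, float]


def parse_phrase_line(line: str) -> PhraseEntry:
    """Split one text phrase table line on '|||' (assumed layout, see lead-in)."""
    fields = [f.strip() for f in line.rstrip("\n").split("|||")]
    # Pad missing optional trailing fields with empty strings.
    fields += [""] * (6 - len(fields))
    tokens = fields[5].split()
    # Sparse field: alternating feature names and values.
    sparse = {name: float(value) for name, value in zip(tokens[0::2], tokens[1::2])}
    return PhraseEntry(
        source=fields[0],
        target=fields[1],
        scores=[float(s) for s in fields[2].split()],
        alignment=fields[3],
        counts=[float(c) for c in fields[4].split()],
        sparse=sparse,
    )
```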
>>         >         >
>>         >         > Thanks,
>>         >         >
>>         >         > Jian
>>         >         >
>>         >         > --
>>         >         > Jian Zhang
>>         >         > Centre for Next Generation Localisation (CNGL)
>>         >         > Dublin City University
>>         >         >
>>         >         > _______________________________________________
>>         >         > Moses-support mailing list
>>         >         > [email protected]
>>         >         > http://mailman.mit.edu/mailman/listinfo/moses-support
>>         >
>>         >         --
>>         >         The University of Edinburgh is a charitable body, registered
>>         >         in Scotland, with registration number SC005336.
>>         >
>