The original binary phrase-table (PhraseDictionaryBinary) has been
around for a long time, and it's starting to show its age and to get
in the way of further changes to the decoder.

Some of its shortcomings:
    1. It isn't multi-threaded. We work around this by instantiating a
new copy of it for every thread.
    2. It doesn't support translation rule properties, which let us store
arbitrary information with each rule.
    3. It may not support sparse features (I'm not sure).
    4. We can't change its API, so the decoder has to keep a jumble of
legacy code around to support it. (Grep for LEGACY; those functions
exist only for the binary phrase-table.)
    5. It doesn't support hierarchical/syntax models.
    6. Richard Zens (the original developer) joined the dark side many
moons ago, so no one really maintains it anymore.

If people want binary phrase-tables, there's now a glut of choices:
    1. Marcin's compact phrase-table is pretty awesome: it's fast and
small.
    2. Nikolay's probing phrase-table, built on KenLM's data structures.
    3. Uli's dynamic suffix array.
    4. My OnDisk phrase-table, which supports both phrase-based and
syntax models.

With this in mind, we will deprecate the old binary phrase-table. We can
leave it in the decoder for a while, but we'll remove the
    processPhraseTable
program so that no new ones can be created.

Please speak up if you object.

-- 
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu



_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support