Hi Per,

during phrase extraction it is hard to tell which phrase pairs should be pruned and which should not. The fact that a particular phrase pair is of low quality becomes apparent only after its statistics (or its significance, in the case of sigfilter) are estimated from, and compared within, the *entire* extracted set.
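To illustrate the point, here is the kind of test sigfilter applies (Fisher's exact test on a 2x2 contingency table of co-occurrence counts). This is only a Python sketch, not the actual filter-pt code, and the function name and counts are placeholders; note that every value it needs is a corpus-wide statistic, so it cannot be computed while pairs are still being extracted:

from scipy.stats import fisher_exact

def phrase_pair_pvalue(c_st, c_s, c_t, n):
    # c_st: sentence pairs containing both s and t
    # c_s, c_t: marginal counts of s and t; n: total sentence pairs
    # -- all of these are statistics over the *entire* corpus / extracted set
    table = [[c_st,       c_s - c_st],
             [c_t - c_st, n - c_s - c_t + c_st]]
    _, p = fisher_exact(table, alternative='greater')
    return p  # small p-value: the pair co-occurs more often than chance predicts

# e.g. a pair seen 3 times, with marginals 5 and 4, in a 100,000-sentence corpus:
# phrase_pair_pvalue(3, 5, 4, 100000)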
I'm working on an alternative way of phrase extraction (see http://ufal.mff.cuni.cz/pbml/96/art-przywara-bojar.pdf) that filters phrase pairs on the fly, but this approach requires quite a lot of memory to produce phrase tables that retain the overall quality of the translation model. If you are interested, I can give you more information and provide you with the current version of the tool (contrib/eppex is outdated at the moment). A toy sketch of the general idea is appended below the quoted message.

Cheers,
Česlav

On 9.4.2013 09:57, Per Tunedal wrote:
> Hi,
> Finally, I've succeeded in pruning the phrase-table of my baseline
> phrase-model. At 6 % of the original size, the translation has
> actually improved!
> Now the phrase-table fits in memory and the translation is fast, or at
> least acceptable.
>
> Now I'm asking: if all those phrase-pairs that were pruned away
> aren't necessary, why are they created in the first place? Obviously,
> there is potential for making the training more effective, isn't there?
>
> Yours,
> Per Tunedal
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
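A toy sketch of the general idea behind such on-the-fly filtering, written here as lossy counting over the stream of extracted phrase pairs (an illustration of the technique only, not the eppex implementation; epsilon and the names are placeholders):

def lossy_count(phrase_pairs, epsilon=1e-6):
    # Keep approximate counts of frequent pairs in bounded memory:
    # pairs whose count stays below roughly (epsilon * items seen) are dropped.
    width = int(1 / epsilon)              # bucket width
    counts, deltas = {}, {}
    bucket = 1
    for i, pair in enumerate(phrase_pairs, start=1):
        if pair in counts:
            counts[pair] += 1
        else:
            counts[pair] = 1
            deltas[pair] = bucket - 1     # maximum undercount for a late arrival
        if i % width == 0:                # bucket boundary: prune rare pairs
            for p in [q for q in counts if counts[q] + deltas[q] <= bucket]:
                del counts[p], deltas[p]
            bucket += 1
    return counts                         # approximate counts of frequent pairs

The memory needed is essentially this counter table, which is the trade-off mentioned above: more memory in exchange for being able to filter during extraction instead of afterwards.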
