Hi Per,
during phrase extraction it's hard to tell which phrase pairs should 
be pruned and which should be kept. That a particular phrase pair is 
of low quality only becomes apparent once its statistics (or its 
significance, in the case of sigfilter) have been estimated from, and 
compared within, the *entire* extracted set.
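
To make that concrete, here is a toy Python sketch of significance-based 
pruning along the lines of sigfilter (this is not the actual Moses code; 
the function name, the counts format and the alpha threshold are just 
made up for illustration). The point is simply that the test needs joint 
and marginal counts over the whole extracted set before any single pair 
can be judged:

from scipy.stats import fisher_exact  # assumes SciPy is available

def keep_significant(pair_counts, num_sentence_pairs, alpha=1e-5):
    """pair_counts maps (src, tgt) -> (c_st, c_s, c_t): joint and
    marginal counts taken over the *whole* extracted set."""
    kept = {}
    n = num_sentence_pairs
    for pair, (c_st, c_s, c_t) in pair_counts.items():
        # 2x2 contingency table for a one-tailed Fisher's exact test
        table = [[c_st,       c_s - c_st],
                 [c_t - c_st, n - c_s - c_t + c_st]]
        _, p_value = fisher_exact(table, alternative="greater")
        if p_value <= alpha:  # keep only significantly co-occurring pairs
            kept[pair] = c_st
    return kept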

I'm working on an alternative way of phrase extraction (see 
http://ufal.mff.cuni.cz/pbml/96/art-przywara-bojar.pdf) that filters 
phrase pairs on-the-fly, but this approach needs quite a lot of memory 
to produce phrase tables that retain the overall quality of the 
translation model. If you're interested, I can give you more 
information and provide you with the current version of the tool 
(contrib/eppex is outdated at the moment).
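
To give you a rough idea already, here is a small self-contained Python 
sketch of the kind of on-the-fly count filtering I mean (a 
lossy-counting style counter over extracted phrase pairs). It only 
illustrates the principle; the class name and parameters are invented 
for this example and the actual tool is implemented differently:

import math

class LossyCounter:
    """Approximate frequency counter that filters items on-the-fly
    (lossy counting): rare items are dropped as the stream is read,
    so memory stays bounded by roughly 1/epsilon entries."""

    def __init__(self, epsilon=1e-4):
        self.width = math.ceil(1.0 / epsilon)  # bucket width
        self.counts = {}                       # item -> (count, max_error)
        self.n = 0                             # items seen so far
        self.bucket = 1                        # current bucket id

    def add(self, item):
        self.n += 1
        count, err = self.counts.get(item, (0, self.bucket - 1))
        self.counts[item] = (count + 1, err)
        if self.n % self.width == 0:
            # end of bucket: drop items that cannot be frequent anymore
            self.counts = {k: (c, e) for k, (c, e) in self.counts.items()
                           if c + e > self.bucket}
            self.bucket += 1

# Usage idea: feed every extracted phrase pair to the counter; only the
# pairs that survive the filtering end up in the phrase table.
# counter = LossyCounter(epsilon=1e-6)
# for src, tgt in extract_phrase_pairs(corpus):  # hypothetical extractor
#     counter.add((src, tgt))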

Cheers,
Česlav

On 9.4.2013 09:57 Per Tunedal said the following:
> Hi,
> Finally, I've succeeded to prune the phrase-table of my baseline
> phrase-model. With a size of 6 % of the original the translation has
> actually improved!
> Now the phrase-table fits in memory and the translation is fast, or at
> least acceptable.
>
> Now, I'm asking: if all those phrase-pairs that were pruned away
> aren't necessary, why are they created in the first place? Obviously,
> there is potential to make the training more effective, isn't there?
>
> Yours,
> Per Tunedal

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
