Hi,

The filter script filters an existing phrase table, so you can apply it to the one you already have. With the EMS setting, a new phrase table would be built instead.
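To make the thresholding concrete, here is a minimal Python sketch of what filtering an existing phrase table by a minimum score amounts to. It is not the code of threshold-filter.perl. It assumes the usual " ||| "-separated phrase table layout with the scores in the third column, and the function names keep_phrase_pair and filter_phrase_table are made up for illustration. The defaults mirror the 2:0.0001 setting quoted further down.

```python
def keep_phrase_pair(line, field=2, threshold=0.0001):
    """Return True if the score at position `field` is at least `threshold`.

    Assumed line layout: source ||| target ||| scores ||| alignment ||| counts
    With the usual four scores, index 2 is the direct phrase
    probability p(e|f) and index 0 is the indirect one, p(f|e).
    """
    columns = line.split(" ||| ")
    scores = [float(s) for s in columns[2].split()]
    return scores[field] >= threshold


def filter_phrase_table(lines, field=2, threshold=0.0001):
    """Keep only the phrase pairs whose selected score passes the threshold."""
    return [line for line in lines if keep_phrase_pair(line, field, threshold)]


if __name__ == "__main__":
    # Toy entries, invented for this example only.
    table = [
        "das haus ||| the house ||| 0.5 0.4 0.6 0.3 ||| 0-0 1-1 ||| 10 12 8",
        "das haus ||| the home ||| 0.5 0.4 0.00005 0.3 ||| 0-0 1-1 ||| 10 12 1",
    ]
    for entry in filter_phrase_table(table):
        print(entry)
```

A higher threshold such as 0.001 or 0.01 drops more entries, which is the trade-off between speed and quality discussed in the quoted messages.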
Don't worry about the reordering table. It will have excess entries, but they will not be used. If you really care, you can use the script
scripts/training/remove-orphan-phrase-pairs-from-reordering-table.perl

-phi

On Mon, Aug 31, 2015 at 10:50 AM, Vincent Nguyen <[email protected]> wrote:

> Thanks, will try and post results.
> Just to be clear:
> I can re-use the previous extract file.
> I have to rebuild the phrase table with the new min score (i.e. no way to
> just filter the previous one?)
> Do I have to rebuild the reordering table too?
>
> Vincent
>
>
> On 31/08/2015 16:44, Philipp Koehn wrote:
>
> Hi,
>
> 0.0001 should have no impact on translation quality,
> 0.001 will have some impact,
> 0.01 is probably a bit too drastic.
>
> But that's the range you should explore.
>
> -phi
>
> On Mon, Aug 31, 2015 at 10:33 AM, Vincent Nguyen <[email protected]> wrote:
>
>> Is there any benchmark on what value has what impact?
>> What should I start with as a test, 0.001?
>>
>> The standard value 0.0001 seems really, really low to me ...
>> Maybe I am not getting what this probability exactly refers to.
>>
>>
>> where FIELDn is the position of the score (typically 2 for the direct
>> phrase probability p(e|f), or 0 for the indirect phrase probability p(f|e))
>> and THRESHOLD the maximum probability allowed. A good setting is 2:0.0001,
>> which removes all rules, where the direct phrase translation probability is
>> below 0.0001.
>>
>>
>> On 31/08/2015 16:14, Philipp Koehn wrote:
>>
>> Hi,
>>
>> I would suspect that with beam sizes <500 the bulk of the time is
>> spent on translation option collection, not decoding. You could speed
>> that up with tighter threshold pruning of the phrase table.
>>
>> See the script scripts/training/threshold-filter.perl or the setting
>> score-settings = "--MinScore 2:0.0001"
>> in EMS.
>>
>> -phi
>>
>> On Mon, Aug 31, 2015 at 3:03 AM, Vincent Nguyen <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> Here are some results for several values of the cube pruning pop limit:
>>>
>>> (pop limit / decoding time for 3000 sentences / BLEU score)
>>>
>>> 5000 - 15m45 - 29.59
>>> 1000 - 4m27 - 29.59
>>> 500 - 3m35 - 29.59
>>> 200 - 3m15 - 29.51
>>> 100 - 3m00 - 29.40
>>>
>>> Therefore I took 400 - 3m19 - 29.58
>>>
>>> If I am not mistaken, the default value for Moses is 1000 [read in the
>>> doc], but in the EMS it is 5000 right now, which makes the experiment
>>> so long.
>>> I suggest changing the EMS default value.
>>>
>>> Is there a way to also use a cube pruning limit in the decoder at tuning
>>> time?
>>>
>>> Now with this optimized setting I get a rate of 15 segments per second
>>> on average.
>>> What is the reason for online tools like Google / Bing being so much
>>> faster?
>>> It's not a machine issue, is it?
>>>
>>>
>>> Cheers
>>> Vincent
>>>
>>> _______________________________________________
>>> Moses-support mailing list
>>> [email protected]
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
