Hi Phi --

This is consistent with our experience with French-to-English translation
on several other data sets we used at the DAMT Hopkins workshop: we also
saw a minor degradation pretty consistently. The same held true for the
hierarchical experiments we ran, though there we were using really small
data sets.

The damt_phrase branch contains a multi-core wrapper for sigtest-filter
(like the multi-core extraction wrapper). This is necessary to scale up
to data sets larger than Europarl. It is easy to integrate into
experiment.perl (if someone wants to put this in master it is
straightforward; they could look at the damt_phrase version of
experiment.perl if necessary).
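
For anyone who wants to run the filtering by hand rather than through
EMS, the pipeline looks roughly like this. This is only a sketch: the
filter-pt binary name, the option meanings, and all paths here are my
assumptions, so check them against your own install:

```shell
# Build suffix arrays for both sides of the training corpus
# (IndexSA.O64 comes from the SALM toolkit; path is illustrative).
/path/to/salm/Bin/Linux/Index/IndexSA.O64 corpus.fr
/path/to/salm/Bin/Linux/Index/IndexSA.O64 corpus.en

# Filter the phrase table: -f/-e point at the source/target corpora
# whose suffix arrays were built above, -l sets the significance
# threshold, -n keeps the top n translations per source phrase.
zcat model/phrase-table.gz \
  | filter-pt -f corpus.fr -e corpus.en -l a+e -n 50 \
  | gzip > model/phrase-table.filtered.gz
```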

However, be warned: if one core fails, the script currently builds an
incomplete phrase table. (This may also be true for the other multi-core
training scripts; I haven't checked.) In the case of sigtest, this can
happen quite easily because of heavy memory usage.
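
A cheap guard against this would be to refuse to merge unless every
per-core output exists and is non-empty. A sketch, under the assumption
that each core writes a numbered part file (the file names here are made
up, not what the wrapper actually uses):

```shell
#!/bin/sh
# check_parts PREFIX COUNT: succeed only if PREFIX.0 .. PREFIX.(COUNT-1)
# all exist and are non-empty. (Hypothetical layout; the real wrapper's
# output naming may differ.)
check_parts() {
  prefix=$1
  count=$2
  i=0
  while [ "$i" -lt "$count" ]; do
    if [ ! -s "$prefix.$i" ]; then
      echo "missing or empty part: $prefix.$i" >&2
      return 1
    fi
    i=$((i + 1))
  done
  return 0
}

# Merge only when all parts are accounted for, e.g.:
#   check_parts filtered-table 8 \
#     && cat filtered-table.* | gzip > phrase-table.filtered.gz
```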

Cheers, Alex

PS: Phi, thanks for the fix to experiment.perl to handle directory names
containing "IN"!


On Fri, Nov 30, 2012 at 12:37 AM, Philipp Koehn <[email protected]> wrote:

> Hi,
>
> I integrated sigtest filter into experiment.perl and ran some experiments
> with phrase-based models, with GoodTuring count smoothing. Performance in
> terms of BLEU (cased, on newstest2011) decreases generally by a bit with
> the settings I used.
>
> To use this method in experiment.perl, you will have to install Joy
> Zhang's SALM Suffix Array toolkit
> <http://projectile.sv.cmu.edu/research/public/tools/salm/salm.htm#update>
> and add two settings in the TRAINING section:
>
>  salm-index = /path/to/project/salm/Bin/Linux/Index/IndexSA.O64
>  sigtest-filter = "-l a+e -n 50"
>
> The setting salm-index points to the binary that builds the suffix
> array, and sigtest-filter contains the options for filtering (excluding
> -e, -f, and -h). EMS automatically detects whether you are filtering a
> phrase-based or hierarchical model and whether a reordering model is
> used.
>
> The table below reports, for different values of n (top n translations
> kept per source phrase), the BLEU score impact and the size of the
> gzipped text phrase table; the best result per language pair is marked
> with asterisks. Filtering these tables takes about 5 hours wall-clock
> time.
>
>  Language pair  baseline       n=20           n=30           n=50
>  fr-en          30.39  7.9G    -.44   1.7G   *-.35*  1.8G    -.38   1.8G
>  es-en          30.86  7.1G    -.63   1.6G    -.55   1.6G   *-.46*  1.6G
>  cs-en          25.53  5.2G    -.29   1.1G    -.17   1.2G   *-.14*  1.2G
>  en-fr          29.83  7.8G    -.28   1.6G    -.22   1.6G   *-.10*  1.7G
>  en-es          32.34  6.9G    -.55   1.4G   *-.46*  1.5G    -.57   1.5G
>  en-cs          17.54  5.2G   *-.19*  1.1G    -.25   1.1G    -.21   1.2G
>  avg            -              -.397          -.333         *-.310*
>
> I have not done any experiments with hierarchical models yet.
>
> -phi
>
> On Wed, Sep 5, 2012 at 7:53 PM, Rico Sennrich <[email protected]>
> wrote:
> > On Wed, 2012-09-05 at 08:13 -0400, Jonathan Clark wrote:
> >> Rico,
> >>
> >>
> >> Thanks for the response. I've updated the documentation to reflect the
> >> correct directory.
> >>
> >> Do you have any numbers for how this affects the quality of Hiero
> >> systems or what good defaults would be for Hiero?
> >>
> >>
> >> Cheers,
> >> Jon
> >
> > I do have a few numbers for an EN-DE hierarchical system, using about
> > 100 million words of parallel training data and newstest2012 as test
> > set:
> >
> > DE-EN & BLEU & METEOR
> > unfiltered & 20.6 & 28.6
> > filtered & 21.2 & 28.7
> >
> > EN-DE & BLEU & METEOR
> > unfiltered & 14.6 & 35.0
> > filtered & 14.8 & 35.0
> >
> > I used -l a+e -n 20 as pruning threshold, but I don't know if I've
> > picked good hiero options (performance of a phrase-based system is 21.2
> > BLEU for DE-EN, and 15.2 BLEU for EN-DE (both with pruning)).
> >
> > best,
> > Rico
> >
> > _______________________________________________
> > Moses-support mailing list
> > [email protected]
> > http://mailman.mit.edu/mailman/listinfo/moses-support
>
>