Hi,

I finally found the time to fix up and commit my version of
train-factored-model.perl, which has been out of sync for over a year
(or is it two years?).

To summarize the major changes:

- you can specify a temp directory for sorting with the switch "-temp-dir".
  Most likely this will be some local disk space.

- more efficient file handling
  * the many part files are a legacy of bad C compilers when I wrote
    the original version of the script in 2004. All they do now is fill up
    the disk with many pointless files. This is now avoided by default. If
    you really want it, there is the switch "-file-limit"
  * several files are now written directly as gzip files, instead of first
    being generated as regular files and then gzipped.

- continue
  * if the phrase model training step crashes partway through, there is now
    a new option "-continue" that allows you to pick up where it left off
    if you have sorted extract files, or if you have phrase table halves

- sanity for non-factored training:
  * file names do not have the 0-0 anymore
  * unnecessary duplication of files is avoided

- a couple of technical changes due to the integration with some experimental
  framework code here in Edinburgh:
  * all error messages now say "ERROR" in the output string
  * you can specify

- if you have multiple lexicalized reordering tables, you can now specify
  multiple reordering types for them. Ditto for generation types.

- there is an option "-proper-conditioning" that is intended to fix the problem
  that a particular phrase may be extracted with multiple translations from a
  sentence. By default, each of these counts as one full occurrence, but proper
  conditioning would require that they have fractional counts that add up to
  one. I tried this option only once, and it lowered the BLEU score, so there
  you go. Maybe buggy.
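
To make the fractional counting idea concrete, here is a rough Python sketch.
It is only an illustration, not the code in score.cpp: the function names and
the input layout (one list of extracted phrase pairs per sentence) are made up
for the example, and it assumes the fractional weight is 1/n, where n is the
number of times the source phrase was extracted from that sentence.

```python
from collections import defaultdict
from fractions import Fraction


def phrase_pair_counts(sentences, proper_conditioning=False):
    """Accumulate counts of (source, target) phrase pairs.

    `sentences` is a hypothetical input layout: one entry per sentence
    pair, each a list of the (source, target) phrase pairs extracted
    from it.
    """
    counts = defaultdict(Fraction)
    for pairs in sentences:
        # How often each source phrase was extracted from this sentence.
        per_source = defaultdict(int)
        for src, _ in pairs:
            per_source[src] += 1
        for src, tgt in pairs:
            if proper_conditioning:
                # Assumption: the extractions of one source phrase within
                # a sentence share a total count of one.
                weight = Fraction(1, per_source[src])
            else:
                # Default: every extraction counts as one occurrence.
                weight = Fraction(1)
            counts[(src, tgt)] += weight
    return dict(counts)


def translation_probs(counts):
    """Relative-frequency estimate p(target | source) from pair counts."""
    totals = defaultdict(Fraction)
    for (src, _), c in counts.items():
        totals[src] += c
    return {pair: c / totals[pair[0]] for pair, c in counts.items()}


# Toy data: "das" is extracted with two translations from the first
# sentence and with one translation from the second.
toy = [
    [("das", "the"), ("das", "that")],
    [("das", "the")],
]
default_probs = translation_probs(phrase_pair_counts(toy))
proper_probs = translation_probs(phrase_pair_counts(toy, proper_conditioning=True))
```

With the default counting the first sentence contributes two full counts for
"das", while with proper conditioning each sentence contributes exactly one in
total, so sentences that yield many alternative translations weigh less.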

This is a major update, and syncing it back up was tricky, so there may be some
kinks to be worked out. If you check this out, be sure to also update
phrase-score/extract.cpp and phrase-score/score.cpp.

-phi
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
