Hi,

I am getting an IndexError: list index out of range at
the TRAINING_mml-filter-before-wa step.

I have read the post at
https://www.mail-archive.com/[email protected]/msg08767.html, but I
still cannot figure out what is wrong.

The full output is:

general:strategy = Score
general:source_language = fr
general:target_language = en
general:input_stem = /home/mml/mml-test/experiment/training/corpus.1
general:output_stem = /home/mml/mml-test/experiment/training/corpus-mml.1
general:domain_file = /home/mml/mml-test/experiment/model/domains.1
general:domain_file_out = /home/mml/mml-test/experiment/training/corpus-mml.1
score:score_file = /home/mml/mml-test/experiment/training/corpus-mml-score.1
score:proportion = 0.9

2014-01-24 14:17:26,276 Retaining at least 0 entries and ignoring 2075137
Traceback (most recent call last):
  File "/home/tools/mosesdecoder/scripts/ems/support/mml-filter.py", line 156, in <module>
    main()
  File "/home/tools/mosesdecoder/scripts/ems/support/mml-filter.py", line 111, in main
    strategy = strategy_class(config)
  File "/home/tools/mosesdecoder/scripts/ems/support/mml-filter.py", line 72, in __init__
    [float(line[:-1]) for line in open(self.score_file)], reverse=True)[ignore_count + count]
IndexError: list index out of range
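
One possible reading of the log line "Retaining at least 0 entries and ignoring 2075137": if every line in the score file is being counted as ignored, then ignore_count + count equals the length of the sorted list, which is one past the last valid index. The sketch below is not the code from mml-filter.py; it only mimics the indexing expression shown in the traceback on a toy list. The function names and the sentinel value 99999.0 are hypothetical and chosen for illustration.

```python
def threshold_or_none(scores, ignore_count, count):
    """Mimic sorted(scores, reverse=True)[ignore_count + count] from the traceback.

    Returns None where the original expression would raise IndexError.
    """
    ordered = sorted(scores, reverse=True)
    index = ignore_count + count
    if index >= len(ordered):
        return None  # index is past the end of the list
    return ordered[index]


def summarize(scores, sentinel):
    """Count how many scores equal a sentinel value (hypothetical 'ignore' marker)."""
    ignored = sum(1 for s in scores if s == sentinel)
    return {"total": len(scores), "ignored": ignored, "usable": len(scores) - ignored}


if __name__ == "__main__":
    SENTINEL = 99999.0  # assumption: placeholder for whatever value marks an ignored line

    # Toy case: every score is the sentinel, so nothing is left to retain.
    toy = [SENTINEL, SENTINEL, SENTINEL]
    info = summarize(toy, SENTINEL)
    print(info)
    print(threshold_or_none(toy, info["ignored"], 0))

    # Toy case with usable scores: the index stays inside the list.
    print(threshold_or_none([3.0, 1.0, 2.0], 0, 1))
```

If this reading is right, the score file itself (score:score_file above) is the thing to inspect, for example by checking whether all of its lines carry the same value.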

My EMS configuration file contains:

#################################################################
# PARALLEL CORPUS PREPARATION:
# create a tokenized, sentence-aligned corpus, ready for training

[CORPUS]

#in-domain parallel corpus
[CORPUS:in]
clean-stem = $training-in-domain-corpus

[CORPUS:out]
#out-domain parallel corpus
clean-stem = $training-out-domain-corpus


#################################################################
# LANGUAGE MODEL TRAINING
[LM]
[LM:lm]
type = 8
lm = $language-model
#################################################################
# MODIFIED MOORE LEWIS FILTERING

[MML]

lm-training = $srilm-dir/ngram-count
lm-settings = "-interpolate -kndiscount -unk"
lm-binarizer = $moses-src-dir/bin/build_binary
lm-query = $moses-src-dir/bin/query
order = 5

### in-/out-of-domain source/target corpora to train the 4 language model
#
# in-domain parallel corpus
indomain-stem = [CORPUS:in:clean-split-stem]

# out-of-domain parallel corpus
outdomain-stem = [CORPUS:out:clean-split-stem]

# settings: number of lines sampled from the corpora to train each language model on
settings = "--line-count 100000"

#################################################################
# TRANSLATION MODEL TRAINING
[TRAINING]
script = $moses-script-dir/training/train-model.perl
training-options = "-mgiza -mgiza-cpus 12 -sort-buffer-size 16G -sort-compress gzip -sort-parallel 12 -cores 12"
parallel = yes
alignment-symmetrization-method = grow-diag-final-and
lexicalized-reordering = msd-bidirectional-fe
score-settings = "--GoodTuring"
include-word-alignment-in-rules = yes

#space separated all out-of domain corpora to be filtered
mml-filter-corpora = out
mml-before-wa = "-proportion 0.9"

#####################################################

Thanks.


Jian Zhang
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
