Hi, I got the error "IndexError: list index out of range" at the TRAINING_mml-filter-before-wa step.
I have read the post at https://www.mail-archive.com/[email protected]/msg08767.html, but I still cannot figure out what is wrong. The full error output is:

general:strategy = Score
general:source_language = fr
general:target_language = en
general:input_stem = /home/mml/mml-test/experiment/training/corpus.1
general:output_stem = /home/mml/mml-test/experiment/training/corpus-mml.1
general:domain_file = /home/mml/mml-test/experiment/model/domains.1
general:domain_file_out = /home/mml/mml-test/experiment/training/corpus-mml.1
score:score_file = /home/mml/mml-test/experiment/training/corpus-mml-score.1
score:proportion = 0.9
2014-01-24 14:17:26,276 Retaining at least 0 entries and ignoring 2075137
Traceback (most recent call last):
  File "/home/tools/mosesdecoder/scripts/ems/support/mml-filter.py", line 156, in <module>
    main()
  File "/home/tools/mosesdecoder/scripts/ems/support/mml-filter.py", line 111, in main
    strategy = strategy_class(config)
  File "/home/tools/mosesdecoder/scripts/ems/support/mml-filter.py", line 72, in __init__
    [float(line[:-1]) for line in open(self.score_file)], reverse=True)[ignore_count + count]
IndexError: list index out of range

And my EMS configuration file has:

#################################################################
# PARALLEL CORPUS PREPARATION:
# create a tokenized, sentence-aligned corpus, ready for training

[CORPUS]

#in-domain parallel corpus
[CORPUS:in]
clean-stem = $training-in-domain-corpus

[CORPUS:out]
#out-domain parallel corpus
clean-stem = $training-out-domain-corpus

#################################################################
# LANGUAGE MODEL TRAINING

[LM]

[LM:lm]
type = 8
lm = $language-model

#################################################################
# MODIFIED MOORE LEWIS FILTERING

[MML]
lm-training = $srilm-dir/ngram-count
lm-settings = "-interpolate -kndiscount -unk"
lm-binarizer = $moses-src-dir/bin/build_binary
lm-query = $moses-src-dir/bin/query
order = 5

### in-/out-of-domain source/target corpora to train the 4 language model
#
# in-domain parallel corpus
indomain-stem = [CORPUS:in:clean-split-stem]

# out-of-domain parallel corpus
outdomain-stem = [CORPUS:out:clean-split-stem]

# settings: number of lines sampled from the corpora to train each language model on
settings = "--line-count 100000"

#################################################################
# TRANSLATION MODEL TRAINING

[TRAINING]
script = $moses-script-dir/training/train-model.perl
training-options = "-mgiza -mgiza-cpus 12 -sort-buffer-size 16G -sort-compress gzip -sort-parallel 12 -cores 12"
parallel = yes
alignment-symmetrization-method = grow-diag-final-and
lexicalized-reordering = msd-bidirectional-fe
score-settings = "--GoodTuring"
include-word-alignment-in-rules = yes

#space separated all out-of domain corpora to be filtered
mml-filter-corpora = out
mml-before-wa = "-proportion 0.9"

#####################################################

Thanks.
Jian Zhang
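For reference: the line that fails in the traceback reads every line of the score file (score:score_file) as a float, sorts the scores from highest to lowest, and indexes that sorted list at ignore_count + count. With the logged values (count = 0, ignore_count = 2075137) the index is 2075137, so, assuming one score per line, the lookup goes out of range whenever the score file has 2075137 lines or fewer. The Python sketch below mimics only that one expression. It is not the actual mml-filter.py code; how count and ignore_count are derived is not shown in the traceback. The function names threshold, safe_threshold and line_count are hypothetical, and the toy scores are made up.

```python
from typing import List, Optional


def threshold(scores: List[float], ignore_count: int, count: int) -> float:
    """Mimic the expression at mml-filter.py line 72 (as seen in the
    traceback): sort scores from highest to lowest, then take the entry
    at position ignore_count + count. Raises IndexError if that position
    is past the end of the list."""
    return sorted(scores, reverse=True)[ignore_count + count]


def safe_threshold(scores: List[float], ignore_count: int, count: int) -> Optional[float]:
    """Hypothetical guarded variant: return None instead of raising when
    ignore_count + count does not point inside the sorted list."""
    index = ignore_count + count
    if index < 0 or index >= len(scores):
        return None
    return sorted(scores, reverse=True)[index]


def line_count(path: str) -> int:
    """Hypothetical helper: count the lines in a file, e.g. the score file."""
    with open(path) as handle:
        return sum(1 for _ in handle)


if __name__ == "__main__":
    # Toy stand-in for a score file with 5 entries (values are made up).
    scores = [0.3, -1.2, 0.8, 0.1, -0.5]

    # Same shape as "Retaining at least 0 entries and ignoring N" when the
    # score file holds exactly N lines: count = 0, ignore_count = N.
    try:
        threshold(scores, ignore_count=len(scores), count=0)
    except IndexError as err:
        print("IndexError:", err)  # prints "IndexError: list index out of range"

    print(safe_threshold(scores, ignore_count=len(scores), count=0))  # None
    print(safe_threshold(scores, ignore_count=1, count=2))  # -0.5
```

As a usage example, line_count('/home/mml/mml-test/experiment/training/corpus-mml-score.1') gives a number that can be compared directly with the 2075137 in the "Retaining at least 0 entries and ignoring 2075137" log line.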
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
