Hi Jian,

This is a bit suspect:
2014-01-24 14:17:26,276 Retaining at least 0 entries and ignoring 2075137

Are the scores in this file sensible, or are they all the same?

/home/mml/mml-test/experiment/training/corpus-mml-score.1

cheers - Barry

On 24/01/14 14:53, jian zhang wrote:
> Hi,
>
> I got an "IndexError: list index out of range" at the
> TRAINING_mml-filter-before-wa step.
>
> I have read the post at
> https://www.mail-archive.com/[email protected]/msg08767.html,
> but I still cannot figure out what is wrong.
>
> The full error is:
>
> general:strategy = Score
> general:source_language = fr
> general:target_language = en
> general:input_stem = /home/mml/mml-test/experiment/training/corpus.1
> general:output_stem = /home/mml/mml-test/experiment/training/corpus-mml.1
> general:domain_file = /home/mml/mml-test/experiment/model/domains.1
> general:domain_file_out = /home/mml/mml-test/experiment/training/corpus-mml.1
> score:score_file = /home/mml/mml-test/experiment/training/corpus-mml-score.1
> score:proportion = 0.9
>
> 2014-01-24 14:17:26,276 Retaining at least 0 entries and ignoring 2075137
> Traceback (most recent call last):
>   File "/home/tools/mosesdecoder/scripts/ems/support/mml-filter.py", line 156, in <module>
>     main()
>   File "/home/tools/mosesdecoder/scripts/ems/support/mml-filter.py", line 111, in main
>     strategy = strategy_class(config)
>   File "/home/tools/mosesdecoder/scripts/ems/support/mml-filter.py", line 72, in __init__
>     [float(line[:-1]) for line in open(self.score_file)],
>     reverse=True)[ignore_count + count]
> IndexError: list index out of range
>
> And my EMS configuration file has:
>
> #################################################################
> # PARALLEL CORPUS PREPARATION:
> # create a tokenized, sentence-aligned corpus, ready for training
>
> [CORPUS]
>
> # in-domain parallel corpus
> [CORPUS:in]
> clean-stem = $training-in-domain-corpus
>
> [CORPUS:out]
> # out-of-domain parallel corpus
> clean-stem = $training-out-domain-corpus
>
> #################################################################
> # LANGUAGE MODEL TRAINING
> [LM]
> [LM:lm]
> type = 8
> lm = $language-model
> #################################################################
> # MODIFIED MOORE LEWIS FILTERING
>
> [MML]
>
> lm-training = $srilm-dir/ngram-count
> lm-settings = "-interpolate -kndiscount -unk"
> lm-binarizer = $moses-src-dir/bin/build_binary
> lm-query = $moses-src-dir/bin/query
> order = 5
>
> ### in-/out-of-domain source/target corpora to train the 4 language models
> #
> # in-domain parallel corpus
> indomain-stem = [CORPUS:in:clean-split-stem]
>
> # out-of-domain parallel corpus
> outdomain-stem = [CORPUS:out:clean-split-stem]
>
> # settings: number of lines sampled from the corpora to train each language model on
> settings = "--line-count 100000"
>
> #################################################################
> # TRANSLATION MODEL TRAINING
> [TRAINING]
> script = $moses-script-dir/training/train-model.perl
> training-options = "-mgiza -mgiza-cpus 12 -sort-buffer-size 16G -sort-compress gzip -sort-parallel 12 -cores 12"
> parallel = yes
> alignment-symmetrization-method = grow-diag-final-and
> lexicalized-reordering = msd-bidirectional-fe
> score-settings = "--GoodTuring"
> include-word-alignment-in-rules = yes
>
> # space-separated list of all out-of-domain corpora to be filtered
> mml-filter-corpora = out
> mml-before-wa = "-proportion 0.9"
>
> #####################################################
>
> Thanks.
>
> Jian Zhang
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support

-- 
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
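One way to answer Barry's question about corpus-mml-score.1 is a small standalone check. The sketch below is not part of Moses, and the helper names `summarize_scores` and `threshold_or_none` are hypothetical. It counts the scores in the file, reports how many distinct values there are, and mirrors the `sorted(..., reverse=True)[ignore_count + count]` lookup from the traceback, except that it returns None instead of raising when the index is past the end of the list. Assuming `ignore_count` and `count` correspond to the 2075137 and 0 in the log line, that lookup fails whenever the score file has 2,075,137 or fewer lines.

```python
import sys


def summarize_scores(path):
    """Read one float per line and report basic statistics.

    Hypothetical helper, not part of mml-filter.py."""
    scores = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines rather than failing on float("")
                scores.append(float(line))
    return {
        "count": len(scores),
        "distinct": len(set(scores)),
        "min": min(scores) if scores else None,
        "max": max(scores) if scores else None,
    }


def threshold_or_none(scores, ignore_count, count):
    """Mirror the indexing shown in the traceback (sort descending,
    then take element ignore_count + count), but return None instead
    of raising IndexError when the list is too short."""
    ranked = sorted(scores, reverse=True)
    index = ignore_count + count
    if index >= len(ranked):
        return None
    return ranked[index]


def main(argv):
    if len(argv) != 1:
        print("usage: check_scores.py SCORE_FILE")
        return
    stats = summarize_scores(argv[0])
    print(stats)
    if stats["count"] and stats["distinct"] == 1:
        print("warning: every score in the file is identical")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Usage, with a hypothetical script name: `python check_scores.py /home/mml/mml-test/experiment/training/corpus-mml-score.1`. A line count well below the size of the out-of-domain corpus, or a single distinct value, would point at the scoring step rather than at the filter itself.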
