Hi Barry,

All the scores are 99999 in that file.

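In case it helps anyone hitting the same error, here is a minimal sketch of how one might check a score file like this one. It is not taken from mml-filter.py: the function name summarize_scores, the treatment of 99999 as an ignored sentinel value, and the way the retained count is derived from the proportion are all assumptions for illustration. Only the index expression ignore_count + count comes from the traceback.

```python
SENTINEL = 99999.0  # assumption: score value that the filter ignores


def summarize_scores(lines, proportion=0.9, sentinel=SENTINEL):
    """Summarize score lines and check whether the threshold index is valid.

    `lines` is any iterable of strings, one score per line (for example an
    open file object).
    """
    scores = [float(line.strip()) for line in lines if line.strip()]
    ignore_count = sum(1 for s in scores if s == sentinel)
    # Assumption: keep `proportion` of the entries that are not ignored.
    count = int(proportion * (len(scores) - ignore_count))
    # Index expression as it appears in the traceback.
    index = ignore_count + count
    return {
        "total": len(scores),
        "distinct": len(set(scores)),
        "ignored": ignore_count,
        "retained": count,
        "index": index,
        "index_in_range": index < len(scores),
    }
```

Under these assumptions, a file in which every score is 99999 gives a retained count of 0 and an index equal to the number of lines, which is one past the end of the sorted list. That would be consistent with the "Retaining at least 0 entries and ignoring 2075137" message followed by the IndexError.
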
Thanks,
Jian

On Fri, Jan 24, 2014 at 3:51 PM, Barry Haddow <[email protected]> wrote:

> Hi Jian
>
> This is a bit suspect:
>
> > 2014-01-24 14:17:26,276 Retaining at least 0 entries and ignoring 2075137
>
> Are the scores in this file sensible (or are they all the same?)
>
> /home/mml/mml-test/experiment/training/corpus-mml-score.1
>
> cheers - Barry
>
>
> On 24/01/14 14:53, jian zhang wrote:
>
>> Hi,
>>
>> I got error of IndexError: list index out of range at the
>> TRAINING_mml-filter-before-wa step.
>>
>> I had read the post at
>> https://www.mail-archive.com/[email protected]/msg08767.html,
>> however I still can not figure out what is wrong.
>>
>> The full error is
>>
>> general:strategy = Score
>> general:source_language = fr
>> general:target_language = en
>> general:input_stem = /home/mml/mml-test/experiment/training/corpus.1
>> general:output_stem = /home/mml/mml-test/experiment/training/corpus-mml.1
>> general:domain_file = /home/mml/mml-test/experiment/model/domains.1
>> general:domain_file_out = /home/mml/mml-test/experiment/training/corpus-mml.1
>> score:score_file = /home/mml/mml-test/experiment/training/corpus-mml-score.1
>> score:proportion = 0.9
>>
>> 2014-01-24 14:17:26,276 Retaining at least 0 entries and ignoring 2075137
>> Traceback (most recent call last):
>>   File "/home/tools/mosesdecoder/scripts/ems/support/mml-filter.py", line 156, in <module>
>>     main()
>>   File "/home/tools/mosesdecoder/scripts/ems/support/mml-filter.py", line 111, in main
>>     strategy = strategy_class(config)
>>   File "/home/tools/mosesdecoder/scripts/ems/support/mml-filter.py", line 72, in __init__
>>     [float(line[:-1]) for line in open(self.score_file)], reverse=True)[ignore_count + count]
>> IndexError: list index out of range
>>
>> And my ems configuration file has:
>>
>> #################################################################
>> # PARALLEL CORPUS PREPARATION:
>> # create a tokenized, sentence-aligned corpus, ready for training
>>
>> [CORPUS]
>>
>> #in-domain parallel corpus
>> [CORPUS:in]
>> clean-stem = $training-in-domain-corpus
>>
>> [CORPUS:out]
>> #out-domain parallel corpus
>> clean-stem = $training-out-domain-corpus
>>
>>
>> #################################################################
>> # LANGUAGE MODEL TRAINING
>> [LM]
>> [LM:lm]
>> type = 8
>> lm = $language-model
>> #################################################################
>> # MODIFIED MOORE LEWIS FILTERING
>>
>> [MML]
>>
>> lm-training = $srilm-dir/ngram-count
>> lm-settings = "-interpolate -kndiscount -unk"
>> lm-binarizer = $moses-src-dir/bin/build_binary
>> lm-query = $moses-src-dir/bin/query
>> order = 5
>>
>> ### in-/out-of-domain source/target corpora to train the 4 language model
>> #
>> # in-domain parallel corpus
>> indomain-stem = [CORPUS:in:clean-split-stem]
>>
>> # out-of-domain parallel corpus
>> outdomain-stem = [CORPUS:out:clean-split-stem]
>>
>> # settings: number of lines sampled from the corpora to train each language model on
>> settings = "--line-count 100000"
>>
>> #################################################################
>> # TRANSLATION MODEL TRAINING
>> [TRAINING]
>> script = $moses-script-dir/training/train-model.perl
>> training-options = "-mgiza -mgiza-cpus 12 -sort-buffer-size 16G -sort-compress gzip -sort-parallel 12 -cores 12"
>> parallel = yes
>> alignment-symmetrization-method = grow-diag-final-and
>> lexicalized-reordering = msd-bidirectional-fe
>> score-settings = "--GoodTuring"
>> include-word-alignment-in-rules = yes
>>
>> #space separated all out-of domain corpora to be filtered
>> mml-filter-corpora = out
>> mml-before-wa = "-proportion 0.9"
>>
>> #####################################################
>>
>> Thanks.
>>
>>
>> Jian Zhang
>>
>>
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
> --
> Jian Zhang
> Centre for Next Generation Localisation (CNGL) <http://www.cngl.ie/index.html>
> Dublin City University <http://www.dcu.ie/>
