Hi Jian

This is a bit suspect:

2014-01-24 14:17:26,276 Retaining at least 0 entries and ignoring 2075137

Are the scores in this file sensible, or are they all the same?

/home/mml/mml-test/experiment/training/corpus-mml-score.1
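
A quick way to check is sketched below. It assumes the score file holds one float per line, which is what the float(line[:-1]) call in the traceback suggests. The function names are hypothetical and not part of mml-filter.py. The second helper mirrors the indexing shown in the traceback, where the sorted list is indexed at ignore_count + count: with 0 entries retained and 2075137 ignored, that index is only valid if the file has more than 2075137 lines.

```python
from collections import Counter


def summarise_scores(lines):
    """Summarise an iterable of score lines (assumed: one float per line)."""
    # Skip blank lines so a trailing newline does not break float().
    scores = [float(line.strip()) for line in lines if line.strip()]
    distinct = Counter(scores)
    return {
        "count": len(scores),
        "distinct": len(distinct),  # 1 would mean every score is identical
        "min": min(scores) if scores else None,
        "max": max(scores) if scores else None,
    }


def threshold_index_ok(n_scores, ignore_count, count):
    """True if indexing a list of n_scores items at ignore_count + count is valid."""
    return ignore_count + count < n_scores


# Small in-memory demonstration; the values here are made up.
demo = ["0.5\n", "0.5\n", "-1.25\n"]
print(summarise_scores(demo))
print(threshold_index_ok(len(demo), ignore_count=3, count=0))
```

To run it on the real data, pass an open file handle for the score file to summarise_scores, then call threshold_index_ok with the resulting count and the two numbers from the log line.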

cheers - Barry

On 24/01/14 14:53, jian zhang wrote:
> Hi,
>
> I got an "IndexError: list index out of range" error at
> the TRAINING_mml-filter-before-wa step.
>
> I have read the post at
> https://www.mail-archive.com/[email protected]/msg08767.html,
> but I still cannot figure out what is wrong.
>
> The full error is
>
> general:strategy = Score
> general:source_language = fr
> general:target_language = en
> general:input_stem = /home/mml/mml-test/experiment/training/corpus.1
> general:output_stem = /home/mml/mml-test/experiment/training/corpus-mml.1
> general:domain_file = /home/mml/mml-test/experiment/model/domains.1
> general:domain_file_out = /home/mml/mml-test/experiment/training/corpus-mml.1
> score:score_file = /home/mml/mml-test/experiment/training/corpus-mml-score.1
> score:proportion = 0.9
>
> 2014-01-24 14:17:26,276 Retaining at least 0 entries and ignoring 2075137
> Traceback (most recent call last):
>   File "/home/tools/mosesdecoder/scripts/ems/support/mml-filter.py", line 156, in <module>
>     main()
>   File "/home/tools/mosesdecoder/scripts/ems/support/mml-filter.py", line 111, in main
>     strategy = strategy_class(config)
>   File "/home/tools/mosesdecoder/scripts/ems/support/mml-filter.py", line 72, in __init__
>     [float(line[:-1]) for line in open(self.score_file)], reverse=True)[ignore_count + count]
> IndexError: list index out of range
>
> And my ems configuration file has:
>
> #################################################################
> # PARALLEL CORPUS PREPARATION:
> # create a tokenized, sentence-aligned corpus, ready for training
>
> [CORPUS]
>
> #in-domain parallel corpus
> [CORPUS:in]
> clean-stem = $training-in-domain-corpus
>
> [CORPUS:out]
> #out-domain parallel corpus
> clean-stem = $training-out-domain-corpus
>
>
> #################################################################
> # LANGUAGE MODEL TRAINING
> [LM]
> [LM:lm]
> type = 8
> lm = $language-model
> #################################################################
> # MODIFIED MOORE LEWIS FILTERING
>
> [MML]
>
> lm-training = $srilm-dir/ngram-count
> lm-settings = "-interpolate -kndiscount -unk"
> lm-binarizer = $moses-src-dir/bin/build_binary
> lm-query = $moses-src-dir/bin/query
> order = 5
>
> ### in-/out-of-domain source/target corpora to train the 4 language models
> #
> # in-domain parallel corpus
> indomain-stem = [CORPUS:in:clean-split-stem]
>
> # out-of-domain parallel corpus
> outdomain-stem = [CORPUS:out:clean-split-stem]
>
> # settings: number of lines sampled from the corpora to train each language model on
> settings = "--line-count 100000"
>
> #################################################################
> # TRANSLATION MODEL TRAINING
> [TRAINING]
> script = $moses-script-dir/training/train-model.perl
> training-options = "-mgiza -mgiza-cpus 12 -sort-buffer-size 16G -sort-compress gzip -sort-parallel 12 -cores 12"
> parallel = yes
> alignment-symmetrization-method = grow-diag-final-and
> lexicalized-reordering = msd-bidirectional-fe
> score-settings = "--GoodTuring"
> include-word-alignment-in-rules = yes
>
> #space separated all out-of domain corpora to be filtered
> mml-filter-corpora = out
> mml-before-wa = "-proportion 0.9"
>
> #####################################################
>
> Thanks.
>
>
> Jian Zhang
>
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

