Hi Barry,

All the scores are 99999 in that file.
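
A minimal sketch of why that would produce the IndexError. It is reconstructed only from the log line and the traceback quoted below, not from the full mml-filter.py source. It assumes 99999 is the score value the filter ignores; find_threshold and IGNORE_SCORE are hypothetical names used for illustration.

```python
# Sketch of the threshold lookup implied by the traceback and the log line.
# ASSUMPTION: 99999 marks entries that the filter ignores.
IGNORE_SCORE = 99999.0


def find_threshold(scores, proportion, ignore_score=IGNORE_SCORE):
    """Return the score at the cut-off rank, mirroring the traceback line."""
    ignore_count = sum(1 for s in scores if s == ignore_score)
    # Number of non-ignored entries to retain.
    count = int((len(scores) - ignore_count) * proportion)
    ranked = sorted(scores, reverse=True)
    # With 99999 as the largest value, ignored entries sort first,
    # followed by the `count` retained entries.
    return ranked[ignore_count + count]


if __name__ == "__main__":
    # Every score is the ignored value: count == 0 and the index
    # equals len(scores), which is one past the end of the list.
    try:
        find_threshold([99999.0] * 5, 0.9)
    except IndexError as err:
        print("IndexError:", err)

    # With real scores mixed in, the lookup succeeds.
    print(find_threshold([99999.0, 1.5, 0.2, -0.7, -3.0], 0.9))
```

Under that assumption, "Retaining at least 0 entries and ignoring 2075137" means the index is 2075137 + 0, which equals the length of the list.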

Thanks,


Jian


On Fri, Jan 24, 2014 at 3:51 PM, Barry Haddow <[email protected]> wrote:

> Hi Jian
>
> This is a bit suspect:
>
>
> 2014-01-24 14:17:26,276 Retaining at least 0 entries and ignoring 2075137
>
> Are the scores in this file sensible (or are they all the same?)
>
> /home/mml/mml-test/experiment/training/corpus-mml-score.1
>
> cheers - Barry
>
>
> On 24/01/14 14:53, jian zhang wrote:
>
>> Hi,
>>
>> I got an "IndexError: list index out of range" error at the
>> TRAINING_mml-filter-before-wa step.
>>
>> I have read the post at
>> https://www.mail-archive.com/[email protected]/msg08767.html,
>> but I still cannot figure out what is wrong.
>>
>> The full error is
>>
>> general:strategy = Score
>> general:source_language = fr
>> general:target_language = en
>> general:input_stem = /home/mml/mml-test/experiment/training/corpus.1
>> general:output_stem = /home/mml/mml-test/experiment/training/corpus-mml.1
>> general:domain_file = /home/mml/mml-test/experiment/model/domains.1
>> general:domain_file_out = /home/mml/mml-test/experiment/training/corpus-mml.1
>> score:score_file = /home/mml/mml-test/experiment/training/corpus-mml-score.1
>> score:proportion = 0.9
>>
>> 2014-01-24 14:17:26,276 Retaining at least 0 entries and ignoring 2075137
>> Traceback (most recent call last):
>>   File "/home/tools/mosesdecoder/scripts/ems/support/mml-filter.py", line 156, in <module>
>>     main()
>>   File "/home/tools/mosesdecoder/scripts/ems/support/mml-filter.py", line 111, in main
>>     strategy = strategy_class(config)
>>   File "/home/tools/mosesdecoder/scripts/ems/support/mml-filter.py", line 72, in __init__
>>     [float(line[:-1]) for line in open(self.score_file)], reverse=True)[ignore_count + count]
>> IndexError: list index out of range
>>
>> And my ems configuration file has:
>>
>> #################################################################
>> # PARALLEL CORPUS PREPARATION:
>> # create a tokenized, sentence-aligned corpus, ready for training
>>
>> [CORPUS]
>>
>> #in-domain parallel corpus
>> [CORPUS:in]
>> clean-stem = $training-in-domain-corpus
>>
>> [CORPUS:out]
>> #out-domain parallel corpus
>> clean-stem = $training-out-domain-corpus
>>
>>
>> #################################################################
>> # LANGUAGE MODEL TRAINING
>> [LM]
>> [LM:lm]
>> type = 8
>> lm = $language-model
>> #################################################################
>> # MODIFIED MOORE LEWIS FILTERING
>>
>> [MML]
>>
>> lm-training = $srilm-dir/ngram-count
>> lm-settings = "-interpolate -kndiscount -unk"
>> lm-binarizer = $moses-src-dir/bin/build_binary
>> lm-query = $moses-src-dir/bin/query
>> order = 5
>>
>> ### in-/out-of-domain source/target corpora to train the 4 language models
>> #
>> # in-domain parallel corpus
>> indomain-stem = [CORPUS:in:clean-split-stem]
>>
>> # out-of-domain parallel corpus
>> outdomain-stem = [CORPUS:out:clean-split-stem]
>>
>> # settings: number of lines sampled from the corpora to train each language model on
>> settings = "--line-count 100000"
>>
>> #################################################################
>> # TRANSLATION MODEL TRAINING
>> [TRAINING]
>> script = $moses-script-dir/training/train-model.perl
>> training-options = "-mgiza -mgiza-cpus 12 -sort-buffer-size 16G -sort-compress gzip -sort-parallel 12 -cores 12"
>> parallel = yes
>> alignment-symmetrization-method = grow-diag-final-and
>> lexicalized-reordering = msd-bidirectional-fe
>> score-settings = "--GoodTuring"
>> include-word-alignment-in-rules = yes
>>
>> # space-separated list of all out-of-domain corpora to be filtered
>> mml-filter-corpora = out
>> mml-before-wa = "-proportion 0.9"
>>
>> #####################################################
>>
>> Thanks.
>>
>>
>> Jian Zhang
>>
>>
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
> --
> Jian Zhang
> Centre for Next Generation Localisation (CNGL)<http://www.cngl.ie/index.html>
> Dublin City University <http://www.dcu.ie/>
>
>
>
>
