Hi Hassan

The MML filtering is treating all your data as in-domain, and I think
this is because you have not specified your out-of-domain data correctly.

In your configuration, the variable mml-filter-corpora should be a
(space-separated) list of the short names of all of your out-of-domain
corpora, i.e. all the corpora that you want to be filtered. It should
not be a file path, which is what you have at the moment.

So if you have [CORPUS:europarl] in the [CORPUS] section, and you want
europarl to be filtered, then set the variable like this:

mml-filter-corpora = europarl
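
This would also account for the IndexError in your traceback. Below is a
minimal sketch of what I think is happening, based only on the indexing
expression visible in the traceback. The function name score_threshold and
the toy scores are hypothetical, and the reading of ignore_count as "number
of lines marked in-domain with score 99999" is an assumption taken from your
log output, not something checked against mml-filter.py.

```python
def score_threshold(scores, ignore_count, count):
    """Mimic the indexing expression shown in the traceback (hypothetical helper)."""
    ranked = sorted(scores, reverse=True)
    # Skip the ignored (assumed in-domain) entries, then take the score
    # found at position `count` among the remaining ones.
    return ranked[ignore_count + count]


# Case 1: every line carries the in-domain marker score, so every line is
# ignored and the index falls one past the end of the list.
all_indomain = [99999.0] * 5
try:
    score_threshold(all_indomain, ignore_count=len(all_indomain), count=0)
    failed = False
except IndexError:
    failed = True

# Case 2: some lines are out-of-domain and have ordinary scores, so the
# index stays inside the list and a threshold can be read off.
mixed = [99999.0, 99999.0, 1.5, 0.2, -0.7]
threshold = score_threshold(mixed, ignore_count=2, count=1)
```

In the first case the lookup fails in the same way as your run ("Retaining
at least 0 entries and ignoring 149244"); in the second it succeeds because
there are out-of-domain lines left to rank.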

cheers - Barry



On 31/08/13 13:36, Hassan Sajjad wrote:
> Hi,
>
> I am trying to use MML but it's crashing at the 
> TRAINING_mml-filter-before-wa step. I could not resolve the problem. 
> The error and conf entries are copied here.
>
> The corpus-mml-score.3 file contains as many lines as my in-domain
> data, and every line has a score of 99999. Is this correct?
>
> Thank you,
>
> Regards,
> Hassan
>
> ------------------------------------------------------------------------------
> /work/moses-2013-07-10/scripts/ems/support/mml-filter.py /training/corpus-mml.3.ini
> 2013-08-31 12:29:57,126 Loading configuration from /training/corpus-mml.3.ini
> 2013-08-31 12:29:57,128 Configuration:
> general:strategy = Score
> general:source_language = ar
> general:target_language = en
> general:input_stem = /training/corpus.1
> general:output_stem = /training/corpus-mml.3
> general:domain_file = /model/domains.3
> general:domain_file_out = /training/corpus-mml.3
> score:score_file = /training/corpus-mml-score.3
> score:proportion = 0.9
>
> 2013-08-31 12:29:57,170 Retaining at least 0 entries and ignoring 149244
> Traceback (most recent call last):
>   File "/work/moses-2013-07-10/scripts/ems/support/mml-filter.py", line 156, in <module>
>     main()
>   File "/work/moses-2013-07-10/scripts/ems/support/mml-filter.py", line 111, in main
>     strategy = strategy_class(config)
>   File "/work/moses-2013-07-10/scripts/ems/support/mml-filter.py", line 72, in __init__
>     [float(line[:-1]) for line in open(self.score_file)], reverse=True)[ignore_count + count]
> IndexError: list index out of range
> ------------------------------------------------------------------------
>
> Here are the entries in the conf file:
>
> [MML]
>
> ### specifications for language models to be trained
> #
>
> lm-training = $srilm-dir/ngram-count
> lm-settings = "-interpolate -kndiscount -unk"
> lm-binarizer = $moses-src-dir/bin/build_binary
> lm-query = $moses-src-dir/bin/query
> order = 5
> type = 8
>
> raw-indomain-source = $training/train.$pair-extension.$input-extension
> raw-indomain-target = $training/train.$pair-extension.$output-extension
>
> outdomain-stem = /adapt/un.$pair-extension.utf8.ng.clean
> settings = "--line-count 100000"
>
>
> In TRAINING
>
> ### filtering some corpora with modified Moore-Lewis
> # specify corpora to be filtered and ratio to be kept, either before
> # or after word alignment
> mml-filter-corpora =  /adapt/un.$pair-extension.utf8.ng.clean
> mml-before-wa = "-proportion 0.9"
> #mml-after-wa = "-proportion 0.9"
>
> ### domain adaptation settings
> # options: sparse, any of: indicator, subset, ratio
> domain-features = "subset"
>
>
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
