Hi Hassan

The MML filtering is seeing all your data as in-domain, and I think it is because you have not correctly specified your out-of-domain data.
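One way to catch this kind of misconfiguration before running EMS is to check that every entry in mml-filter-corpora is the short name of a declared [CORPUS:name] section and not a file path. The Python sketch below is a hypothetical helper: check_mml_filter_corpora is not part of Moses. It only does line-based regular-expression matching, assumes section headers start at the beginning of a line, and is not a full parser for the EMS config format.

```python
import re


def check_mml_filter_corpora(config_text: str) -> list[str]:
    """Report problems with mml-filter-corpora in an EMS-style config.

    Hypothetical helper: simple line-based matching only, not a full
    parser for the Moses EMS configuration format.
    """
    # Short names declared through section headers such as [CORPUS:europarl].
    declared = set(re.findall(r"^\[CORPUS:([^\]]+)\]", config_text, flags=re.MULTILINE))

    setting = re.search(r"^\s*mml-filter-corpora\s*=\s*(.*)$", config_text, flags=re.MULTILINE)
    if setting is None:
        return ["mml-filter-corpora is not set"]

    problems = []
    # The value should be a space-separated list of corpus short names.
    for name in setting.group(1).split():
        if "/" in name:
            problems.append(f"{name!r} looks like a file path, not a corpus short name")
        elif name not in declared:
            problems.append(f"{name!r} has no matching [CORPUS:{name}] section")
    return problems


# A config shaped like the one in the question (the [CORPUS:un] name is illustrative).
config = """
[CORPUS]
[CORPUS:un]

[TRAINING]
mml-filter-corpora = /adapt/un.$pair-extension.utf8.ng.clean
"""
print(check_mml_filter_corpora(config))
```

On a config like the quoted one, it flags the path given for mml-filter-corpora. A config that sets mml-filter-corpora = europarl alongside a [CORPUS:europarl] section reports no problems.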
In your configuration, the variable mml-filter-corpora should be a (space-separated) list of the short names of all of your out-of-domain corpora (not their file paths), i.e. all the corpora that you want to be filtered. So if you have [CORPUS:europarl] in the [CORPUS] section and you want europarl to be filtered, then you set the variable like this:

mml-filter-corpora = europarl

cheers
 - Barry

On 31/08/13 13:36, Hassan Sajjad wrote:
> Hi,
>
> I am trying to use MML but it's crashing at the
> TRAINING_mml-filter-before-wa step. I could not resolve the problem.
> The error and conf entries are copied here.
>
> The corpus-mml-score.3 contains lines equal to my in-domain data and
> have score 99999 on all lines. Is this correct?
>
> Thank you,
>
> Regards,
> Hassan
>
> ------------------------------------------------------------------------------
> /work/moses-2013-07-10/scripts/ems/support/mml-filter.py /training/corpus-mml.3.ini
> 2013-08-31 12:29:57,126 Loading configuration from /training/corpus-mml.3.ini
> 2013-08-31 12:29:57,128 Configuration:
> general:strategy = Score
> general:source_language = ar
> general:target_language = en
> general:input_stem = /training/corpus.1
> general:output_stem = /training/corpus-mml.3
> general:domain_file = /model/domains.3
> general:domain_file_out = /training/corpus-mml.3
> score:score_file = /training/corpus-mml-score.3
> score:proportion = 0.9
>
> 2013-08-31 12:29:57,170 Retaining at least 0 entries and ignoring 149244
> Traceback (most recent call last):
>   File "/work/moses-2013-07-10/scripts/ems/support/mml-filter.py", line 156, in <module>
>     main()
>   File "/work/moses-2013-07-10/scripts/ems/support/mml-filter.py", line 111, in main
>     strategy = strategy_class(config)
>   File "/work/moses-2013-07-10/scripts/ems/support/mml-filter.py", line 72, in __init__
>     [float(line[:-1]) for line in open(self.score_file)],
>     reverse=True)[ignore_count + count]
> IndexError: list index out of range
> ------------------------------------------------------------------------
>
> Here are the entries in the conf file:
>
> [MML]
>
> ### specifications for language models to be trained
> #
>
> lm-training = $srilm-dir/ngram-count
> lm-settings = "-interpolate -kndiscount -unk"
> lm-binarizer = $moses-src-dir/bin/build_binary
> lm-query = $moses-src-dir/bin/query
> order = 5
> type = 8
>
> raw-indomain-source = $training/train.$pair-extension.$input-extension
> raw-indomain-target = $training/train.$pair-extension.$output-extension
>
> outdomain-stem = /adapt/un.$pair-extension.utf8.ng.clean
> settings = "--line-count 100000"
>
>
> In TRAINING
>
> ### filtering some corpora with modified Moore-Lewis
> # specify corpora to be filtered and ratio to be kept, either before or after word alignment
> mml-filter-corpora = /adapt/un.$pair-extension.utf8.ng.clean
> mml-before-wa = "-proportion 0.9"
> #mml-after-wa = "-proportion 0.9"
>
> ### domain adaptation settings
> # options: sparse, any of: indicator, subset, ratio
> domain-features = "subset"
>
>
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
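The IndexError in the quoted traceback can be read off the log line "Retaining at least 0 entries and ignoring 149244". The script sorts the scores in descending order and indexes that list at ignore_count + count. The log and the all-99999 score file suggest that ignore_count counts the lines treated as in-domain and that count is the number of further lines to retain. Under that reading, the index is 149244 + 0. The score file presumably has 149244 lines, so that index is one past the end of the list. The Python sketch below is a hypothetical re-creation: threshold_score is not the script's code, and the explicit guard is an addition the original does not have.

```python
def threshold_score(scores: list[float], ignore_count: int, count: int) -> float:
    """Hypothetical re-creation of the lookup that failed in mml-filter.py.

    Assumption: ignore_count is the number of lines treated as in-domain,
    and count is the number of further lines to retain.
    """
    ranked = sorted(scores, reverse=True)  # highest scores first
    position = ignore_count + count
    if position >= len(ranked):
        # In the quoted log, ignore_count equals the number of scores and
        # count is 0, so position == len(ranked); indexing there is what
        # raised IndexError in the original script.
        raise ValueError(
            f"nothing to filter: rank {position} requested but only "
            f"{len(ranked)} scores exist; check mml-filter-corpora"
        )
    return ranked[position]


# Mirrors the log: every line scored 99999 and treated as in-domain.
all_in_domain = [99999.0] * 149244
try:
    threshold_score(all_in_domain, ignore_count=149244, count=0)
except ValueError as err:
    print(err)

# A mixed file: two in-domain lines and three scored out-of-domain lines.
print(threshold_score([99999.0, 1.5, 99999.0, -0.7, 0.2], ignore_count=2, count=1))
```

Once mml-filter-corpora names an actual out-of-domain corpus, ignore_count should drop below the total line count, leaving scores to threshold.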
