Delete and rerun again. But also delete TRAINING_create-config*
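The cleanup discussed in this thread can be sketched as a small shell helper. This is only a sketch under assumptions: the function names ems_clean and ems_find_irstlm are made up here and are not part of Moses; TRAINING_create-config* is assumed to sit under steps/<run>/ like the TUNING files; and the generated moses.ini files are assumed to carry the LM feature line starting with IRSTLM, as the crash log below suggests.

```shell
#!/bin/sh
# Sketch only: remove the EMS artifacts named in this thread so that the
# moses.ini files are regenerated from the current [LM] settings.
# ems_clean and ems_find_irstlm are hypothetical helper names.

ems_clean() {
    workdir=$1   # EMS working-dir
    run=$2       # run number, e.g. 1
    # filtered models for tuning and evaluation
    rm -rf "$workdir"/tuning/filtered."$run" \
           "$workdir"/evaluation/*.filtered."$run"
    # step files for tuning and for config creation
    # (assumption: both live under steps/<run>/)
    rm -f "$workdir"/steps/"$run"/TUNING_tune.* \
          "$workdir"/steps/"$run"/TRAINING_create-config*
}

# List moses*ini* files under the working dir that still contain an
# IRSTLM feature line. Returns 1 if any are found, 0 otherwise.
ems_find_irstlm() {
    workdir=$1
    hits=$(find "$workdir" -type f -name 'moses*ini*' \
               -exec grep -l '^IRSTLM ' {} + 2>/dev/null) || true
    if [ -n "$hits" ]; then
        printf '%s\n' "$hits"
        return 1
    fi
    return 0
}
```

Usage would be along the lines of `ems_clean /home/moses/project_test_mgiza/experiment 1`, followed by `.../experiment.perl -continue 1 -exec` as in the messages below. Running `ems_find_irstlm` on the working dir afterwards shows whether any config still points at IRSTLM.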

On 26 November 2013 15:31, Daniel Valenzuela <dan...@valenzuela.de> wrote:

> Yes I already added in further workarounds type=8.
>
> To be sure I continued clean by
> rm -r tuning/
> rm steps/1/TUNING*
>
> .../experiment.perl -continue 1 -exec
>
> same output as before.
>
> Then I continued even cleaner by
> rm -r tuning/
> rm steps/1/TUNING*
> rm -r evaluation/newstest2010.filtered.1/
> (there is nothing more *filtered.* in here)
> .../experiment.perl -continue 1 -exec
> and the output is the same except for evaluation/newstest2010.filtered.1/
> is missing.
>
> But still I get a crash at the same TUNING:tune step.
>
> My [LM] section looks like
> [LM]
>
> lmplz = $moses-bin-dir/lmplz
> order = 3
> settings = "-T $working-dir/tmp -S 10G"
> lm-training = "$moses-script-dir/generic/trainlm-lmplz.perl -lmplz $lmplz"
> lm-binarizer = $moses-bin-dir/build_binary
> type = 8
>
> Crash is still:
> line=IRSTLM name=LM0 factor=0
> path=/home/moses/project_test_mgiza/experiment/lm/project-syndicate.binlm.1
> order=3
> Exception: Error: 4 number of threads specified but IRST LM is not
> threadsafe.
> Exit code: 1
> Failed to run moses with the config
> /home/moses/project_test_mgiza/experiment/tuning/moses.filtered.ini.1 at
> /home/moses/mosesdecoder/scripts/training/mert-moses.pl line 1271.
> cp: cannot stat
> ‘/home/moses/project_test_mgiza/experiment/tuning/tmp.1/moses.ini’: No such
> file or directory
>
> Thank you
>
> > Message: 1
> > Date: Tue, 26 Nov 2013 13:03:03 +0000
> > From: Hieu Hoang <hieuho...@gmail.com>
> > Subject: Re: [Moses-support] EMS set up with mgiza and KenLM
> > To: moses-support@mit.edu
> > Message-ID: <52949c07.3050...@gmail.com>
> > Content-Type: text/plain; charset="iso-8859-1"
> >
> > in the [LM] section, you have to put
> > type = 8
> > otherwise the moses.ini will be created to use IRSTLM
> >
> > You have to delete the filtering directory
> > tuning/filtered.?
> > evaluation/*.filtered.?
> > and delete the tuning sh file
> > steps/?/TUNING_tune.*
> >
> > then continue the experiment
> > .../experiment.perl -exec -continue=?
> >
> > On 26/11/2013 12:08, Daniel Valenzuela wrote:
> > > Dear all,
> > > after various manual set ups, I wanted to try the EMS. After trying
> > > several experiment settings I wanted to run it with multi-giza and
> > > kenlm, but I cannot get it to work (tried it again with smaller
> > > corpus, same result. I tried to continue the experiment with different
> > > fixes - no success.
> > > The log tells me:
> > > step TUNING:tune crashed
> > > further inspection in TUNE_tune.1.STDERR in steps/1/ told me IRSTLM is
> > > messing with my project, "against" my will (at least I thought so):
> > > line=IRSTLM name=LM0 factor=0
> > > path=/home/moses/project_test_mgiza/experiment/lm/project-syndicate.binlm.1
> > > order=3
> > > Exception: Error: 4 number of threads specified but IRST LM is not
> > > threadsafe.
> > > Exit code: 1
> > > Failed to run moses with the config
> > > /home/moses/project_test_mgiza/experiment/tuning/moses.filtered.ini.1
> > > at /home/moses/mosesdecoder/scripts/training/mert-moses.pl line 1271.
> > > cp: cannot stat
> > > '/home/moses/project_test_mgiza/experiment/tuning/tmp.1/moses.ini': No
> > > such file or directory
> > > Looking up what happened in the tuning folder, I found out that
> > > moses.filtered.ini.1 has set IRSTLM for Distortion, but
> > > filtered.1/moses.ini has set KenLM for Distortion which satisfies what
> > > I hoped to get.
> > > I attached the files from above and the following is the config file
> > > of the experiment:
> > > ################################################
> > > ### CONFIGURATION FILE FOR AN SMT EXPERIMENT ###
> > > ################################################
> > >
> > >
> > > [GENERAL]
> > >
> > > home-dir = /home/moses
> > >
> > > working-dir = $home-dir/project_test_mgiza/experiment
> > > moses-src-dir = $home-dir/mosesdecoder
> > > moses-script-dir = $moses-src-dir/scripts
> > > moses-bin-dir = $moses-src-dir/bin
> > > external-bin-dir = $moses-src-dir/BINDIR
> > > data-dir = $home-dir/project_test_mgiza/experiment/corpus
> > > train-dir = $data-dir/training
> > > dev-dir = $data-dir/dev
> > > #irstlm-dir = $home-dir/irstlm/bin
> > >
> > >
> > > ttable-binarizer = $moses-bin-dir/processPhraseTable
> > > decoder = $moses-bin-dir/moses
> > >
> > > input-tokenizer = "$moses-script-dir/tokenizer/tokenizer.perl -l
> > > $input-extension -threads 4"
> > > output-tokenizer = "$moses-script-dir/tokenizer/tokenizer.perl -l
> > > $output-extension"
> > > input-truecaser = $moses-script-dir/recaser/truecase.perl
> > > output-truecaser = $moses-script-dir/recaser/truecase.perl
> > > detruecaser = $moses-script-dir/recaser/detruecase.perl
> > >
> > >
> > > input-extension = de
> > > output-extension = en
> > > pair-extension = de-en
> > >
> > > #################################################################
> > > # PARALLEL CORPUS PREPARATION:
> > > # create a tokenized, sentence-aligned corpus, ready for training
> > >
> > > [CORPUS]
> > >
> > > max-sentence-length = 80
> > >
> > > [CORPUS:project-syndicate]
> > > raw-stem = $train-dir/news-commentary-v8.$pair-extension
> > >
> > > [LM]
> > >
> > > ### tool to be used for language model training
> > > # for instance: ngram-count (SRILM), train-lm-on-disk.perl (Edinburgh)
> > > #
> > > #lm-training = "$moses-script-dir/generic/trainlm-irst2.perl -cores 4
> > > -irst-dir $irstlm-dir -temp-dir $working-dir/tmp"
> > > #settings = "-s msb -p 0"
> > > #order = 3
> > > #type = 8
> > > #lm-binarizer = $moses-bin-dir/build_binary
> > >
> > > # path to lmplz binary
> > > lmplz = $moses-bin-dir/lmplz
> > > # order of the language model
> > > order = 3
> > > # additional parameters to lmplz (check lmplz help message)
> > > settings = "-T $working-dir/tmp -S 10G"
> > > # this tells EMS to use lmplz and tells EMS where lmplz is located
> > > lm-training = "$moses-script-dir/generic/trainlm-lmplz.perl -lmplz
> > > $lmplz"
> > > lm-binarizer = $moses-bin-dir/build_binary
> > >
> > >
> > >
> > > [LM:project-syndicate]
> > > raw-corpus =
> > > $train-dir/news-commentary-v8.$pair-extension.$output-extension
> > >
> > >
> > > #################################################################
> > > # TRANSLATION MODEL TRAINING
> > >
> > > [TRAINING]
> > >
> > >
> > > ### training script to be used: either a legacy script or
> > > # current moses training script (default)
> > > #
> > > #script = $moses-script-dir/training/train-model.perl
> > >
> > >
> > > ### general options
> > > #
> > > script = $moses-script-dir/training/train-model.perl
> > > training-options = "-mgiza -mgiza-cpus 4 -cores 4 \
> > > -parallel -sort-buffer-size 10G -sort-batch-size 253 \
> > > -sort-compress gzip -sort-parallel 10"
> > > parallel = yes
> > >
> > > ### symmetrization method to obtain word alignments from giza output
> > > # (commonly used: grow-diag-final-and)
> > > #
> > > #alignment-symmetrization-method = berkeley
> > > alignment-symmetrization-method = grow-diag-final-and
> > >
> > > ### lexicalized reordering: specify orientation type
> > > # (default: only distance-based reordering model)
> > > #
> > > lexicalized-reordering = msd-bidirectional-fe
> > >
> > > ### if word alignment (giza symmetrization) should be skipped,
> > > # point to word alignment files
> > > #
> > > #word-alignment =
> > >
> > > ### if phrase extraction should be skipped,
> > > # point to stem for extract files
> > > #
> > > #extracted-phrases =
> > >
> > > ### if phrase table training should be skipped,
> > > # point to phrase translation table
> > > #
> > > #phrase-translation-table =
> > >
> > > ### if reordering table training should be skipped,
> > > # point to reordering table
> > > #
> > > #reordering-table =
> > >
> > > ### if training should be skipped,
> > > # point to a configuration file that contains
> > > # pointers to all relevant model files
> > > #
> > > #config =
> > >
> > > ### TUNING: finding good weights for model components
> > >
> > > [TUNING]
> > >
> > > ### instead of tuning with this setting, old weights may be recycled
> > >
> > > ### tuning script to be used
> > > #
> > > tuning-script = $moses-script-dir/training/mert-moses.pl
> > > tuning-settings = "-mertdir $moses-bin-dir -threads 4"
> > >
> > > ### specify the corpus used for tuning
> > > # it should contain 100s if not 1000s of sentences
> > > #
> > > raw-input = $dev-dir/news-test2008.$input-extension
> > >
> > > raw-reference = $dev-dir/news-test2008.$output-extension
> > >
> > > ### size of n-best list used (typically 100)
> > > #
> > > nbest = 100
> > >
> > > ### ranges for weights for random initialization
> > > # if not specified, the tuning script will use generic ranges
> > > # it is not clear, if this matters
> > > #
> > > # lambda =
> > >
> > > ### additional flags for the decoder
> > > #
> > > decoder-settings = "-threads 4"
> > >
> > > ### if tuning should be skipped, specify this here
> > > # and also point to a configuration file that contains
> > > # pointers to all relevant model files
> > > #
> > > #config =
> > >
> > >
> > > #######################################################
> > > ## TRUECASER: train model to truecase corpora and input
> > >
> > > [TRUECASER]
> > >
> > > ### script to train truecaser models
> > > #
> > > trainer = $moses-script-dir/recaser/train-truecaser.perl
> > >
> > > ### training data
> > > # raw input needs to be still tokenized,
> > > # also also tokenized input may be specified
> > > #
> > > raw-stem = CORPUS:raw-stem
> > >
> > > ### trained model
> > > #
> > > #truecase-model =
> > >
> > >
> > > ##################################
> > > ## EVALUATION: score system output
> > >
> > > [EVALUATION]
> > >
> > > ### prepare system output for scoring
> > > # this may include detokenization and wrapping output in sgm
> > > # (needed for nist-bleu, ter, meteor)
> > > #
> > > detokenizer = "$moses-script-dir/tokenizer/detokenizer.perl -l
> > > $output-extension"
> > >
> > > decoder-settings = "-threads 4"
> > >
> > > ### should output be scored case-sensitive (default: no)?
> > > #
> > > # case-sensitive = yes
> > >
> > > ### BLEU
> > > #
> > >
> > > multi-bleu = "$moses-script-dir/generic/multi-bleu.perl -lc"
> > > # ibm-bleu =
> > >
> > > ### TER: translation error rate (BBN metric) based on edit distance
> > > #
> > > # ter = $edinburgh-script-dir/tercom_v6a.pl
> > >
> > > ### METEOR: gives credit to stem / worknet synonym matches
> > > #
> > > # meteor =
> > >
> > > [EVALUATION:newstest2010]
> > > raw-input = $dev-dir/newstest2011.$input-extension
> > > raw-reference = $dev-dir/newstest2011.$output-extension
> > >
> > >
> > > [REPORTING]
> > >
> > > ### what to do with result (default: store in file evaluation/report)
> > > #
> > > # email = pko...@inf.ed.ac.uk
> > > ____________________
> > > I hope anybody can help or suggest me what to do.
> > > Thank you and kind regards
> > > Daniel
> > >
> > >
> > > _______________________________________________
> > > Moses-support mailing list
> > > Moses-support@mit.edu
> > > http://mailman.mit.edu/mailman/listinfo/moses-support
> > ***
> > _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>

-- 
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu