On 09/06/2012 05:48 PM, Philipp Koehn wrote:
> Hi,
>
> can you check which files were produced in the tuning tmp
> directory, what their sizes are, and what the intermediate
> BLEU scores are? Maybe the decoder crashed, the reference
> files are missing or mismatched, etc.
>
> -phi
>
> On Fri, Jun 8, 2012 at 1:55 PM, Dimitris Babaniotis
> <[email protected]> wrote:
>> On 30/05/2012 12:41 AM, Dimitris Babaniotis wrote:
>>
>>> On 28/05/2012 10:01 PM, Philipp Koehn wrote:
>>>> Hi,
>>>>
>>>> there is a problem here:
>>>>
>>>> # conversion of phrase table into binary on-disk format
>>>> #ttable-binarizer = $moses-bin-dir/processPhraseTable
>>>>
>>>> # conversion of rule table into binary on-disk format
>>>> ttable-binarizer = "$moses-bin-dir/CreateOnDisk 1 1 5 100 2"
>>>>
>>>> You are using the ttable binarizer for the hierarchical/syntax model,
>>>> but you use a phrase-based model.
>>>>
>>>> -phi
>>>>
>>>> On Sun, May 27, 2012 at 11:45 PM, Dimitris Babaniotis
>>>> <[email protected]> wrote:
>>>>> Hello, I'm trying to run experiments with EMS but the process stops at
>>>>> tuning:tune.
>>>>>
>>>>> Here is the TUNING_tune.stderr file:
>>>>>
>>>>> main::create_extractor_script() called too early to check prototype at
>>>>> /home/dimbaba/moses/scripts/training/mert-moses.pl line 674.
>>>>> Using SCRIPTS_ROOTDIR: /home/dimbaba/moses/scripts
>>>>> Asking moses for feature names and values from
>>>>> /home/dimbaba/mosesFactored/experiment/tuning/moses.filtered.ini.4
>>>>> Executing: /home/dimbaba/moses/dist/bin/moses -v 0 -config
>>>>> /home/dimbaba/mosesFactored/experiment/tuning/moses.filtered.ini.4
>>>>> -inputtype 0 -show-weights > ./features.list
>>>>> MERT starting values and ranges for random generation:
>>>>>   d = 0.600 ( 0.00 .. 1.00)
>>>>>   lm = 0.250 ( 0.00 .. 1.00)
>>>>>   lm = 0.250 ( 0.00 .. 1.00)
>>>>>   w = -1.000 ( 0.00 .. 1.00)
>>>>>   tm = 0.200 ( 0.00 .. 1.00)
>>>>>   tm = 0.200 ( 0.00 .. 1.00)
>>>>>   tm = 0.200 ( 0.00 .. 1.00)
>>>>>   tm = 0.200 ( 0.00 .. 1.00)
>>>>>   tm = 0.200 ( 0.00 .. 1.00)
>>>>> Saved: ./run1.moses.ini
>>>>> Normalizing lambdas: 0.600000 0.250000 0.250000 -1.000000 0.200000
>>>>> 0.200000 0.200000 0.200000 0.200000
>>>>> DECODER_CFG = -w -0.322581 -lm 0.080645 0.080645 -d 0.193548 -tm
>>>>> 0.064516 0.064516 0.064516 0.064516 0.064516
>>>>> Executing: /home/dimbaba/moses/dist/bin/moses -v 0 -config
>>>>> /home/dimbaba/mosesFactored/experiment/tuning/moses.filtered.ini.4
>>>>> -inputtype 0 -w -0.322581 -lm 0.080645 0.080645 -d 0.193548 -tm 0.064516
>>>>> 0.064516 0.064516 0.064516 0.064516 -n-best-list run1.best100.out 100
>>>>> -input-file /home/dimbaba/mosesFactored/experiment/tuning/input.tc.1 >
>>>>> run1.out
>>>>> Translating line 0 in thread id 140471666632448
>>>>> Check (*contextFactor[count-1])[factorType] != NULL failed in
>>>>> moses/src/LM/SRI.cpp:155
>>>>> sh: line 1: 1648 Aborted (core dumped)
>>>>> /home/dimbaba/moses/dist/bin/moses
>>>>> -v 0 -config
>>>>> /home/dimbaba/mosesFactored/experiment/tuning/moses.filtered.ini.4
>>>>> -inputtype 0 -w -0.322581 -lm 0.080645 0.080645 -d 0.193548 -tm 0.064516
>>>>> 0.064516 0.064516 0.064516 0.064516 -n-best-list run1.best100.out 100
>>>>> -input-file /home/dimbaba/mosesFactored/experiment/tuning/input.tc.1 >
>>>>> run1.out
>>>>> Exit code: 134
>>>>> The decoder died.
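The "Normalizing lambdas" step in the log above is simply each starting weight divided by the sum of absolute values of all weights (3.1 here). A minimal Python sketch, illustrative only and not the actual mert-moses.pl code, that reproduces the DECODER_CFG numbers:

```python
# Reproduce the "Normalizing lambdas" step from the tuning log:
# each weight is divided by the sum of the absolute values of all
# weights, so the L1 norm of the weight vector becomes 1.
# (Illustrative sketch -- the real logic lives in mert-moses.pl.)

def normalize(weights):
    total = sum(abs(w) for w in weights)
    return [w / total for w in weights]

# starting values from the log: d, lm, lm, w, and five tm weights
start = [0.600, 0.250, 0.250, -1.000, 0.200, 0.200, 0.200, 0.200, 0.200]
norm = normalize(start)

for name, value in zip(["d", "lm", "lm", "w", "tm", "tm", "tm", "tm", "tm"], norm):
    print("%s = %f" % (name, value))
```

Rounded to six decimals this gives d 0.193548, lm 0.080645, w -0.322581, and tm 0.064516, matching the DECODER_CFG line in the log.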
>>>>> CONFIG WAS -w -0.322581 -lm 0.080645 0.080645 -d 0.193548
>>>>> -tm 0.064516 0.064516 0.064516 0.064516 0.064516
>>>>> cp: cannot stat
>>>>> '/home/dimbaba/mosesFactored/experiment/tuning/tmp.4/moses.ini':
>>>>> No such file or directory
>>>>>
>>>>> ...and this is my configuration file:
>>>>>
>>>>> ################################################
>>>>> ### CONFIGURATION FILE FOR AN SMT EXPERIMENT ###
>>>>> ################################################
>>>>>
>>>>> [GENERAL]
>>>>>
>>>>> ### directory in which experiment is run
>>>>> #
>>>>> working-dir = /home/dimbaba/mosesFactored/experiment
>>>>>
>>>>> # specification of the language pair
>>>>> input-extension = de
>>>>> output-extension = el
>>>>> pair-extension = de-el
>>>>>
>>>>> ### directories that contain tools and data
>>>>> #
>>>>> # moses
>>>>> moses-src-dir = /home/dimbaba/moses
>>>>> #
>>>>> # moses binaries
>>>>> moses-bin-dir = $moses-src-dir/dist/bin
>>>>> #
>>>>> # moses scripts
>>>>> moses-script-dir = $moses-src-dir/scripts
>>>>> #
>>>>> # srilm
>>>>> srilm-dir = /home/dimbaba/srilm/bin/i686-m64
>>>>> #
>>>>> # irstlm
>>>>> #irstlm-dir = $moses-src-dir/irstlm/bin
>>>>> #
>>>>> # randlm
>>>>> #randlm-dir = $moses-src-dir/randlm/bin
>>>>> #
>>>>> # data
>>>>> wmt12-data = /home/dimbaba/aligned/el-de
>>>>>
>>>>> ### basic tools
>>>>> #
>>>>> # moses decoder
>>>>> decoder = $moses-bin-dir/moses
>>>>>
>>>>> # conversion of phrase table into binary on-disk format
>>>>> #ttable-binarizer = $moses-bin-dir/processPhraseTable
>>>>>
>>>>> # conversion of rule table into binary on-disk format
>>>>> ttable-binarizer = "$moses-bin-dir/CreateOnDisk 1 1 5 100 2"
>>>>>
>>>>> # tokenizers - comment out if all your data is already tokenized
>>>>> input-tokenizer = "$moses-script-dir/tokenizer/tokenizer.perl -a -l $input-extension"
>>>>> output-tokenizer = "$moses-script-dir/tokenizer/tokenizer.perl -a -l $output-extension"
>>>>>
>>>>> # truecasers - comment out if you do not use the truecaser
>>>>> input-truecaser = $moses-script-dir/recaser/truecase.perl
>>>>> output-truecaser = $moses-script-dir/recaser/truecase.perl
>>>>> detruecaser = $moses-script-dir/recaser/detruecase.perl
>>>>>
>>>>> ### generic parallelizer for cluster and multi-core machines
>>>>> # you may specify a script that allows the parallel execution
>>>>> # of parallelizable steps (see meta file). you also need to specify
>>>>> # the number of jobs (cluster) or cores (multicore)
>>>>> #
>>>>> #generic-parallelizer = $moses-script-dir/ems/support/generic-parallelizer.perl
>>>>> #generic-parallelizer = $moses-script-dir/ems/support/generic-multicore-parallelizer.perl
>>>>>
>>>>> ### cluster settings (if run on a cluster machine)
>>>>> # number of jobs to be submitted in parallel
>>>>> #
>>>>> #jobs = 10
>>>>>
>>>>> # arguments to qsub when scheduling a job
>>>>> #qsub-settings = ""
>>>>>
>>>>> # project for privileges and usage accounting
>>>>> #qsub-project = iccs_smt
>>>>>
>>>>> # memory and time
>>>>> #qsub-memory = 4
>>>>> #qsub-hours = 48
>>>>>
>>>>> ### multi-core settings
>>>>> # when the generic parallelizer is used, the number of cores
>>>>> # specified here
>>>>> cores = 4
>>>>>
>>>>> #################################################################
>>>>> # PARALLEL CORPUS PREPARATION:
>>>>> # create a tokenized, sentence-aligned corpus, ready for training
>>>>>
>>>>> [CORPUS]
>>>>>
>>>>> ### long sentences are filtered out, since they slow down GIZA++
>>>>> # and are a less reliable source of data. set here the maximum
>>>>> # length of a sentence
>>>>> #
>>>>> max-sentence-length = 100
>>>>>
>>>>> [CORPUS:europarl] IGNORE
>>>>>
>>>>> ### command to run to get raw corpus files
>>>>> #
>>>>> # get-corpus-script =
>>>>>
>>>>> ### raw corpus files (untokenized, but sentence aligned)
>>>>> #
>>>>> raw-stem = $wmt12-data/training/training.clean10
>>>>>
>>>>> ### tokenized corpus files (may contain long sentences)
>>>>> #
>>>>> #tokenized-stem =
>>>>>
>>>>> ### if sentence filtering should be skipped,
>>>>> # point to the clean training data
>>>>> #
>>>>> #clean-stem =
>>>>>
>>>>> ### if corpus preparation should be skipped,
>>>>> # point to the prepared training data
>>>>> #
>>>>> #lowercased-stem =
>>>>>
>>>>> [CORPUS:nc]
>>>>> raw-stem = $wmt12-data/training/training.clean10
>>>>>
>>>>> [CORPUS:un] IGNORE
>>>>> raw-stem = $wmt12-data/training/training.clean10
>>>>>
>>>>> #################################################################
>>>>> # LANGUAGE MODEL TRAINING
>>>>>
>>>>> [LM]
>>>>>
>>>>> ### tool to be used for language model training
>>>>> # srilm
>>>>> lm-training = $srilm-dir/ngram-count
>>>>> settings = ""
>>>>>
>>>>> # irstlm
>>>>> #lm-training = "$moses-script-dir/generic/trainlm-irst.perl -cores $cores -irst-dir $irstlm-dir -temp-dir $working-dir/lm"
>>>>> #settings = ""
>>>>>
>>>>> # order of the language model
>>>>> order = 3
>>>>>
>>>>> ### tool to be used for training randomized language model from scratch
>>>>> # (more commonly, a SRILM is trained)
>>>>> #
>>>>> #rlm-training = "$randlm-dir/buildlm -falsepos 8 -values 8"
>>>>>
>>>>> ### script to use for binary table format for irstlm or kenlm
>>>>> # (default: no binarization)
>>>>>
>>>>> # irstlm
>>>>> #lm-binarizer = $irstlm-dir/compile-lm
>>>>>
>>>>> # kenlm, also set type to 8
>>>>> #lm-binarizer = $moses-bin-dir/build_binary
>>>>> #type = 8
>>>>>
>>>>> ### script to create quantized language model format (irstlm)
>>>>> # (default: no quantization)
>>>>> #
>>>>> #lm-quantizer = $irstlm-dir/quantize-lm
>>>>>
>>>>> ### script to use for converting into randomized table format
>>>>> # (default: no randomization)
>>>>> #
>>>>> #lm-randomizer = "$randlm-dir/buildlm -falsepos 8 -values 8"
>>>>>
>>>>> ### each language model to be used has its own section here
>>>>>
>>>>> [LM:europarl] IGNORE
>>>>>
>>>>> ### command to run to get raw corpus files
>>>>> #
>>>>> #get-corpus-script = ""
>>>>>
>>>>> ### raw corpus (untokenized)
>>>>> #
>>>>> raw-corpus = $wmt12-data/training/training.clean.$output-extension
>>>>>
>>>>> ### tokenized corpus files (may contain long sentences)
>>>>> #
>>>>> #tokenized-corpus =
>>>>>
>>>>> ### if corpus preparation should be skipped,
>>>>> # point to the prepared language model
>>>>> #
>>>>> #lm =
>>>>>
>>>>> [LM:nc]
>>>>> raw-corpus = $wmt12-data/training/training.clean10.$output-extension
>>>>>
>>>>> [LM:un] IGNORE
>>>>> raw-corpus = $wmt12-data/training/undoc.2000.$pair-extension.$output-extension
>>>>>
>>>>> [LM:news] IGNORE
>>>>> raw-corpus = $wmt12-data/training/news.$output-extension.shuffled
>>>>>
>>>>> [LM:nc=stem]
>>>>> factors = "stem"
>>>>> order = 3
>>>>> settings = ""
>>>>> raw-corpus = $wmt12-data/training/training.clean.$output-extension
>>>>>
>>>>> #################################################################
>>>>> # INTERPOLATING LANGUAGE MODELS
>>>>>
>>>>> [INTERPOLATED-LM] IGNORE
>>>>>
>>>>> # if multiple language models are used, these may be combined
>>>>> # by optimizing perplexity on a tuning set
>>>>> # see, for instance [Koehn and Schwenk, IJCNLP 2008]
>>>>>
>>>>> ### script to interpolate language models
>>>>> # if commented out, no interpolation is performed
>>>>> #
>>>>> script = $moses-script-dir/ems/support/interpolate-lm.perl
>>>>>
>>>>> ### tuning set
>>>>> # you may use the same set that is used for mert tuning (reference set)
>>>>> #
>>>>> tuning-sgm = $wmt12-data/dev/newstest2010-ref.$output-extension.sgm
>>>>> #raw-tuning =
>>>>> #tokenized-tuning =
>>>>> #factored-tuning =
>>>>> #lowercased-tuning =
>>>>> #split-tuning =
>>>>>
>>>>> ### group language models for hierarchical interpolation
>>>>> # (flat interpolation is limited to 10 language models)
>>>>> #group = "first,second fourth,fifth"
>>>>>
>>>>> ### script to use for binary table format for irstlm or kenlm
>>>>> # (default: no binarization)
>>>>>
>>>>> # irstlm
>>>>> #lm-binarizer = $irstlm-dir/compile-lm
>>>>>
>>>>> # kenlm, also set type to 8
>>>>> #lm-binarizer = $moses-bin-dir/build_binary
>>>>> #type = 8
>>>>>
>>>>> ### script to create quantized language model format (irstlm)
>>>>> # (default: no quantization)
>>>>> #
>>>>> #lm-quantizer = $irstlm-dir/quantize-lm
>>>>>
>>>>> ### script to use for converting into randomized table format
>>>>> # (default: no randomization)
>>>>> #
>>>>> #lm-randomizer = "$randlm-dir/buildlm -falsepos 8 -values 8"
>>>>>
>>>>> #################################################################
>>>>> # FACTOR DEFINITION
>>>>>
>>>>> [INPUT-FACTOR]
>>>>>
>>>>> # also used for output factors
>>>>> temp-dir = $working-dir/training/factor
>>>>>
>>>>> [INPUT-FACTOR:stem]
>>>>>
>>>>> factor-script = "$moses-script-dir/training/wrappers/make-factor-stem.perl 3"
>>>>> ### script that generates this factor
>>>>> #
>>>>> #mxpost = /home/pkoehn/bin/mxpost
>>>>> factor-script = "$moses-script-dir/training/wrappers/make-factor-stem.perl 3"
>>>>>
>>>>> [OUTPUT-FACTOR:stem]
>>>>>
>>>>> factor-script = "$moses-script-dir/training/wrappers/make-factor-stem.perl 3"
>>>>> ### script that generates this factor
>>>>> #
>>>>> #mxpost = /home/pkoehn/bin/mxpost
>>>>> factor-script = "$moses-script-dir/training/wrappers/make-factor-stem.perl 3"
>>>>>
>>>>> #################################################################
>>>>> # TRANSLATION MODEL TRAINING
>>>>>
>>>>> [TRAINING]
>>>>>
>>>>> ### training script to be used: either a legacy script or
>>>>> # current moses training script (default)
>>>>> #
>>>>> script = $moses-script-dir/training/train-model.perl
>>>>>
>>>>> ### general options
>>>>> # these are options that are passed on to train-model.perl, for instance
>>>>> # * "-mgiza -mgiza-cpus 8" to use mgiza instead of giza
>>>>> # * "-sort-buffer-size 8G" to reduce on-disk sorting
>>>>> #
>>>>> #training-options = ""
>>>>>
>>>>> ### factored training: specify here which factors are used
>>>>> # if none specified, single factor training is assumed
>>>>> # (one translation step, surface to surface)
>>>>> #
>>>>> input-factors = word stem
>>>>> output-factors = word stem
>>>>> alignment-factors = "stem -> stem"
>>>>> translation-factors = "word -> word"
>>>>> reordering-factors = "word -> word"
>>>>> #generation-factors =
>>>>> decoding-steps = "t0"
>>>>>
>>>>> ### parallelization of data preparation step
>>>>> # the two directions of the data preparation can be run in parallel
>>>>> # comment out if not needed
>>>>> #
>>>>> parallel = yes
>>>>>
>>>>> ### pre-computation for giza++
>>>>> # giza++ has a more efficient data structure that needs to be
>>>>> # initialized with snt2cooc. if run in parallel, this may reduce
>>>>> # memory requirements. set here the number of parts
>>>>> #
>>>>> #run-giza-in-parts = 5
>>>>>
>>>>> ### symmetrization method to obtain word alignments from giza output
>>>>> # (commonly used: grow-diag-final-and)
>>>>> #
>>>>> alignment-symmetrization-method = grow-diag-final-and
>>>>>
>>>>> ### use of berkeley aligner for word alignment
>>>>> #
>>>>> #use-berkeley = true
>>>>> #alignment-symmetrization-method = berkeley
>>>>> #berkeley-train = $moses-script-dir/ems/support/berkeley-train.sh
>>>>> #berkeley-process = $moses-script-dir/ems/support/berkeley-process.sh
>>>>> #berkeley-jar = /your/path/to/berkeleyaligner-1.1/berkeleyaligner.jar
>>>>> #berkeley-java-options = "-server -mx30000m -ea"
>>>>> #berkeley-training-options = "-Main.iters 5 5 -EMWordAligner.numThreads 8"
>>>>> #berkeley-process-options = "-EMWordAligner.numThreads 8"
>>>>> #berkeley-posterior = 0.5
>>>>>
>>>>> ### if word alignment should be skipped,
>>>>> # point to word alignment files
>>>>> #
>>>>> #word-alignment = $working-dir/model/aligned.1
>>>>>
>>>>> ### create a bilingual concordancer for the model
>>>>> #
>>>>> #biconcor = $moses-script-dir/ems/biconcor/biconcor
>>>>>
>>>>> ### lexicalized reordering: specify orientation type
>>>>> # (default: only distance-based reordering model)
>>>>> #
>>>>> lexicalized-reordering = msd-bidirectional-fe
>>>>>
>>>>> ### hierarchical rule set
>>>>> #
>>>>> hierarchical-rule-set = true
>>>>>
>>>>> ### settings for rule extraction
>>>>> #
>>>>> #extract-settings = ""
>>>>>
>>>>> ### unknown word labels (target syntax only)
>>>>> # enables use of unknown word labels during decoding
>>>>> # label file is generated during rule extraction
>>>>> #
>>>>> #use-unknown-word-labels = true
>>>>>
>>>>> ### if phrase extraction should be skipped,
>>>>> # point to stem for extract files
>>>>> #
>>>>> # extracted-phrases =
>>>>>
>>>>> ### settings for rule scoring
>>>>> #
>>>>> score-settings = "--GoodTuring"
>>>>>
>>>>> ### include word alignment in phrase table
>>>>> #
>>>>> #include-word-alignment-in-rules = yes
>>>>>
>>>>> ### if phrase table training should be skipped,
>>>>> # point to phrase translation table
>>>>> #
>>>>> # phrase-translation-table =
>>>>>
>>>>> ### if reordering table training should be skipped,
>>>>> # point to reordering table
>>>>> #
>>>>> # reordering-table =
>>>>>
>>>>> ### if training should be skipped,
>>>>> # point to a configuration file that contains
>>>>> # pointers to all relevant model files
>>>>> #
>>>>> #config-with-reused-weights =
>>>>>
>>>>> #####################################################
>>>>> ### TUNING: finding good weights for model components
>>>>>
>>>>> [TUNING]
>>>>>
>>>>> ### instead of tuning with this setting, old weights may be recycled
>>>>> # specify here an old configuration file with matching weights
>>>>> #
>>>>> #weight-config = $working-dir/tuning/moses.filtered.ini.1
>>>>>
>>>>> ### tuning script to be used
>>>>> #
>>>>> tuning-script = $moses-script-dir/training/mert-moses.pl
>>>>> tuning-settings = "-mertdir $moses-bin-dir --filtercmd '$moses-script-dir/training/filter-model-given-input.pl'"
>>>>>
>>>>> ### specify the corpus used for tuning
>>>>> # it should contain 1000s of sentences
>>>>> #
>>>>> #input-sgm =
>>>>> raw-input = $wmt12-data/tuning/tuning.clean.$input-extension
>>>>> #tokenized-input =
>>>>> #factorized-input =
>>>>> #input =
>>>>> #
>>>>> #reference-sgm =
>>>>> raw-reference = $wmt12-data/tuning/tuning.clean.$output-extension
>>>>> #tokenized-reference =
>>>>> #factorized-reference =
>>>>> #reference =
>>>>>
>>>>> ### size of n-best list used (typically 100)
>>>>> #
>>>>> nbest = 100
>>>>>
>>>>> ### ranges for weights for random initialization
>>>>> # if not specified, the tuning script will use generic ranges
>>>>> # it is not clear if this matters
>>>>> #
>>>>> # lambda =
>>>>>
>>>>> ### additional flags for the filter script
>>>>> #
>>>>> #filter-settings = "-Binarizer CreateOnDiskPt 1 1 5 100 2 -Hierarchical"
>>>>>
>>>>> ### additional flags for the decoder
>>>>> #
>>>>> decoder-settings = ""
>>>>>
>>>>> ### if tuning should be skipped, specify this here
>>>>> # and also point to a configuration file that contains
>>>>> # pointers to all relevant model files
>>>>> #
>>>>> #config =
>>>>>
>>>>> #########################################################
>>>>> ## RECASER: restore case, this part only trains the model
>>>>>
>>>>> [RECASING]
>>>>>
>>>>> #decoder = $moses-bin-dir/moses
>>>>>
>>>>> ### training data
>>>>> # raw input still needs to be tokenized,
>>>>> # also, tokenized input may be specified
>>>>> #
>>>>> #tokenized = [LM:europarl:tokenized-corpus]
>>>>>
>>>>> # recase-config =
>>>>>
>>>>> #lm-training = $srilm-dir/ngram-count
>>>>>
>>>>> #######################################################
>>>>> ## TRUECASER: train model to truecase corpora and input
>>>>>
>>>>> [TRUECASER]
>>>>>
>>>>> ### script to train truecaser models
>>>>> #
>>>>> trainer = $moses-script-dir/recaser/train-truecaser.perl
>>>>>
>>>>> ### training data
>>>>> # data on which truecaser is trained
>>>>> # if no training data is specified, parallel corpus is used
>>>>> #
>>>>> # raw-stem =
>>>>> # tokenized-stem =
>>>>>
>>>>> ### trained model
>>>>> #
>>>>> # truecase-model =
>>>>>
>>>>> ######################################################################
>>>>> ## EVALUATION: translating a test set using the tuned system and score it
>>>>>
>>>>> [EVALUATION]
>>>>>
>>>>> ### number of jobs (if parallel execution on cluster)
>>>>> #
>>>>> #jobs = 10
>>>>>
>>>>> ### additional flags for the filter script
>>>>> #
>>>>> #filter-settings = ""
>>>>>
>>>>> ### additional decoder settings
>>>>> # switches for the Moses decoder
>>>>> # common choices:
>>>>> # "-threads N" for multi-threading
>>>>> # "-mbr" for MBR decoding
>>>>> # "-drop-unknown" for dropping unknown source words
>>>>> # "-search-algorithm 1 -cube-pruning-pop-limit 5000 -s 5000" for cube pruning
>>>>> #
>>>>> decoder-settings = "-search-algorithm 1 -cube-pruning-pop-limit 5000 -s 5000"
>>>>>
>>>>> ### specify size of n-best list, if produced
>>>>> #
>>>>> #nbest = 100
>>>>>
>>>>> ### multiple reference translations
>>>>> #
>>>>> #multiref = yes
>>>>>
>>>>> ### prepare system output for scoring
>>>>> # this may include detokenization and wrapping output in sgm
>>>>> # (needed for nist-bleu, ter, meteor)
>>>>> #
>>>>> detokenizer = "$moses-script-dir/tokenizer/detokenizer.perl -l $output-extension"
>>>>> #recaser = $moses-script-dir/recaser/recase.perl
>>>>> wrapping-script = "$moses-script-dir/ems/support/wrap-xml.perl $output-extension"
>>>>> #output-sgm =
>>>>>
>>>>> ### BLEU
>>>>> #
>>>>> nist-bleu = $moses-script-dir/generic/mteval-v13a.pl
>>>>> nist-bleu-c = "$moses-script-dir/generic/mteval-v13a.pl -c"
>>>>> #multi-bleu = $moses-script-dir/generic/multi-bleu.perl
>>>>> #ibm-bleu =
>>>>>
>>>>> ### TER: translation error rate (BBN metric) based on edit distance
>>>>> # not yet integrated
>>>>> #
>>>>> # ter =
>>>>>
>>>>> ### METEOR: gives credit to stem / WordNet synonym matches
>>>>> # not yet integrated
>>>>> #
>>>>> # meteor =
>>>>>
>>>>> ### Analysis: carry out various forms of analysis on the output
>>>>> #
>>>>> analysis = $moses-script-dir/ems/support/analysis.perl
>>>>> #
>>>>> # also report on input coverage
>>>>> analyze-coverage = yes
>>>>> #
>>>>> # also report on phrase mappings used
>>>>> report-segmentation = yes
>>>>> #
>>>>> # report precision of translations for each input word, broken down by
>>>>> # count of input word in corpus and model
>>>>> #report-precision-by-coverage = yes
>>>>> #
>>>>> # further precision breakdown by factor
>>>>> #precision-by-coverage-factor = pos
>>>>>
>>>>> [EVALUATION:newstest2011]
>>>>>
>>>>> ### input data
>>>>> #
>>>>> #input-sgm = "$wmt12-data/$input-extension-test.txt"
>>>>> #raw-input = $wmt12-data/$input-extension-test.txt
>>>>> tokenized-input = "$wmt12-data/de-test.txt"
>>>>> # factorized-input =
>>>>> #input = $wmt12-data/$input-extension-test.txt
>>>>>
>>>>> ### reference data
>>>>> #
>>>>> #reference-sgm = "$wmt12-data/$output-extension-test.txt"
>>>>> #raw-reference = "$wmt12-data/$output-extension-test.txt"
>>>>> tokenized-reference = "$wmt12-data/el-test.txt"
>>>>> #reference = $wmt12-data/el-test.txt
>>>>>
>>>>> ### analysis settings
>>>>> # may contain any of the general evaluation analysis settings
>>>>> # specific setting: base coverage statistics on earlier run
>>>>> #
>>>>> #precision-by-coverage-base = $working-dir/evaluation/test.analysis.5
>>>>>
>>>>> ### wrapping frame
>>>>> # for nist-bleu and other scoring scripts, the output needs to be wrapped
>>>>> # in sgm markup (typically like the input sgm)
>>>>> #
>>>>> wrapping-frame = $tokenized-input
>>>>>
>>>>> ##########################################
>>>>> ### REPORTING: summarize evaluation scores
>>>>>
>>>>> [REPORTING]
>>>>>
>>>>> ### currently no parameters for reporting section
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Dimitris Babaniotis
>>>>>
>>>>> _______________________________________________
>>>>> Moses-support mailing list
>>>>> [email protected]
>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>>
>>> Hi, thank you for your answer,
>>>
>>> I fixed the problem that you mentioned, but the problem still exists.
>>>
>>> I investigated further and found that the error occurs when the decoder
>>> tries to translate a sentence.
>>> The problem exists with or without EMS.
>>>
>>> Dimitris
>>>
>> Hi,
>>
>> I have a new problem with the Moses machine: when the tuning process
>> finished, all the weights were zero.
>>
>> Do you know what happened?
>> Here is my configuration file from tuning:
>>
>> # MERT optimized configuration
>> # decoder /home/dimbaba/mosesdecoder/dist/bin/moses
>> # BLEU 0 on dev /home/dimbaba/mosesOnlySuffix/tuning.combined.de
>> # We were before running iteration 2
>> # finished Tue 05 Jun 2012 03:14:03 PM EEST
>>
>> ### MOSES CONFIG FILE ###
>> #########################
>>
>> # input factors
>> [input-factors]
>> 0
>> 1
>>
>> # mapping steps
>> [mapping]
>> 0 T 0
>> 0 T 1
>>
>> # translation tables: table type (hierarchical(0), textual (0), binary (1)),
>> # source-factors, target-factors, number of scores, file
>> # OLD FORMAT is still handled for back-compatibility
>> # OLD FORMAT translation tables: source-factors, target-factors, number of scores, file
>> # OLD FORMAT a binary table type (1) is assumed
>> [ttable-file]
>> 0 0 0 5 /home/dimbaba/mosesOnlySuffix/work/tuning/mert/filtered/phrase-table.0-0.1.1.gz
>> 0 1 1 5 /home/dimbaba/mosesOnlySuffix/work/tuning/mert/filtered/phrase-table.1-1.1.1.gz
>>
>> # no generation models, no generation-file section
>>
>> # language models: type(srilm/irstlm), factors, order, file
>> [lmodel-file]
>> 0 1 3 /home/dimbaba/mosesOnlySuffix/factored.lm
>>
>> # limit on how many phrase translations e for each phrase f are loaded
>> # 0 = all elements loaded
>> [ttable-limit]
>> 20
>> 0
>>
>> # distortion (reordering) weight
>> [weight-d]
>> 0
>>
>> # language model weights
>> [weight-l]
>> 0
>>
>> # translation model weights
>> [weight-t]
>> 0
>> 0
>> 0
>> 0
>> 0
>> 0
>> 0
>> 0
>> 0
>> 0
>>
>> # no generation models, no weight-generation section
>>
>> # word penalty
>> [weight-w]
>> 0
>>
>> [distortion-limit]
>> 6
>>
>> Dimitris Babaniotis

These are the files in the tuning folder:
filtered (directory)
extract.err - 606 b
extract.out - 0
features.list - 307
filterphrases.err - 568
filterphrases.out - 162
finished_step.txt - 2
init.opt - 78
mert.log - 611
mert.out - 0
moses.ini - 1491
run1.best100.out.gz - 2.7 MB
run1.extract.err - 606
run1.extract.out - 0
run1.features.dat - 10.2 MB
run1.init.opt - 170
run1.mert.log - 472
run1.mert.out - 0
run1.moses.ini - 1536
run1.out - 289,143
run1.scores.dat - 2.34 MB
run1.weights.txt - 27
run2.best100.out.gz - 2.5 MB
run2.extract.err - 606
run2.extract.out - 0
run2.features.dat - 10.5 MB
run2.init.opt - 78
run2.mert.log - 611
run2.mert.out - 0
run2.moses.ini - 1428
run2.out - 383 kB
run2.scores.dat - 2.4 MB
run2.weights.txt - 27
weights.txt - 27
extractor.sh - 253

Here are the last lines from the tuning process:

Finished translating
Translating line 999 in thread id 139978645575424
Translating: dadurch|rch wird|ird die|die zukunft|nft eines|nes demokratischen|hen und|und sozialen|len europas|pas untergraben|ben .|.
Collecting options took 0.040 seconds
Search took 93.290 seconds
BEST TRANSLATION: με|έσω το|του οηε|κάο που|και αύριο|κού που|νός ένα|και δημοκρατία|ίγη κατ|εια κοινωνικής|κής ευρώπης|πης και|εια υπονομεύει|πει [11111111111] [total=0.000] <<-3.000, -13.000, 0.000, -98.688, -35.115, -43.861, -52.932, -54.004, 9.999, -51.843, -54.961, -78.138, -76.006, 9.999>>
Translation took 93.650 seconds
Finished translating
The decoder returns the scores in this order: d lm w tm tm tm tm tm tm tm tm tm tm
Executing: gzip -f run2.best100.out
Scoring the nbestlist.
exec: /home/dimbaba/mosesOnlySuffix/work/tuning/mert/extractor.sh
Executing: /home/dimbaba/mosesOnlySuffix/work/tuning/mert/extractor.sh > extract.out 2> extract.err
Executing: \cp -f init.opt run2.init.opt
exec: /home/dimbaba/mosesdecoder/mert/mert -d 13 --scconfig case:true --ffile run1.features.dat,run2.features.dat --scfile run1.scores.dat,run2.scores.dat --ifile run2.init.opt -n 20
Executing: /home/dimbaba/mosesdecoder/mert/mert -d 13 --scconfig case:true --ffile run1.features.dat,run2.features.dat --scfile run1.scores.dat,run2.scores.dat --ifile run2.init.opt -n 20 > mert.out 2> mert.log
Executing: \cp -f extract.err run2.extract.err
Executing: \cp -f extract.out run2.extract.out
Executing: \cp -f mert.out run2.mert.out
Executing: \cp -f mert.log run2.mert.log
Executing: touch mert.log run2.mert.log
Executing: \cp -f weights.txt run2.weights.txt
None of the weights changed more than 1e-05. Stopping.
Executing: \cp -f init.opt run2.init.opt
Executing: \cp -f mert.log run2.mert.log
Saved: ./moses.ini

run 1 start at Mon 04 Jun 2012 07:43:29 PM EEST
Parsing --decoder-flags: ||
Saving new config to: ./run1.moses.ini
(1) run decoder to produce n-best lists
params =
decoder_config = -w -0.243902 -lm 0.121951 -d 0.146341 -tm 0.048780 0.048780 0.048780 0.048780 0.048780 0.048780 0.048780 0.048780 0.048780 0.048780
run 1 end at Tue 05 Jun 2012 03:14:03 PM EEST
(1) BEST at 1: 0 0 0 0 0 0 0 0 0 0 0 0 0 => 0 at Tue 05 Jun 2012 03:14:03 PM EEST
loading data from 1 to 1 (prev_aggregate_nbl_size=-1)
loading data from run1.features.dat
loading data from run1.scores.dat
loading data from run1.init.opt
run 2 start at Tue 05 Jun 2012 03:14:03 PM EEST
Parsing --decoder-flags: ||
Saving new config to: ./run2.moses.ini
(2) run decoder to produce n-best lists
params =
decoder_config = -w 0.000000 -lm 0.000000 -d 0.000000 -tm 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
run 2 end at Fri 08 Jun 2012 11:03:38 PM EEST
(2) BEST at 2: 0 0 0 0 0 0 0 0 0 0 0 0 0 => 0 at Fri 08 Jun 2012 11:03:38 PM EEST
Training finished at Fri 08 Jun 2012 11:03:38 PM EEST

The intermediate BLEU scores:

0 50 0 49 0 48 0 47 46
0 49 0 48 0 47 0 46 46
0 50 0 49 0 48 0 47 46
0 49 0 48 0 47 0 46 46
.
.
.
0 52 0 51 0 50 0 49 43
0 52 0 51 0 50 0 49 43
0 52 0 51 0 50 0 49 43
0 52 0 51 0 50 0 49 43
0 52 0 51 0 50 0 49 43
0 52 0 51 0 50 0 49 43
0 54 0 53 0 52 0 51 43
0 53 0 52 0 51 0 50 43
0 52 0 51 0 50 0 49 43
0 53 0 52 0 51 0 50 43
0 52 0 51 0 50 0 49 43

Thank you,

DB

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
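A note on reading the score lines above: each row looks like a line of BLEU sufficient statistics in the usual Moses extractor layout, i.e. (matched, total) n-gram counts for n = 1..4 followed by the reference length; that layout is an assumption here, and the actual format used can be confirmed in extractor.sh. Since every row starts with zero matched unigrams, BLEU is 0 for every candidate, so MERT sees a flat objective and an all-zero weight vector scores as well as anything else. This is consistent with the "# BLEU 0 on dev" header in the moses.ini above, and with Philipp's suggestion that the reference files may be missing or mismatched. A minimal sketch of the standard BLEU formula applied to such a stats row (illustrative only, not Moses's own scorer code):

```python
import math

def bleu_from_stats(stats):
    """BLEU from a sufficient-statistics row assumed to be laid out as
    [matched_1, total_1, ..., matched_4, total_4, ref_length]."""
    matched = stats[0:8:2]
    total = stats[1:8:2]
    ref_len = stats[8]
    hyp_len = total[0]  # number of unigrams == hypothesis length
    if min(matched) == 0:
        return 0.0  # any zero n-gram precision zeroes the geometric mean
    # geometric mean of the four n-gram precisions
    log_prec = sum(math.log(m / t) for m, t in zip(matched, total)) / 4.0
    # brevity penalty: exp(1 - r/h) when the hypothesis is shorter than the reference
    brevity = min(0.0, 1.0 - ref_len / hyp_len)
    return math.exp(brevity + log_prec)

# one of the rows from the log above: no matching unigrams at all
print(bleu_from_stats([0, 50, 0, 49, 0, 48, 0, 47, 46]))  # -> 0.0
```

With rows like these, every candidate in the n-best list scores 0, which is exactly the situation where checking the tuning input and reference files for mismatched line counts or wrong factors is the first step.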
