On 09/06/2012 05:48 PM, Philipp Koehn wrote:
> Hi,
>
> can you check which files were produced in the tuning tmp
> directory, what their sizes are, and what the intermediate BLEU scores are?
> Maybe the decoder crashed, the reference files are missing
> or mismatched, etc.
>
> -phi
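The checks above can be scripted. This is only a sketch: in this setup the real directory would be /home/dimbaba/mosesFactored/experiment/tuning/tmp.4, but a mock directory is created here so the commands are self-contained.

```shell
# Sketch of the suggested checks, run against a mock tuning tmp directory.
TMP=$(mktemp -d)
printf 'Best point: 0.1 0.2 0.3 => 0.2345\n' > "$TMP/run1.mert.log"

ls -l "$TMP"                                # file names and sizes
grep -h "Best point" "$TMP"/run*.mert.log   # intermediate BLEU per MERT run

rm -r "$TMP"
```

mert-moses.pl itself reads the "Best point:" lines from the mert logs, so they are a reliable place to find the per-iteration scores.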
>
> On Fri, Jun 8, 2012 at 1:55 PM, Δημήτρης Μπαμπανιώτης
> <[email protected]>  wrote:
>> On 30/05/2012 12:41 AM, Δημήτρης Μπαμπανιώτης wrote:
>>
>>> On 28/05/2012 10:01 PM, Philipp Koehn wrote:
>>>> Hi,
>>>>
>>>> there is a problem here:
>>>>
>>>> # conversion of phrase table into binary on-disk format
>>>> #ttable-binarizer = $moses-bin-dir/processPhraseTable
>>>>
>>>> # conversion of rule table into binary on-disk format
>>>> ttable-binarizer = "$moses-bin-dir/CreateOnDisk 1 1 5 100 2"
>>>>
>>>> You are using the ttable binarizer for the hierarchical/syntax model,
>>>> but you use a phrase-based model.
>>>>
>>>> -phi
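In concrete terms, a phrase-based setup would have those two lines the other way around (same paths as in the config quoted below; a sketch of the fix, not a tested configuration):

```ini
# conversion of phrase table into binary on-disk format
ttable-binarizer = $moses-bin-dir/processPhraseTable

# conversion of rule table into binary on-disk format
# (hierarchical/syntax models only -- leave commented out for phrase-based)
#ttable-binarizer = "$moses-bin-dir/CreateOnDisk 1 1 5 100 2"
```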
>>>>
>>>> On Sun, May 27, 2012 at 11:45 PM, Dimitris Babaniotis
>>>> <[email protected]>    wrote:
>>>>> Hello, I'm trying to run experiments with EMS but the process stops on
>>>>> tuning:tune.
>>>>>
>>>>> Here is the TUNING_tune.stderr file :
>>>>>
>>>>> main::create_extractor_script() called too early to check prototype at
>>>>> /home/dimbaba/moses/scripts/training/mert-moses.pl line 674.
>>>>> Using SCRIPTS_ROOTDIR: /home/dimbaba/moses/scripts
>>>>> Asking moses for feature names and values from
>>>>> /home/dimbaba/mosesFactored/experiment/tuning/moses.filtered.ini.4
>>>>> Executing: /home/dimbaba/moses/dist/bin/moses -v 0 -config
>>>>> /home/dimbaba/mosesFactored/experiment/tuning/moses.filtered.ini.4
>>>>> -inputtype 0 -show-weights > ./features.list
>>>>> MERT starting values and ranges for random generation:
>>>>> d = 0.600 ( 0.00 .. 1.00)
>>>>> lm = 0.250 ( 0.00 .. 1.00)
>>>>> lm = 0.250 ( 0.00 .. 1.00)
>>>>> w = -1.000 ( 0.00 .. 1.00)
>>>>> tm = 0.200 ( 0.00 .. 1.00)
>>>>> tm = 0.200 ( 0.00 .. 1.00)
>>>>> tm = 0.200 ( 0.00 .. 1.00)
>>>>> tm = 0.200 ( 0.00 .. 1.00)
>>>>> tm = 0.200 ( 0.00 .. 1.00)
>>>>> Saved: ./run1.moses.ini
>>>>> Normalizing lambdas: 0.600000 0.250000 0.250000 -1.000000 0.200000
>>>>> 0.200000
>>>>> 0.200000 0.200000 0.200000
>>>>> DECODER_CFG = -w -0.322581 -lm 0.080645 0.080645 -d 0.193548 -tm
>>>>> 0.064516
>>>>> 0.064516 0.064516 0.064516 0.064516
>>>>> Executing: /home/dimbaba/moses/dist/bin/moses -v 0 -config
>>>>> /home/dimbaba/mosesFactored/experiment/tuning/moses.filtered.ini.4
>>>>> -inputtype 0 -w -0.322581 -lm 0.080645 0.080645 -d 0.193548 -tm 0.064516
>>>>> 0.064516 0.064516 0.064516 0.064516 -n-best-list run1.best100.out 100
>>>>> -input-file /home/dimbaba/mosesFactored/experiment/tuning/input.tc.1 > run1.out
>>>>> Translating line 0 in thread id 140471666632448
>>>>> Check (*contextFactor[count-1])[factorType] != NULL failed in
>>>>> moses/src/LM/SRI.cpp:155
>>>>> sh: line 1: 1648 Aborted (core dumped)
>>>>> /home/dimbaba/moses/dist/bin/moses
>>>>> -v 0 -config
>>>>> /home/dimbaba/mosesFactored/experiment/tuning/moses.filtered.ini.4
>>>>> -inputtype 0 -w -0.322581 -lm 0.080645 0.080645 -d 0.193548 -tm 0.064516
>>>>> 0.064516 0.064516 0.064516 0.064516 -n-best-list run1.best100.out 100
>>>>> -input-file /home/dimbaba/mosesFactored/experiment/tuning/input.tc.1 > run1.out
>>>>> Exit code: 134
>>>>> The decoder died. CONFIG WAS -w -0.322581 -lm 0.080645 0.080645 -d
>>>>> 0.193548
>>>>> -tm 0.064516 0.064516 0.064516 0.064516 0.064516
>>>>> cp: cannot stat
>>>>> «/home/dimbaba/mosesFactored/experiment/tuning/tmp.4/moses.ini»: No such
>>>>> file or directory
>>>>>
>>>>>
>>>>> ...and this is my configuration file:
>>>>>
>>>>>
>>>>> ################################################
>>>>> ### CONFIGURATION FILE FOR AN SMT EXPERIMENT ###
>>>>> ################################################
>>>>>
>>>>> [GENERAL]
>>>>>
>>>>> ### directory in which experiment is run
>>>>> #
>>>>> working-dir = /home/dimbaba/mosesFactored/experiment
>>>>>
>>>>> # specification of the language pair
>>>>> input-extension = de
>>>>> output-extension = el
>>>>> pair-extension = de-el
>>>>>
>>>>> ### directories that contain tools and data
>>>>> #
>>>>> # moses
>>>>> moses-src-dir = /home/dimbaba/moses
>>>>> #
>>>>> # moses binaries
>>>>> moses-bin-dir = $moses-src-dir/dist/bin
>>>>> #
>>>>> # moses scripts
>>>>> moses-script-dir = $moses-src-dir/scripts
>>>>> #
>>>>> # srilm
>>>>> srilm-dir = /home/dimbaba/srilm/bin/i686-m64
>>>>> #
>>>>> # irstlm
>>>>> #irstlm-dir = $moses-src-dir/irstlm/bin
>>>>> #
>>>>> # randlm
>>>>> #randlm-dir = $moses-src-dir/randlm/bin
>>>>> #
>>>>> # data
>>>>> wmt12-data = /home/dimbaba/aligned/el-de
>>>>>
>>>>> ### basic tools
>>>>> #
>>>>> # moses decoder
>>>>> decoder = $moses-bin-dir/moses
>>>>>
>>>>> # conversion of phrase table into binary on-disk format
>>>>> #ttable-binarizer = $moses-bin-dir/processPhraseTable
>>>>>
>>>>> # conversion of rule table into binary on-disk format
>>>>> ttable-binarizer = "$moses-bin-dir/CreateOnDisk 1 1 5 100 2"
>>>>>
>>>>> # tokenizers - comment out if all your data is already tokenized
>>>>> input-tokenizer = "$moses-script-dir/tokenizer/tokenizer.perl -a -l
>>>>> $input-extension"
>>>>> output-tokenizer = "$moses-script-dir/tokenizer/tokenizer.perl -a -l
>>>>> $output-extension"
>>>>>
>>>>> # truecasers - comment out if you do not use the truecaser
>>>>> input-truecaser = $moses-script-dir/recaser/truecase.perl
>>>>> output-truecaser = $moses-script-dir/recaser/truecase.perl
>>>>> detruecaser = $moses-script-dir/recaser/detruecase.perl
>>>>>
>>>>> ### generic parallelizer for cluster and multi-core machines
>>>>> # you may specify a script that allows the parallel execution of
>>>>> # parallelizable steps (see meta file). you also need to specify
>>>>> # the number of jobs (cluster) or cores (multicore)
>>>>> #
>>>>> #generic-parallelizer = $moses-script-dir/ems/support/generic-parallelizer.perl
>>>>> #generic-parallelizer = $moses-script-dir/ems/support/generic-multicore-parallelizer.perl
>>>>>
>>>>> ### cluster settings (if run on a cluster machine)
>>>>> # number of jobs to be submitted in parallel
>>>>> #
>>>>> #jobs = 10
>>>>>
>>>>> # arguments to qsub when scheduling a job
>>>>> #qsub-settings = ""
>>>>>
>>>>> # project for privileges and usage accounting
>>>>> #qsub-project = iccs_smt
>>>>>
>>>>> # memory and time
>>>>> #qsub-memory = 4
>>>>> #qsub-hours = 48
>>>>>
>>>>> ### multi-core settings
>>>>> # when the generic parallelizer is used, the number of cores
>>>>> # is specified here
>>>>> cores = 4
>>>>>
>>>>> #################################################################
>>>>> # PARALLEL CORPUS PREPARATION:
>>>>> # create a tokenized, sentence-aligned corpus, ready for training
>>>>>
>>>>> [CORPUS]
>>>>>
>>>>> ### long sentences are filtered out, since they slow down GIZA++
>>>>> # and are a less reliable source of data. set here the maximum
>>>>> # length of a sentence
>>>>> #
>>>>> max-sentence-length = 100
>>>>>
>>>>> [CORPUS:europarl] IGNORE
>>>>>
>>>>> ### command to run to get raw corpus files
>>>>> #
>>>>> # get-corpus-script =
>>>>>
>>>>> ### raw corpus files (untokenized, but sentence aligned)
>>>>> #
>>>>> raw-stem = $wmt12-data/training/training.clean10
>>>>>
>>>>> ### tokenized corpus files (may contain long sentences)
>>>>> #
>>>>> #tokenized-stem =
>>>>>
>>>>> ### if sentence filtering should be skipped,
>>>>> # point to the clean training data
>>>>> #
>>>>> #clean-stem =
>>>>>
>>>>> ### if corpus preparation should be skipped,
>>>>> # point to the prepared training data
>>>>> #
>>>>> #lowercased-stem =
>>>>>
>>>>> [CORPUS:nc]
>>>>> raw-stem = $wmt12-data/training/training.clean10
>>>>>
>>>>> [CORPUS:un] IGNORE
>>>>> raw-stem = $wmt12-data/training/training.clean10
>>>>>
>>>>> #################################################################
>>>>> # LANGUAGE MODEL TRAINING
>>>>>
>>>>> [LM]
>>>>>
>>>>> ### tool to be used for language model training
>>>>> # srilm
>>>>> lm-training = $srilm-dir/ngram-count
>>>>> settings = ""
>>>>>
>>>>> # irstlm
>>>>> #lm-training = "$moses-script-dir/generic/trainlm-irst.perl -cores $cores -irst-dir $irstlm-dir -temp-dir $working-dir/lm"
>>>>> #settings = ""
>>>>>
>>>>> # order of the language model
>>>>> order = 3
>>>>>
>>>>> ### tool to be used for training randomized language model from scratch
>>>>> # (more commonly, a SRILM is trained)
>>>>> #
>>>>> #rlm-training = "$randlm-dir/buildlm -falsepos 8 -values 8"
>>>>>
>>>>> ### script to use for binary table format for irstlm or kenlm
>>>>> # (default: no binarization)
>>>>>
>>>>> # irstlm
>>>>> #lm-binarizer = $irstlm-dir/compile-lm
>>>>>
>>>>> # kenlm, also set type to 8
>>>>> #lm-binarizer = $moses-bin-dir/build_binary
>>>>> #type = 8
>>>>>
>>>>> ### script to create quantized language model format (irstlm)
>>>>> # (default: no quantization)
>>>>> #
>>>>> #lm-quantizer = $irstlm-dir/quantize-lm
>>>>>
>>>>> ### script to use for converting into randomized table format
>>>>> # (default: no randomization)
>>>>> #
>>>>> #lm-randomizer = "$randlm-dir/buildlm -falsepos 8 -values 8"
>>>>>
>>>>> ### each language model to be used has its own section here
>>>>>
>>>>> [LM:europarl] IGNORE
>>>>>
>>>>> ### command to run to get raw corpus files
>>>>> #
>>>>> #get-corpus-script = ""
>>>>>
>>>>> ### raw corpus (untokenized)
>>>>> #
>>>>> raw-corpus = $wmt12-data/training/training.clean.$output-extension
>>>>>
>>>>> ### tokenized corpus files (may contain long sentences)
>>>>> #
>>>>> #tokenized-corpus =
>>>>>
>>>>> ### if corpus preparation should be skipped,
>>>>> # point to the prepared language model
>>>>> #
>>>>> #lm =
>>>>>
>>>>> [LM:nc]
>>>>> raw-corpus = $wmt12-data/training/training.clean10.$output-extension
>>>>>
>>>>> [LM:un] IGNORE
>>>>> raw-corpus =
>>>>> $wmt12-data/training/undoc.2000.$pair-extension.$output-extension
>>>>>
>>>>> [LM:news] IGNORE
>>>>> raw-corpus = $wmt12-data/training/news.$output-extension.shuffled
>>>>>
>>>>> [LM:nc=stem]
>>>>> factors = "stem"
>>>>> order = 3
>>>>> settings = ""
>>>>> raw-corpus = $wmt12-data/training/training.clean.$output-extension
>>>>>
>>>>> #################################################################
>>>>> # INTERPOLATING LANGUAGE MODELS
>>>>>
>>>>> [INTERPOLATED-LM] IGNORE
>>>>>
>>>>> # if multiple language models are used, these may be combined
>>>>> # by optimizing perplexity on a tuning set
>>>>> # see, for instance [Koehn and Schwenk, IJCNLP 2008]
>>>>>
>>>>> ### script to interpolate language models
>>>>> # if commented out, no interpolation is performed
>>>>> #
>>>>> script = $moses-script-dir/ems/support/interpolate-lm.perl
>>>>>
>>>>> ### tuning set
>>>>> # you may use the same set that is used for mert tuning (reference set)
>>>>> #
>>>>> tuning-sgm = $wmt12-data/dev/newstest2010-ref.$output-extension.sgm
>>>>> #raw-tuning =
>>>>> #tokenized-tuning =
>>>>> #factored-tuning =
>>>>> #lowercased-tuning =
>>>>> #split-tuning =
>>>>>
>>>>> ### group language models for hierarchical interpolation
>>>>> # (flat interpolation is limited to 10 language models)
>>>>> #group = "first,second fourth,fifth"
>>>>>
>>>>> ### script to use for binary table format for irstlm or kenlm
>>>>> # (default: no binarization)
>>>>>
>>>>> # irstlm
>>>>> #lm-binarizer = $irstlm-dir/compile-lm
>>>>>
>>>>> # kenlm, also set type to 8
>>>>> #lm-binarizer = $moses-bin-dir/build_binary
>>>>> #type = 8
>>>>>
>>>>> ### script to create quantized language model format (irstlm)
>>>>> # (default: no quantization)
>>>>> #
>>>>> #lm-quantizer = $irstlm-dir/quantize-lm
>>>>>
>>>>> ### script to use for converting into randomized table format
>>>>> # (default: no randomization)
>>>>> #
>>>>> #lm-randomizer = "$randlm-dir/buildlm -falsepos 8 -values 8"
>>>>>
>>>>> #################################################################
>>>>> # FACTOR DEFINITION
>>>>>
>>>>> [INPUT-FACTOR]
>>>>>
>>>>> # also used for output factors
>>>>> temp-dir = $working-dir/training/factor
>>>>> [INPUT-FACTOR:stem]
>>>>>
>>>>> factor-script = "$moses-script-dir/training/wrappers/make-factor-stem.perl 3"
>>>>> ### script that generates this factor
>>>>> #
>>>>> #mxpost = /home/pkoehn/bin/mxpost
>>>>> factor-script = "$moses-script-dir/training/wrappers/make-factor-stem.perl 3"
>>>>> [OUTPUT-FACTOR:stem]
>>>>>
>>>>> factor-script = "$moses-script-dir/training/wrappers/make-factor-stem.perl 3"
>>>>> ### script that generates this factor
>>>>> #
>>>>> #mxpost = /home/pkoehn/bin/mxpost
>>>>> factor-script = "$moses-script-dir/training/wrappers/make-factor-stem.perl 3"
>>>>>
>>>>> #################################################################
>>>>> # TRANSLATION MODEL TRAINING
>>>>>
>>>>> [TRAINING]
>>>>>
>>>>> ### training script to be used: either a legacy script or
>>>>> # current moses training script (default)
>>>>> #
>>>>> script = $moses-script-dir/training/train-model.perl
>>>>>
>>>>> ### general options
>>>>> # these are options that are passed on to train-model.perl, for instance
>>>>> # * "-mgiza -mgiza-cpus 8" to use mgiza instead of giza
>>>>> # * "-sort-buffer-size 8G" to reduce on-disk sorting
>>>>> #
>>>>> #training-options = ""
>>>>>
>>>>> ### factored training: specify here which factors used
>>>>> # if none specified, single factor training is assumed
>>>>> # (one translation step, surface to surface)
>>>>> #
>>>>> input-factors = word stem
>>>>> output-factors = word stem
>>>>> alignment-factors = "stem -> stem"
>>>>> translation-factors = "word -> word"
>>>>> reordering-factors = "word -> word"
>>>>> #generation-factors =
>>>>> decoding-steps = "t0"
>>>>>
>>>>> ### parallelization of data preparation step
>>>>> # the two directions of the data preparation can be run in parallel
>>>>> # comment out if not needed
>>>>> #
>>>>> parallel = yes
>>>>>
>>>>> ### pre-computation for giza++
>>>>> # giza++ has a more efficient data structure that needs to be
>>>>> # initialized with snt2cooc. if run in parallel, this may reduce
>>>>> # memory requirements. set here the number of parts
>>>>> #
>>>>> #run-giza-in-parts = 5
>>>>>
>>>>> ### symmetrization method to obtain word alignments from giza output
>>>>> # (commonly used: grow-diag-final-and)
>>>>> #
>>>>> alignment-symmetrization-method = grow-diag-final-and
>>>>>
>>>>> ### use of berkeley aligner for word alignment
>>>>> #
>>>>> #use-berkeley = true
>>>>> #alignment-symmetrization-method = berkeley
>>>>> #berkeley-train = $moses-script-dir/ems/support/berkeley-train.sh
>>>>> #berkeley-process = $moses-script-dir/ems/support/berkeley-process.sh
>>>>> #berkeley-jar = /your/path/to/berkeleyaligner-1.1/berkeleyaligner.jar
>>>>> #berkeley-java-options = "-server -mx30000m -ea"
>>>>> #berkeley-training-options = "-Main.iters 5 5 -EMWordAligner.numThreads
>>>>> 8"
>>>>> #berkeley-process-options = "-EMWordAligner.numThreads 8"
>>>>> #berkeley-posterior = 0.5
>>>>>
>>>>> ### if word alignment should be skipped,
>>>>> # point to word alignment files
>>>>> #
>>>>> #word-alignment = $working-dir/model/aligned.1
>>>>>
>>>>> ### create a bilingual concordancer for the model
>>>>> #
>>>>> #biconcor = $moses-script-dir/ems/biconcor/biconcor
>>>>>
>>>>> ### lexicalized reordering: specify orientation type
>>>>> # (default: only distance-based reordering model)
>>>>> #
>>>>> lexicalized-reordering = msd-bidirectional-fe
>>>>>
>>>>> ### hierarchical rule set
>>>>> #
>>>>> hierarchical-rule-set = true
>>>>>
>>>>> ### settings for rule extraction
>>>>> #
>>>>> #extract-settings = ""
>>>>>
>>>>> ### unknown word labels (target syntax only)
>>>>> # enables use of unknown word labels during decoding
>>>>> # label file is generated during rule extraction
>>>>> #
>>>>> #use-unknown-word-labels = true
>>>>>
>>>>> ### if phrase extraction should be skipped,
>>>>> # point to stem for extract files
>>>>> #
>>>>> # extracted-phrases =
>>>>>
>>>>> ### settings for rule scoring
>>>>> #
>>>>> score-settings = "--GoodTuring"
>>>>>
>>>>> ### include word alignment in phrase table
>>>>> #
>>>>> #include-word-alignment-in-rules = yes
>>>>>
>>>>> ### if phrase table training should be skipped,
>>>>> # point to phrase translation table
>>>>> #
>>>>> # phrase-translation-table =
>>>>>
>>>>> ### if reordering table training should be skipped,
>>>>> # point to reordering table
>>>>> #
>>>>> # reordering-table =
>>>>>
>>>>> ### if training should be skipped,
>>>>> # point to a configuration file that contains
>>>>> # pointers to all relevant model files
>>>>> #
>>>>> #config-with-reused-weights =
>>>>>
>>>>> #####################################################
>>>>> ### TUNING: finding good weights for model components
>>>>>
>>>>> [TUNING]
>>>>>
>>>>> ### instead of tuning with this setting, old weights may be recycled
>>>>> # specify here an old configuration file with matching weights
>>>>> #
>>>>> #weight-config = $working-dir/tuning/moses.filtered.ini.1
>>>>>
>>>>> ### tuning script to be used
>>>>> #
>>>>> tuning-script = $moses-script-dir/training/mert-moses.pl
>>>>> tuning-settings = "-mertdir $moses-bin-dir --filtercmd
>>>>> '$moses-script-dir/training/filter-model-given-input.pl'"
>>>>>
>>>>> ### specify the corpus used for tuning
>>>>> # it should contain 1000s of sentences
>>>>> #
>>>>> #input-sgm =
>>>>> raw-input = $wmt12-data/tuning/tuning.clean.$input-extension
>>>>> #tokenized-input =
>>>>> #factorized-input =
>>>>> #input =
>>>>> #
>>>>> #reference-sgm =
>>>>> raw-reference = $wmt12-data/tuning/tuning.clean.$output-extension
>>>>> #tokenized-reference =
>>>>> #factorized-reference =
>>>>> #reference =
>>>>>
>>>>> ### size of n-best list used (typically 100)
>>>>> #
>>>>> nbest = 100
>>>>>
>>>>> ### ranges for weights for random initialization
>>>>> # if not specified, the tuning script will use generic ranges
>>>>> # it is not clear, if this matters
>>>>> #
>>>>> # lambda =
>>>>>
>>>>> ### additional flags for the filter script
>>>>> #
>>>>> #filter-settings = "-Binarizer CreateOnDiskPt 1 1 5 100 2 -Hierarchical"
>>>>>
>>>>> ### additional flags for the decoder
>>>>> #
>>>>> decoder-settings = ""
>>>>>
>>>>> ### if tuning should be skipped, specify this here
>>>>> # and also point to a configuration file that contains
>>>>> # pointers to all relevant model files
>>>>> #
>>>>> #config =
>>>>>
>>>>> #########################################################
>>>>> ## RECASER: restore case, this part only trains the model
>>>>>
>>>>> [RECASING]
>>>>>
>>>>> #decoder = $moses-bin-dir/moses
>>>>>
>>>>> ### training data
>>>>> # raw input still needs to be tokenized,
>>>>> # also tokenized input may be specified
>>>>> #
>>>>> #tokenized = [LM:europarl:tokenized-corpus]
>>>>>
>>>>> # recase-config =
>>>>>
>>>>> #lm-training = $srilm-dir/ngram-count
>>>>>
>>>>> #######################################################
>>>>> ## TRUECASER: train model to truecase corpora and input
>>>>>
>>>>> [TRUECASER]
>>>>>
>>>>> ### script to train truecaser models
>>>>> #
>>>>> trainer = $moses-script-dir/recaser/train-truecaser.perl
>>>>>
>>>>> ### training data
>>>>> # data on which truecaser is trained
>>>>> # if no training data is specified, parallel corpus is used
>>>>> #
>>>>> # raw-stem =
>>>>> # tokenized-stem =
>>>>>
>>>>> ### trained model
>>>>> #
>>>>> # truecase-model =
>>>>>
>>>>> ######################################################################
>>>>> ## EVALUATION: translating a test set using the tuned system and score it
>>>>>
>>>>> [EVALUATION]
>>>>>
>>>>> ### number of jobs (if parallel execution on cluster)
>>>>> #
>>>>> #jobs = 10
>>>>>
>>>>> ### additional flags for the filter script
>>>>> #
>>>>> #filter-settings = ""
>>>>>
>>>>> ### additional decoder settings
>>>>> # switches for the Moses decoder
>>>>> # common choices:
>>>>> # "-threads N" for multi-threading
>>>>> # "-mbr" for MBR decoding
>>>>> # "-drop-unknown" for dropping unknown source words
>>>>> # "-search-algorithm 1 -cube-pruning-pop-limit 5000 -s 5000" for cube pruning
>>>>> #
>>>>> decoder-settings = "-search-algorithm 1 -cube-pruning-pop-limit 5000 -s
>>>>> 5000"
>>>>>
>>>>> ### specify size of n-best list, if produced
>>>>> #
>>>>> #nbest = 100
>>>>>
>>>>> ### multiple reference translations
>>>>> #
>>>>> #multiref = yes
>>>>>
>>>>> ### prepare system output for scoring
>>>>> # this may include detokenization and wrapping output in sgm
>>>>> # (needed for nist-bleu, ter, meteor)
>>>>> #
>>>>> detokenizer = "$moses-script-dir/tokenizer/detokenizer.perl -l
>>>>> $output-extension"
>>>>> #recaser = $moses-script-dir/recaser/recase.perl
>>>>> wrapping-script = "$moses-script-dir/ems/support/wrap-xml.perl
>>>>> $output-extension"
>>>>> #output-sgm =
>>>>>
>>>>> ### BLEU
>>>>> #
>>>>> nist-bleu = $moses-script-dir/generic/mteval-v13a.pl
>>>>> nist-bleu-c = "$moses-script-dir/generic/mteval-v13a.pl -c"
>>>>> #multi-bleu = $moses-script-dir/generic/multi-bleu.perl
>>>>> #ibm-bleu =
>>>>>
>>>>> ### TER: translation error rate (BBN metric) based on edit distance
>>>>> # not yet integrated
>>>>> #
>>>>> # ter =
>>>>>
>>>>> ### METEOR: gives credit to stem / wordnet synonym matches
>>>>> # not yet integrated
>>>>> #
>>>>> # meteor =
>>>>>
>>>>> ### Analysis: carry out various forms of analysis on the output
>>>>> #
>>>>> analysis = $moses-script-dir/ems/support/analysis.perl
>>>>> #
>>>>> # also report on input coverage
>>>>> analyze-coverage = yes
>>>>> #
>>>>> # also report on phrase mappings used
>>>>> report-segmentation = yes
>>>>> #
>>>>> # report precision of translations for each input word, broken down by
>>>>> # count of input word in corpus and model
>>>>> #report-precision-by-coverage = yes
>>>>> #
>>>>> # further precision breakdown by factor
>>>>> #precision-by-coverage-factor = pos
>>>>>
>>>>> [EVALUATION:newstest2011]
>>>>>
>>>>> ### input data
>>>>> #
>>>>> #input-sgm = "$wmt12-data/$input-extension-test.txt"
>>>>> #raw-input = $wmt12-data/$input-extension-test.txt
>>>>> tokenized-input = "$wmt12-data/de-test.txt"
>>>>> # factorized-input =
>>>>> #input = $wmt12-data/$input-extension-test.txt
>>>>>
>>>>> ### reference data
>>>>> #
>>>>> #reference-sgm = "$wmt12-data/$output-extension-test.txt"
>>>>> #raw-reference = $wmt12-data/$output-extension-test.txt
>>>>> tokenized-reference = "$wmt12-data/el-test.txt"
>>>>> #reference = $wmt12-data/el-test.txt
>>>>>
>>>>> ### analysis settings
>>>>> # may contain any of the general evaluation analysis settings
>>>>> # specific setting: base coverage statistics on earlier run
>>>>> #
>>>>> #precision-by-coverage-base = $working-dir/evaluation/test.analysis.5
>>>>>
>>>>> ### wrapping frame
>>>>> # for nist-bleu and other scoring scripts, the output needs to be wrapped
>>>>> # in sgm markup (typically like the input sgm)
>>>>> #
>>>>> wrapping-frame = $tokenized-input
>>>>>
>>>>> ##########################################
>>>>> ### REPORTING: summarize evaluation scores
>>>>>
>>>>> [REPORTING]
>>>>>
>>>>> ### currently no parameters for reporting section
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Dimitris Babaniotis
>>>>>
>>>>> _______________________________________________
>>>>> Moses-support mailing list
>>>>> [email protected]
>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>>
>>> Hi, thank you for your answer,
>>>
>>> I fixed the setting you mentioned, but the problem still exists.
>>>
>>> I searched further and found that the error occurs when the decoder
>>> tries to translate a sentence.
>>> The problem exists with or without EMS.
>>>
>>> Dimitris
>>>
>> Hi,
>>
>> I have a new problem with the Moses system: when the tuning process
>> finished, all the weights were zero.
>>
>> Do you know what happened?
>>
>> Here is my configuration file from tuning:
>>
>> # MERT optimized configuration
>> # decoder /home/dimbaba/mosesdecoder/dist/bin/moses
>> # BLEU 0 on dev /home/dimbaba/mosesOnlySuffix/tuning.combined.de
>> # We were before running iteration 2
>> # finished Tue 05 Jun 2012 03:14:03 PM EEST
>>
>> ### MOSES CONFIG FILE ###
>> #########################
>>
>> # input factors
>> [input-factors]
>> 0
>> 1
>>
>> # mapping steps
>> [mapping]
>> 0 T 0
>> 0 T 1
>>
>> # translation tables: table type (hierarchical(0), textual (0), binary (1)),
>> #   source-factors, target-factors, number of scores, file
>> # OLD FORMAT is still handled for back-compatibility
>> # OLD FORMAT translation tables: source-factors, target-factors,
>> #   number of scores, file
>> # OLD FORMAT a binary table type (1) is assumed
>> [ttable-file]
>> 0 0 0 5
>> /home/dimbaba/mosesOnlySuffix/work/tuning/mert/filtered/phrase-table.0-0.1.1.gz
>> 0 1 1 5
>> /home/dimbaba/mosesOnlySuffix/work/tuning/mert/filtered/phrase-table.1-1.1.1.gz
>>
>>
>> # no generation models, no generation-file section
>>
>> # language models: type(srilm/irstlm), factors, order, file
>> [lmodel-file]
>> 0 1 3 /home/dimbaba/mosesOnlySuffix/factored.lm
>>
>>
>>
>> # limit on how many phrase translations e for each phrase f are loaded
>> # 0 = all elements loaded
>> [ttable-limit]
>> 20
>> 0
>>
>> # distortion (reordering) weight
>> [weight-d]
>> 0
>>
>> # language model weights
>> [weight-l]
>> 0
>>
>>
>> # translation model weights
>> [weight-t]
>> 0
>> 0
>> 0
>> 0
>> 0
>> 0
>> 0
>> 0
>> 0
>>
>> 0
>>
>> # no generation models, no weight-generation section
>>
>> # word penalty
>> [weight-w]
>> 0
>>
>> [distortion-limit]
>> 6
>>
>> Dimitris Babaniotis
>>
These are the files in the tuning folder:

filtered/ (directory)
extract.err - 606 B
extract.out - 0
features.list - 307 B
filterphrases.err - 568 B
filterphrases.out - 162 B
finished_step.txt - 2 B
init.opt - 78 B
mert.log - 611 B
mert.out - 0
moses.ini - 1491 B
run1.best100.out.gz - 2.7 MB
run1.extract.err - 606 B
run1.extract.out - 0
run1.features.dat - 10.2 MB
run1.init.opt - 170 B
run1.mert.log - 472 B
run1.mert.out - 0
run1.moses.ini - 1536 B
run1.out - 289,143 B
run1.scores.dat - 2.34 MB
run1.weights.txt - 27 B
run2.best100.out.gz - 2.5 MB
run2.extract.err - 606 B
run2.extract.out - 0
run2.features.dat - 10.5 MB
run2.init.opt - 78 B
run2.mert.log - 611 B
run2.mert.out - 0
run2.moses.ini - 1428 B
run2.out - 383 KB
run2.scores.dat - 2.4 MB
run2.weights.txt - 27 B
weights.txt - 27 B
extractor.sh - 253 B


Here are the last lines from the tuning process:


Finished translating
Translating line 999  in thread id 139978645575424
Translating: dadurch|rch wird|ird die|die zukunft|nft eines|nes 
demokratischen|hen und|und sozialen|len europas|pas untergraben|ben .|.

Collecting options took 0.040 seconds
Search took 93.290 seconds
BEST TRANSLATION: με|έσω το|του οηε|κάο που|και αύριο|κού που|νός 
ένα|και δημοκρατία|ίγη κατ|εια κοινωνικής|κής ευρώπης|πης και|εια 
υπονομεύει|πει [11111111111]  [total=0.000] <<-3.000, -13.000, 0.000, 
-98.688, -35.115, -43.861, -52.932, -54.004, 9.999, -51.843, -54.961, 
-78.138, -76.006, 9.999>>
Translation took 93.650 seconds
Finished translating
The decoder returns the scores in this order: d lm w tm tm tm tm tm tm 
tm tm tm tm
Executing: gzip -f run2.best100.out
Scoring the nbestlist.
exec: /home/dimbaba/mosesOnlySuffix/work/tuning/mert/extractor.sh
Executing: /home/dimbaba/mosesOnlySuffix/work/tuning/mert/extractor.sh > 
extract.out 2> extract.err
Executing: \cp -f init.opt run2.init.opt
exec: /home/dimbaba/mosesdecoder/mert/mert -d 13   --scconfig case:true 
--ffile run1.features.dat,run2.features.dat --scfile 
run1.scores.dat,run2.scores.dat --ifile run2.init.opt -n 20
Executing: /home/dimbaba/mosesdecoder/mert/mert -d 13   --scconfig 
case:true --ffile run1.features.dat,run2.features.dat --scfile 
run1.scores.dat,run2.scores.dat --ifile run2.init.opt -n 20 > mert.out 
2> mert.log
Executing: \cp -f extract.err run2.extract.err
Executing: \cp -f extract.out run2.extract.out
Executing: \cp -f mert.out run2.mert.out
Executing: \cp -f mert.log run2.mert.log
Executing: touch mert.log run2.mert.log
Executing: \cp -f weights.txt run2.weights.txt
None of the weights changed more than 1e-05. Stopping.
Executing: \cp -f init.opt run2.init.opt
Executing: \cp -f mert.log run2.mert.log
Saved: ./moses.ini
run 1 start at Mon 04 Jun 2012 07:43:29 PM EEST
Parsing --decoder-flags: ||
Saving new config to: ./run1.moses.ini
(1) run decoder to produce n-best lists
params =
decoder_config = -w -0.243902 -lm 0.121951 -d 0.146341 -tm 0.048780 
0.048780 0.048780 0.048780 0.048780 0.048780 0.048780 0.048780 0.048780 
0.048780
run 1 end at Tue 05 Jun 2012 03:14:03 PM EEST
(1) BEST at 1: 0 0 0 0 0 0 0 0 0 0 0 0 0 => 0 at Tue 05 Jun 2012
03:14:03 PM EEST
loading data from 1 to 1 (prev_aggregate_nbl_size=-1)
loading data from run1.features.dat
loading data from run1.scores.dat
loading data from run1.init.opt
run 2 start at Tue 05 Jun 2012 03:14:03 PM EEST
Parsing --decoder-flags: ||
Saving new config to: ./run2.moses.ini
(2) run decoder to produce n-best lists
params =
decoder_config = -w 0.000000 -lm 0.000000 -d 0.000000 -tm 0.000000 
0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 
0.000000
run 2 end at Fri 08 Jun 2012 11:03:38 PM EEST
(2) BEST at 2: 0 0 0 0 0 0 0 0 0 0 0 0 0 => 0 at Fri 08 Jun 2012
11:03:38 PM EEST
Training finished at Fri 08 Jun 2012 11:03:38 PM EEST


The intermediate BLEU scores:

0 50 0 49 0 48 0 47 46
0 49 0 48 0 47 0 46 46
0 50 0 49 0 48 0 47 46
0 49 0 48 0 47 0 46 46
.
.
.
0 52 0 51 0 50 0 49 43
0 52 0 51 0 50 0 49 43
0 52 0 51 0 50 0 49 43
0 52 0 51 0 50 0 49 43
0 52 0 51 0 50 0 49 43
0 52 0 51 0 50 0 49 43
0 54 0 53 0 52 0 51 43
0 53 0 52 0 51 0 50 43
0 52 0 51 0 50 0 49 43
0 53 0 52 0 51 0 50 43
0 52 0 51 0 50 0 49 43
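These columns look like per-sentence BLEU sufficient statistics (matched n-grams and totals for n = 1..4, plus the reference length; this reading of the layout is an assumption). If the matched counts are all zero, corpus BLEU is exactly 0, which would be consistent with the "BLEU 0 on dev" line in the tuned ini. A minimal sketch of the computation:

```shell
# Minimal corpus BLEU from sufficient-statistics lines, assuming the column
# layout "match1 total1 match2 total2 match3 total3 match4 total4 reflen".
bleu_from_stats() {
  awk '
    { for (i = 1; i <= 9; i++) t[i] += $i }   # sum stats over all sentences
    END {
      lp = 0
      for (n = 0; n < 4; n++) {
        m = t[2*n + 1]; tot = t[2*n + 2]
        if (m == 0) { print "0"; exit }       # zero matches at any order => BLEU 0
        lp += log(m / tot)
      }
      bp = 1 - t[9] / t[2]                    # brevity penalty, log space
      if (bp > 0) bp = 0                      # (total 1-grams = hypothesis length)
      printf "%.4f\n", exp(bp + lp / 4)
    }'
}

# all-zero match counts, as in the scores above:
printf '0 50 0 49 0 48 0 47 46\n0 49 0 48 0 47 0 46 46\n' | bleu_from_stats   # prints 0
```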

Thank you

DB


