Hi, the hierarchical mode requires a different binarizer.
Typically, you would specify

    -Binarizer "/path/to/moses/bin/CreateOnDiskPt 1 1 5 100 2"

-phi

On Fri, Mar 29, 2013 at 3:26 PM, Ken Fasano <kenfas...@hotmail.com> wrote:

> Thank you,
>
> Ken Fasano
>
> Begin forwarded message:
>
> From: "Ken Fasano" <kfas...@motionpoint.com>
> Date: March 29, 2013, 11:25:05 AM EDT
> To: <kenfas...@hotmail.com>
> Subject: I am having problems with binarizing in a hierarchical system
> (but not a phrase-based system).
>
> I am having problems with binarizing in a hierarchical system (but
> not a phrase-based system). I can run the hierarchical without
> binarization; decoding and getting a BLEU score are both fine. With
> filtering and binarizing, however, I am not able to get past filtering
> without an error - it's all downhill from there, and perhaps I am
> losing track of which moses.ini is which.
>
> Thank you so much. I have managed to get a lot done, but this one seems
> a lot more daunting than it really is. I'm thinking it's just some
> missing parameter or a '1' when I should have a '0'...I’ve underlined
> where I think the problems are.
>
> (1) Tuning the system and filtering. This works:
>
>     $MOSES_SCRIPTS/training/filter-model-given-input.pl \
>       "$TRAINING_DIR/filtered" \
>       "$TRAINING_DIR/model/moses.ini" \
>       "$TUNING_CORPUS.tok.$F" \
>       -Hierarchical
>
> (2) Filtering on the way to a BLEU score, following
> http://www.statmt.org/moses/?n=Moses.Baseline. This works, too:
>
>     $MOSES_SCRIPTS/training/filter-model-given-input.pl \
>       $WORKING_FILTERED \
>       $TRAINING_DIR/model/moses-tuned.ini \
>       $HOME/corpus/dev/news2011.true.$F \
>       -Hierarchical
>
> (3) Filtering AND binarizing during tuning fails. I assume this is the
> preferred method, since it is done with one command:
>
>     # filter the phrase table.
>     # This changes moses.ini so that the binary files are used. CORRECT???
>     # [ttable-file]
>     # 2 0 0 5 /home/kenny/working/ebay-terms/hier/filtered/phrase-table.0-0.1.1.bin
>     # This will fail:
>     # ERROR: unknown option '/home/kenny/working/ebay-terms/hier/filtered/phrase-table.0-0.1.1'
>     $MOSES_SCRIPTS/training/filter-model-given-input.pl \
>       "$TRAINING_DIR/filtered" \
>       "$TRAINING_DIR/model/moses.ini" \
>       "$TUNING_CORPUS.tok.$F" \
>       -Hierarchical \
>       -Binarizer $MOSES_BIN/processPhraseTable
>
>     # run MERT
>     time nice $MOSES_SCRIPTS/training/mert-moses.pl \
>       "$TUNING_CORPUS.true.$F" \
>       "$TUNING_CORPUS.true.$E" \
>       "$MOSES_BIN/moses_chart" \
>       "$TRAINING_DIR/filtered/moses.ini" \
>       --no-filter-phrase-table \
>       --working-dir "$TRAINING_DIR/mert" \
>       --mertdir "$MOSES_BIN" \
>       --decoder-flags "-threads $threads -v 0"
>       &> "$TRAINING_DIR/mert.out"
>
>     # Insert weights into configuration file.
>     $MOSES_SCRIPTS/ems/support/reuse-weights.perl \
>       "$TRAINING_DIR/mert/moses.ini" \
>       < "$TRAINING_DIR/model/moses.ini" \
>       > "$TRAINING_DIR/model/moses-tuned.ini"
>
>     # update "$TRAINING_DIR/model/moses-tuned.ini" to read from rules folder
>     # IS THIS NEEDED? Looks like filter-model-given-input.pl has already done it for me!
>     $HOME/scripts/binarizeMoses.perl \
>       "moses-tuned.ini" \
>       "$TRAINING_DIR/model" \
>       "$TRAINING_DIR/model"
>
> binarizeMoses.perl is basically this:
>
>     } elsif (/^6 0 0 5/) {
>         # hierarchical (and others)
>         s/6 0 0 5/2 0 0 5/;
>         s/rule-table\.gz/rule-table/;
>
> filtered/phrase-table.0-0.1.1 exists and has data:
>
>     -rw-rw-r-- 1 kenny kenny 111719 Mar 29 08:35 /home/kenny/working/ebay-terms/hier/filtered/phrase-table.0-0.1.1
>
> Tuning proceeds as follows:
>
> ****************************** CONSOLE OUTPUT ******************************
> Starting tuning...
> Tokenizer Version 1.1
> Language: de
> Number of threads: 1
> Tokenizer Version 1.1
> Language: en
> Number of threads: 1
>
> **** MOSES.INI BEFORE FILTER ****
> [ttable-file]
> 6 0 0 5 /home/kenny/working/ebay-terms/hier/model/rule-table.gz
> 6 0 0 1 /home/kenny/working/ebay-terms/hier/model/glue-grammar
>
> # translation model weights
> [weight-t]
> 0.20
> 0.20
> 0.20
> 0.20
> 0.20
> 1.0
> **** MOSES.INI BEFORE FILTER ****
>
> Executing: mkdir -p /home/kenny/working/ebay-terms/hier/filtered
> Considering factor 0
> Done.
> filtering /home/kenny/working/ebay-terms/hier/model/rule-table.gz -> /home/kenny/working/ebay-terms/hier/filtered/phrase-table.0-0.1.1...
> binarizing.../home/kenny/mosesdecoder/bin/processPhraseTable /home/kenny/working/ebay-terms/hier/filtered/phrase-table.0-0.1.1 /home/kenny/working/ebay-terms/hier/filtered/phrase-table.0-0.1.1.bin
> ERROR: unknown option '/home/kenny/working/ebay-terms/hier/filtered/phrase-table.0-0.1.1'
> To run the decoder, please call:
>   moses -f /home/kenny/working/ebay-terms/hier/filtered/moses.ini -i /home/kenny/corpus/dev/news-test2008.tok.de
>
> **** MOSES.INI AFTER FILTER ****
> [ttable-file]
> 2 0 0 5 /home/kenny/working/ebay-terms/hier/filtered/phrase-table.0-0.1.1.bin
>
> [n.b. -rw-rw-r-- 1 kenny kenny 111719 Mar 29 09:51 phrase-table.0-0.1.1]
>
> 6 0 0 1 /home/kenny/working/ebay-terms/hier/model/glue-grammar
>
> # translation model weights
> [weight-t]
> 0.20
> 0.20
> 0.20
> 0.20
> 0.20
> 1.0
> **** MOSES.INI AFTER FILTER ****
>
> Using SCRIPTS_ROOTDIR: /home/kenny/mosesdecoder/scripts
> Asking moses for feature names and values from /home/kenny/working/ebay-terms/hier/filtered/moses.ini
> Executing: /home/kenny/mosesdecoder/bin/moses_chart -threads 2 -v 0 -config /home/kenny/working/ebay-terms/hier/filtered/moses.ini -inputtype 0 -show-weights > ./features.list
> /home/kenny/mosesdecoder/bin
> max-chart-span: 20
> max-chart-span: 1000
> Start loading text SCFG phrase table. Moses format : [0.001] seconds
> MERT starting values and ranges for random generation:
>   lm = 0.500 ( 0.00 .. 1.00)
>   tm = 0.200 ( 0.00 .. 1.00)
>   tm = 0.200 ( 0.00 .. 1.00)
>   tm = 0.200 ( 0.00 .. 1.00)
>   tm = 0.200 ( 0.00 .. 1.00)
>   tm = 0.200 ( 0.00 .. 1.00)
>   tm = 1.000 ( 0.00 .. 1.00)
>   w = -1.000 ( 0.00 .. 1.00)
> run 1 start at Fri Mar 29 09:51:20 EDT 2013
> Parsing --decoder-flags: |-threads 2 -v 0|
> Saving new config to: ./run1.moses.ini
> Saved: ./run1.moses.ini
> (1) run decoder to produce n-best lists
> params = -threads 2 -v 0
> Normalizing lambdas: 0.500000 0.200000 0.200000 0.200000 0.200000 0.200000 1.000000 -1.000000
> DECODER_CFG = -w -0.285714 -lm 0.142857 -tm 0.057143 0.057143 0.057143 0.057143 0.057143 0.285714
> decoder_config = -w -0.285714 -lm 0.142857 -tm 0.057143 0.057143 0.057143 0.057143 0.057143 0.285714
> Executing: /home/kenny/mosesdecoder/bin/moses_chart -threads 2 -v 0 -config /home/kenny/working/ebay-terms/hier/filtered/moses.ini -inputtype 0 -w -0.285714 -lm 0.142857 -tm 0.057143 0.057143 0.057143 0.057143 0.057143 0.285714 -n-best-list run1.best100.out 100 -input-file /home/kenny/corpus/dev/news-test2008.true.de > run1.out
> /home/kenny/mosesdecoder/bin
> max-chart-span: 20
> max-chart-span: 1000
> Start loading text SCFG phrase table. Moses format : [0.001] seconds
> Start loading binary SCFG phrase table. : [0.009] seconds
> Check m_fileSource.is_open() failed in OnDiskPt/OnDiskWrapper.cpp:61
> Start loading binary SCFG phrase table. Aborted (core dumped)
> Exit code: 134
> The decoder died. CONFIG WAS -w -0.285714 -lm 0.142857 -tm 0.057143 0.057143 0.057143 0.057143 0.057143 0.285714
>
> real 0m0.168s
> user 0m0.048s
> sys 0m0.000s
>
> **** MOSES.INI AFTER MERT ****
> [ttable-file]
> 2 0 0 5 /home/kenny/working/ebay-terms/hier/filtered/phrase-table.0-0.1.1.bin
> 6 0 0 1 /home/kenny/working/ebay-terms/hier/model/glue-grammar
>
> # translation model weights
> [weight-t]
> 0.20
> 0.20
> 0.20
> 0.20
> 0.20
> 1.0
> **** MOSES.INI AFTER MERT ****
> ERROR: could not open weight file: /home/kenny/working/ebay-terms/hier/mert/moses.ini at /home/kenny/mosesdecoder/scripts/ems/support/reuse-weights.perl line 15.
>
> **** MOSES_TUNED.INI AFTER MERT ****
> cat: /home/kenny/working/ebay-terms/hier/model/filtered/moses-tuned.ini: No such file or directory
> **** MOSES_TUNED.INI AFTER MERT ****
> TUNING FAILED!
> ****************************** END CONSOLE OUTPUT ******************************
>
> Here's the rule-table directory after all that. If I try to decode, I
> get nothing.
>
>     -rw-rw-r-- 1 kenny kenny 77 Mar 29 08:24 Misc.dat
>     -rw-rw-r-- 1 kenny kenny 21 Mar 29 08:24 Source.dat
>     -rw-rw-r-- 1 kenny kenny  9 Mar 29 08:24 TargetColl.dat
>     -rw-rw-r-- 1 kenny kenny  1 Mar 29 08:24 TargetInd.dat
>     -rw-rw-r-- 1 kenny kenny  0 Mar 29 08:24 Vocab.dat
>
> (4) Instead, I try binarizing myself. All the files exist. This fails:
>
>     $MOSES_BIN/CreateOnDiskPt 1 1 5 100 2 \
>       "$TRAINING_DIR/filtered/rule-table.gz" \
>       "$TRAINING_DIR/model/rule-table"
>
> This step is run after tuning. Before this, everything works. This
> binarization takes almost no time (0.000455144 seconds), and produces
> a rule-table directory with empty or nearly empty files.
> There are no error messages, but if I try to decode, I get nothing.
>
>     -rw-rw-r-- 1 kenny kenny 77 Mar 29 08:24 Misc.dat
>     -rw-rw-r-- 1 kenny kenny 21 Mar 29 08:24 Source.dat
>     -rw-rw-r-- 1 kenny kenny  9 Mar 29 08:24 TargetColl.dat
>     -rw-rw-r-- 1 kenny kenny  1 Mar 29 08:24 TargetInd.dat
>     -rw-rw-r-- 1 kenny kenny  0 Mar 29 08:24 Vocab.dat
>
> (5) With the non-binarized system from #1 (unbinarized), I try
> binarizing on my way to getting a BLEU score. This fails, too:
>
>     $MOSES_SCRIPTS/training/filter-model-given-input.pl \
>       $WORKING_FILTERED \
>       $TRAINING_DIR/model/moses-tuned.ini \
>       $HOME/corpus/dev/news2011.true.$F \
>       -Hierarchical \
>       -Binarizer $MOSES_BIN/processPhraseTable
>
> ****************************** CONSOLE OUTPUT ******************************
> ERROR: unknown option '/home/kenny/working/dev/newstest2011/filtered/hier/phrase-table.0-0.1.1'
> [ n.b. this is the same error I got when I try to binarize during
> filtering, above. Hmm… ]
>
> To run the decoder, please call:
>   moses -f /home/kenny/working/dev/newstest2011/filtered/hier/moses.ini -i /home/kenny/corpus/dev/newstest2011.true.de
> Filtering and binarization complete.
>
> Starting decoder test (this will take a while)...
>
> ./bleuEbayHier.sh: line 178: 10784 Aborted (core dumped) nice $DECODER -f $MOSES_INI_FILTERED -i $TEST_SET_PATH.true.$F > $TRANSLATION_PATH 2> $WORKING/moses.out
> Decoding test failed! rc = 134
> ****************************** CONSOLE OUTPUT ******************************
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
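
For reference, the suggestion at the top of this thread can be applied to the filtering call from step (3). The following is only a sketch under stated assumptions: the /path/to/moses/... defaults are placeholders, and filter_and_binarize and DRY_RUN are hypothetical names introduced here for illustration. The script name, the -Hierarchical and -Binarizer options, and the CreateOnDiskPt arguments are taken from the messages above.

```shell
#!/bin/sh
# Sketch only. The default paths are placeholders, not real locations.
MOSES_SCRIPTS="${MOSES_SCRIPTS:-/path/to/moses/scripts}"
MOSES_BIN="${MOSES_BIN:-/path/to/moses/bin}"

# Binarizer for hierarchical rule tables, with the arguments from the reply.
# It is handed to -Binarizer as a single quoted string.
BINARIZER="$MOSES_BIN/CreateOnDiskPt 1 1 5 100 2"

# filter_and_binarize OUT_DIR MOSES_INI INPUT_FILE
# DRY_RUN (a hypothetical switch, default 1) prints the command instead of
# running it, so the quoting can be inspected first.
filter_and_binarize() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s %s %s %s -Hierarchical -Binarizer "%s"\n' \
            "$MOSES_SCRIPTS/training/filter-model-given-input.pl" \
            "$1" "$2" "$3" "$BINARIZER"
    else
        "$MOSES_SCRIPTS/training/filter-model-given-input.pl" \
            "$1" "$2" "$3" \
            -Hierarchical \
            -Binarizer "$BINARIZER"
    fi
}
```

With the default DRY_RUN=1 the function only prints the command, which makes it easy to confirm that the whole binarizer command reaches -Binarizer as one quoted argument before anything is actually run.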
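
The directory listings in the quoted message show five .dat files in the binarized rule table, one of them (Vocab.dat) at 0 bytes. A quick check along the following lines could flag that situation before the decoder is started. This is a sketch, not part of Moses: check_ondisk_pt is a hypothetical helper, and treating every missing or zero-byte file as suspicious is an assumption, a plausibility test rather than proof that a table is valid.

```shell
#!/bin/sh
# check_ondisk_pt DIR
# Reports each expected file in DIR that is missing or has zero bytes.
# Returns 0 only if all five files exist and are non-empty.
# The file names are the ones shown in the listings above.
check_ondisk_pt() {
    dir=$1
    rc=0
    for f in Misc.dat Source.dat TargetColl.dat TargetInd.dat Vocab.dat; do
        # -s is true only for a file that exists and is larger than 0 bytes.
        if [ ! -s "$dir/$f" ]; then
            echo "suspicious: $dir/$f is missing or empty" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

For example, check_ondisk_pt "$TRAINING_DIR/model/rule-table" would return non-zero for the directory listed above, because Vocab.dat is empty.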