Re: [Moses-support] Problem tuning hierarchical models on Cluster

Noubours, Sandra Thu, 26 Jan 2012 02:16:49 -0800

Hi Hieu!


I commented out line 593 of moses-parallel.pl and removed the
$tmpStartTranslationId variable from the following line, and everything
works fine now!

 

Thank you!

 

Sandra

 

# my $tmpStartTranslationId = "-start-translation-id $currStartTranslationId";

print OUT "$mosescmd $mosesparameters $tmpalioutfile $tmpwordgraphlist $tmpsearchgraphlist $tmpnbestlist $inputmethod ${inputfile}.$splitpfx$idx > $tmpdir/${inputfile}.$splitpfx$idx.trans\n\n";

 

From: Hieu Hoang [mailto:[email protected]]
Sent: Thursday, 26 January 2012 05:01
To: Noubours, Sandra
Cc: [email protected]
Subject: Re: [Moses-support] Problem tuning hierarchical models on
Cluster

 

hi sandra

i added the -start-translation-id argument to the script
moses-parallel.pl at line 593.

However, I must admit, I didn't test it on an SGE cluster and assumed it
would work. It's not important: it's only used when the phrase table
needs to be loaded per sentence, typically when using a suffix array.
This has not been fully implemented.

If you delete it and it works for you, please tell me and I'll roll it
back.

apologies

On Wed, Jan 25, 2012 at 4:15 PM, Noubours, Sandra
<[email protected]> wrote:

Hello,

 

I ran into a problem when tuning hierarchical models on a Sun Grid
Engine (SGE) cluster.

The TUNING_tune step crashes, and the error log says that the file
splits -ab, -ac, etc. were not entirely translated:

 

---------------------->

...

Split (-ab) were not entirely translated

outputN=0 inputN=84

outputfile=input.tok.1.split4926-ab.trans
inputfile=input.tok.1.split4926-ab

Split (-ac) were not entirely translated

outputN=0 inputN=84

...

Executing: qdel 717048

Exit code: 1

Translation was not performed correctly

or some of the submitted jobs died.

qdel function was called for all submitted jobs

Exit code: 1

The decoder died. CONFIG WAS -w -0.285714 -lm 0.142857 -tm 0.057143
0.057143 0.057143 0.057143 0.057143 0.285714

...

<----------------------

 

The first split (-aa) is translated correctly, but from the second
split on the generated files are empty, e.g.:

 

---------------------->

.../tuning/tmp.1/tmp4926/run1.best100.out.split4926-ab

.../tuning/tmp.1/tmp4926/input.tok.1.split4926-ab.trans

...

<----------------------

The files generated from the first split (-aa) are ok.
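
For reference, the outputN/inputN figures in the error log can be reproduced by hand. A minimal sketch, assuming the check is a plain line-count comparison between a split and its .trans file; the function names count_lines and check_split are made up for this illustration and are not taken from moses-parallel.pl:

```shell
#!/bin/sh
# Hypothetical helpers, not part of moses-parallel.pl.

# Print the number of lines in a file, without padding.
count_lines() {
    wc -l < "$1" | tr -d ' '
}

# Compare a split with its translation. Report a mismatch in the style of
# the error log and return non-zero if the line counts differ.
check_split() {
    input_n=$(count_lines "$1")
    output_n=$(count_lines "$2")
    if [ "$input_n" -ne "$output_n" ]; then
        echo "not entirely translated: outputN=$output_n inputN=$input_n"
        return 1
    fi
    return 0
}
```

Called as `check_split input.tok.1.split4926-ab input.tok.1.split4926-ab.trans` from the temporary directory, this would flag any split whose .trans file has fewer lines than its input, which is the situation described for -ab and -ac.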

 

The log file only tells me that the job has been submitted, and the
file out.job.4926-ab shows that the decoder ran fine and translated
everything:

---------------------->

Linux tyr 2.6.37.6-0.7-default #1 SMP 2011-07-21 02:17:24 +0200 x86_64
x86_64 x86_64 GNU/Linux

ulimit: Command not found.

Defined parameters (per moses.ini or switch):

                config: /smt-work/tuning/moses.filtered.ini.1 

                cube-pruning-pop-limit: 1000 

                input-factors: 0 

                input-file: input.tok.1.split16649-aa 

                inputtype: 0 

                lmodel-file: 1 0 5 /smt-work/lm/prsde.binlm.1 

                mapping: 0 T 0 1 T 1 

                max-chart-span: 20 1000 

                n-best-list:
/smt-work/tuning/tmp.1/tmp16649/run1.best100.out.split16649-aa 100 

                non-terminals: X 

                search-algorithm: 3 

                start-translation-id: 0 

                ttable-file: 2 0 0 5
/smt-work/tuning/filtered.1/phrase-table.0-0.1.1.bin 6 0 0 1
/smt-work/model/glue-grammar.1 

                ttable-limit: 20 

                weight-l: 0.142857 

                weight-t: 0.057143 0.057143 0.057143 0.057143 0.057143
0.285714 

                weight-w: -0.285714 

Loading lexical distortion models...have 0 models

Start loading LanguageModel /smt-work/lm/prsde.binlm.1 : [0.000] seconds

In LanguageModelIRST::Load: nGramOrder = 5

Language Model Type of /smt-work/lm/prsde.binlm.1 is 1

Qblmt

loadbin()

reading  256 centers

reading  256 centers

reading  256 centers

reading  256 centers

reading  256 centers

lmtable::loadbin_dict()

dict->size(): 260011

loadbin_level (level 1)

loading 260011 1-grams

done (level1)

loadbin_level (level 2)

loading 2194489 2-grams

done (level2)

loadbin_level (level 3)

loading 4658390 3-grams

done (level3)

loadbin_level (level 4)

loading 5850497 4-grams

done (level4)

loadbin_level (level 5)

loading 6015709 5-grams

done (level5)

done

OOV code is 260010

IRST: m_unknownId=260010

Finished loading LanguageModels : [0.000] seconds

Using uniform ttable-limit of 20 for all translation tables.

Start loading PhraseTable
/smt-work/tuning/filtered.1/phrase-table.0-0.1.1.bin : [0.000] seconds

filePath: /smt-work/tuning/filtered.1/phrase-table.0-0.1.1.bin

Start loading PhraseTable /smt-work/model/glue-grammar.1 : [0.000]
seconds

filePath: /smt-work/model/glue-grammar.1

Finished loading phrase tables : [0.000] seconds

Start loading phrase table from /smt-work/model/glue-grammar.1 : [0.000]
seconds

Start loading new format pt model : [0.000] seconds

Finished loading phrase tables : [0.000] seconds

Created input-output object : [0.000] seconds

Translating: <s> I go home </s> 

 

  0   1   2 

  1  20   0 

   19   0 

      1 

BEST TRANSLATION: 44 S </s> :0-0 : pC=0.000, c=-0.573 [0..2] 24
[total=-1.166] <<-1.303, 0.000, -6.626, -6.675, -4.867, -3.650, -1.163,
1.000, 1.000>>

reset caches

Translation took 0.050 seconds

...

End. : [645.000] seconds

reset mmap

exit status 0

exit status 0

exit status 0

<----------------------

 

But, as mentioned above, the generated translation and n-best files are
empty.

 

Then I had a look at the bash scripts that start the jobs and saw that
it may have something to do with the option "-start-translation-id": the
script for split -aa is run with "-start-translation-id 0", but the
following splits are run with "-start-translation-id 84",
"-start-translation-id 168", and so on.

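The ids 0, 84, 168 are consistent with each split starting where the previous one ended, at 84 sentences per split (the inputN reported in the error log). A small sketch of that arithmetic, assuming equally sized splits; start_ids is an illustrative name, not something from the Moses scripts:

```shell
#!/bin/sh
# Hypothetical helper: print the start id for each of N splits,
# assuming every split holds the same number of sentences.
start_ids() {
    split_size=$1
    n_splits=$2
    i=0
    while [ "$i" -lt "$n_splits" ]; do
        echo $((i * split_size))
        i=$((i + 1))
    done
}

# Split size and the first three splits from the example above.
start_ids 84 3
```

For 84 sentences per split this prints 0, 84 and 168 on separate lines, matching the values seen in the job scripts.
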
The job bash script then looks like this:

---------------------->

/smt/moses/dist/bin/moses_chart  -w -0.285714 -lm 0.142857 -tm 0.057143
0.057143 0.057143 0.057143 0.057143 0.285714 -config
/smt-work/tuning/moses.filtered.ini.1 -inputtype 0 -start-translation-id
84-n-best-list
/smt-work/tuning/tmp.1/tmp4926/run1.best100.out.split4926-ab 100
-input-file /smt-work/tuning/tmp.1/input.tok.1.split4926-ab >
/smt-work/tuning/tmp.1/tmp4926/input.tok.1.split4926-ab.trans  

<----------------------

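In the job script as quoted above, 84 runs straight into -n-best-list with no space in between. That could simply be an artifact of mail wrapping, but if the generated script really contains it, the shell passes 84-n-best-list to the decoder as a single word. A minimal sketch for checking a command line for this; has_fused_start_id is a made-up name used only for illustration:

```shell
#!/bin/sh
# Hypothetical check: succeed (exit 0) if the number after
# -start-translation-id is immediately followed by another "-option",
# i.e. the separating space is missing.
has_fused_start_id() {
    printf '%s\n' "$1" | grep -Eq -e '-start-translation-id +[0-9]+-'
}
```

It could be applied to the contents of a generated job script, for example `has_fused_start_id "$(cat jobscript)" && echo "missing space"`, where jobscript stands for whichever file the cluster submission produced.
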
 

When I run exactly the same script changing "-start-translation-id 84"
to "-start-translation-id 0" everything works fine and the files are
generated. 

I thought about deleting the option "-start-translation-id", but I fear
that it might be important for the overall tuning on the cluster (where
the corpus file is split and then processed in parts). So maybe
something is broken in "moses_chart" concerning parallel processing,
or maybe I made an error when compiling? (When I run experiments for
normal phrase models, calling "moses" instead of "moses_chart" and
without using "-hierarchical", everything works fine.)

 

Thanks for your help in advance!

 

Sandra 

 

 

 


_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
