Re: [Moses-support] My phrase-table.tgz is 20-bytes long

Marcin Junczys-Dowmunt Wed, 25 Feb 2015 02:58:22 -0800

Hi,


Running mgiza with 40 cores is a bad idea anyway: there is some heavy
locking going on. Try 8 to 16 cores; it might be much faster.

On 2015-02-25 11:37, Tom Hoar wrote:

> Alexander,
> 
> If your MGIZA word alignment .gz files are empty, the error is happening 
> in step 2. Errors there aren't trapped and the system continues running. 
> Therefore, the outputs of steps 3 (alignment file), 4 (lex files) & 5 
> (extract files) are all garbage. If the word alignment files are ok and 
> the extract files are missing, you probably ran out of hard drive space, 
> as Barry suggested.
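
A quick way to tell these two cases apart is to look for .gz outputs that
decompress to nothing and to check the free space where the model is
written. The sketch below is only an illustration: the function names are
made up here, and the directory you pass in is whatever you gave to
-root-dir. As far as I know, gzip run on empty input with no stored file
name produces a 20-byte file, which would match the size reported in this
thread.

```python
import gzip
import shutil
from pathlib import Path


def gz_is_empty(path):
    """Return True if the gzip file decompresses to zero bytes.

    A truncated or corrupt file raises an exception instead, which is
    also worth knowing about.
    """
    with gzip.open(path, "rb") as fh:
        return fh.read(1) == b""


def find_empty_outputs(root, pattern="*.gz"):
    """List the .gz files under root whose decompressed content is empty."""
    return sorted(p for p in Path(root).rglob(pattern) if gz_is_empty(p))


def free_gib(path):
    """Free space, in GiB, on the filesystem that holds path."""
    return shutil.disk_usage(path).free / 2**30
```

For example, find_empty_outputs("train") lists every empty .gz file under
the training directory, and free_gib("train") shows how much room is left
on that disk.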
> 
> Running for 10 days on a 40-core configuration is a lot to manage. It 
> sounds like a large corpus. Have you run a successful training session 
> on a sample subset of your data? I would suggest extracting a random
> sample of ~15,000 pairs and running your configuration with
> -mgiza-cpus 8 and -cores 8. It should take about 30 minutes to run and
> you shouldn't have
> any disk space problems. Work out any bugs in your corpus prep and/or 
> runtime with this smaller subset. Then, scale up to your full-sized 
> corpus. With large corpora that run 10 days, you might need several 
> hundred gigabytes of available space for temp files in your final output 
> folder, i.e. not /tmp.
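
The random sample suggested above has to keep the two sides of the corpus
aligned, so the same line numbers must be picked from both files. A minimal
sketch, assuming the corpus is stored as two plain-text files with one
sentence per line and the same number of lines (the function name, the
output file names and the fixed seed are illustrative, not part of Moses):

```python
import random


def sample_parallel(src_path, tgt_path, out_src, out_tgt, n=15000, seed=1):
    """Copy the same n randomly chosen line pairs from both corpus sides.

    Assumes both input files have the same number of lines. Returns the
    number of pairs written.
    """
    # First pass: count the lines on the source side.
    with open(src_path, encoding="utf-8") as fs:
        total = sum(1 for _ in fs)

    # Choose the line numbers to keep; a fixed seed makes the sample repeatable.
    rng = random.Random(seed)
    keep = set(rng.sample(range(total), min(n, total)))

    # Second pass: write the selected pairs in their original order.
    written = 0
    with open(src_path, encoding="utf-8") as fs, \
         open(tgt_path, encoding="utf-8") as ft, \
         open(out_src, "w", encoding="utf-8") as out_s, \
         open(out_tgt, "w", encoding="utf-8") as out_t:
        for i, (s, t) in enumerate(zip(fs, ft)):
            if i in keep:
                out_s.write(s)
                out_t.write(t)
                written += 1
    return written
```

With the corpus from this thread that could be
sample_parallel("ru-en.clean.ru", "ru-en.clean.en", "sample.ru", "sample.en"),
followed by a training run with -corpus pointing at the "sample" stem.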
> 
> On 02/25/2015 05:19 PM, Barry Haddow wrote:
>> Hi Alexander,
>>
>> It looks like something went wrong at the extract stage. If you could
>> make your training.out available then we can look for clues. Could the
>> system have run out of disk space, either in the working directory or
>> in /tmp? A lot of space is required to build the extract files and
>> phrase tables.
>>
>> cheers - Barry
>>
>> On 25/02/15 05:32, Александр Паньшин wrote:
>>> Ok, I've started from scratch. I'm pretty sure that I prepared the
>>> corpus this way:
>>>
>>> 1. Tokenized the initial corpora with tokenizer.perl and noted the
>>>    numbers of the lines that caused errors or warnings.
>>> 2. Deleted these lines from both files using sed.
>>> 3. Tokenized the files again. No errors.
>>> 4. Created a truecase model and truecased the files.
>>> 5. Deleted overly long lines using clean-corpus-n.perl 1 50.
>>>
>>> Then I started the translation model creation process with:
>>>
>>> nohup nice /opt/moses/scripts/training/train-model.perl --parallel
>>> -mgiza -mgiza-cpus 40 -cores 40 -root-dir train
>>> -corpus ~/corpus/ru-en.clean -f ru -e en
>>> -alignment grow-diag-final-and -reordering msd-bidirectional-fe
>>> -lm 0:3:$HOME/lm/ru-en.arpa.en:8
>>> -external-bin-dir /opt/moses/mgiza >& training.out &
>>>
>>> After ten days of waiting I have a 20-byte phrase-table.tgz again!
>>> What am I doing wrong? I have both ru-en and en-ru A3.final.gz files,
>>> aligned-grow-diag-final.and, lex.e2f and lex.f2e of quite good size,
>>> but an empty phrase table, empty extract.*.sorted.gz files and an
>>> empty reordering table. I still have no idea what goes wrong or why :(
>>>
>>> 2015-02-14 21:54 GMT+07:00 Kenneth Heafield <[email protected]>:
>>>> Sign my petition to add return code checking to train-model.perl.
>>>>
>>>> On 02/14/2015 09:33 AM, Tom Hoar wrote:
>>>>> An empty phrase-table.gz file is usually the result of an
>>>>> ill-prepared training corpus. Make sure you run the final corpus
>>>>> through clean-corpus-n.perl.
>>>>>
>>>>> On 02/14/2015 09:19 PM, Александр Паньшин wrote:
>>>>>> Hello, everybody!
>>>>>>
>>>>>> I have a problem with Moses. I created a big parallel corpus by
>>>>>> concatenating a bunch of existing corpora from
>>>>>> http://opus.lingfil.uu.se [1]. After that I cleaned up the
>>>>>> results (during tokenization the script reported some errors, so
>>>>>> I deleted the error-prone rows from both sides).
>>>>>>
>>>>>> Then I started training the translation model using mgiza with
>>>>>> this command:
>>>>>>
>>>>>> nohup nice /opt/moses/scripts/training/train-model.perl --parallel
>>>>>> -mgiza -mgiza-cpus 20 -cores 20 -root-dir train
>>>>>> -corpus ~/corpus/ru-en.clean -f ru -e en
>>>>>> -alignment grow-diag-final-and -reordering msd-bidirectional-fe
>>>>>> -lm 0:3:$HOME/lm/ru-en.arpa.en:8
>>>>>> -external-bin-dir /opt/moses/mgiza >& training.out &
>>>>>>
>>>>>> After a week of work I have this at the end of training.out:
>>>>>>
>>>>>> (7) learn reordering model @ Sun Feb 8 15:30:35 MSK 2015
>>>>>> (7.1) [no factors] learn reordering model @ Sun Feb 8 15:30:35 MSK 2015
>>>>>> (7.2) building tables @ Sun Feb 8 15:30:35 MSK 2015
>>>>>> Executing: /opt/moses/scripts/../bin/lexical-reordering-score
>>>>>> /home/adminadmin/working/train/model/extract.o.sorted.gz 0.5
>>>>>> /home/adminadmin/working/train/model/reordering-table. --model "wbe
>>>>>> msd wbe-msd-bidirectional-fe"
>>>>>> Lexical Reordering Scorer
>>>>>> scores lexical reordering models of several types (hierarchical,
>>>>>> phrase-based and word-based-extraction
>>>>>> (8) learn generation model @ Sun Feb 8 15:30:35 MSK 2015
>>>>>> no generation model requested, skipping step
>>>>>> (9) create moses.ini @ Sun Feb 8 15:30:35 MSK 2015
>>>>>>
>>>>>> There is a bunch of files in the ~/working/train folder. It looks
>>>>>> like everything is ok, except for one tiny problem:
>>>>>> phrase-table.tgz has a size of 20 bytes. And, of course, it's not
>>>>>> usable at all!
>>>>>>
>>>>>> Can somebody help and give me a direction where to dig?
>>>>>>
>>>>>> _______________________________________________
>>>>>> Moses-support mailing list
>>>>>> [email protected]
>>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support [2]
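
The return code checking that Kenneth asks for in the quoted thread can be
approximated from the outside by running the training steps one at a time
and stopping as soon as one of them fails. This is only a sketch of the
idea, not how train-model.perl works internally; the function name and the
(name, command) step format are assumptions made for illustration.

```python
import subprocess
import sys


def run_steps(steps):
    """Run each (name, command) pair in order.

    Raises RuntimeError at the first command that exits with a non-zero
    status, so later steps never run on missing or empty inputs.
    """
    for name, cmd in steps:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            raise RuntimeError(
                f"step '{name}' failed with exit code {result.returncode}"
            )


if __name__ == "__main__":
    # Placeholder commands; substitute the real training steps.
    run_steps([
        ("first", [sys.executable, "-c", "print('step one ok')"]),
        ("second", [sys.executable, "-c", "print('step two ok')"]),
    ])
```

Each command is given as an argument list, so no shell quoting is involved;
a failing step surfaces immediately instead of after days of further work.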


 

Links:
------
[1] http://opus.lingfil.uu.se
[2] http://mailman.mit.edu/mailman/listinfo/moses-support

