Hi Alexander,

It looks like something went wrong at the extract stage. If you could 
make your training.out available, we can look for clues.
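
While waiting for training.out, one quick check is which gzip outputs actually contain data. A gzip file holding no data at all is only about 20 bytes (header plus trailer), so a 20-byte phrase table is consistent with the extract stage writing nothing. The helpers below (`gz_is_empty`, `report_gz`, `scan_log`) are hypothetical names for this sketch, not Moses tools. The keyword list in `scan_log` is a guess rather than a complete list of Moses error messages.

```sh
#!/bin/sh
# Hypothetical helpers for triaging a failed Moses training run;
# none of these are part of Moses itself.

# Succeed (exit 0) if a gzip file decompresses to zero bytes.
# Note: a missing or unreadable file also counts as empty here,
# which is why report_gz checks existence and integrity first.
gz_is_empty() {
  # Read at most one decompressed byte. The substitution is left
  # unquoted so any padding that wc adds is dropped.
  [ $(gzip -dc -- "$1" 2>/dev/null | head -c 1 | wc -c) -eq 0 ]
}

# Classify each gzip file given as MISSING, CORRUPT, EMPTY or OK.
report_gz() {
  for f in "$@"; do
    if [ ! -e "$f" ]; then
      echo "MISSING $f"
    elif ! gzip -t -- "$f" 2>/dev/null; then
      echo "CORRUPT $f"
    elif gz_is_empty "$f"; then
      echo "EMPTY   $f"
    else
      echo "OK      $f"
    fi
  done
}

# Print log lines, with line numbers, that look like failures.
# Defaults to ./training.out when no file is given.
scan_log() {
  grep -n -i -E 'error|warn|died|killed|abort|no space|cannot|failed' -- "${1:-training.out}"
}
```

Usage, assuming the `-root-dir train` layout from the commands quoted below (model files under `train/model/`): `report_gz train/model/extract.*.sorted.gz train/model/phrase-table.gz`, then `scan_log training.out`.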

Could the system have run out of disk space, either in the working 
directory or in /tmp? A lot of space is required to build the extract 
files and phrase tables.
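
Both locations can be checked at once with `df -h . /tmp` from the working directory. The sketch below wraps that in hypothetical helpers (`free_kib`, `check_space`, not part of Moses) that read the portable `df -P` output and flag a directory that falls below a chosen threshold. The threshold is yours to pick; nothing here reflects a measured space requirement of train-model.perl.

```sh
#!/bin/sh
# Hypothetical helpers, not part of Moses.

# Print the free space, in KiB, on the filesystem holding a directory.
# With "df -P" the 4th field of the 2nd line is the available space.
free_kib() {
  df -Pk -- "$1" 2>/dev/null | awk 'NR == 2 { print $4 }'
}

# Report free space for a directory; return 1 if it is below min_gib GiB,
# and 2 if the free space could not be read at all.
check_space() {
  dir=$1
  min_gib=$2
  avail=$(free_kib "$dir")
  if [ -z "$avail" ]; then
    echo "FAIL $dir: could not read free space" >&2
    return 2
  fi
  gib=$((avail / 1048576))
  if [ "$avail" -lt $((min_gib * 1048576)) ]; then
    echo "LOW  $dir: $gib GiB free (wanted at least $min_gib GiB)"
    return 1
  fi
  echo "OK   $dir: $gib GiB free"
}
```

For example, run `check_space . 100` and `check_space /tmp 100` from the directory that holds `train/`, where 100 GiB is an arbitrary placeholder. Re-running the check while the extract step is in progress is more telling than a single check before training starts.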

cheers - Barry

On 25/02/15 05:32, Александр Паньшин wrote:
> Ok, I've started from scratch. I'm fairly sure I prepared the corpus 
> as follows:
>
> 1. Tokenized the initial corpora with tokenizer.perl and noted the 
> numbers of the lines that caused errors or warnings.
> 2. Deleted these lines from both files using sed.
> 3. Tokenized the files again. No errors.
> 4. Created a truecasing model and truecased the files.
> 5. Removed overly long lines with clean-corpus-n.perl (length limits 
> 1 and 50).
>
> Then I started training the translation model with:
>
>  nohup nice /opt/moses/scripts/training/train-model.perl --parallel 
> -mgiza -mgiza-cpus 40 -cores 40 -root-dir train -corpus 
> ~/corpus/ru-en.clean -f ru -e en -alignment grow-diag-final-and 
> -reordering msd-bidirectional-fe -lm 0:3:$HOME/lm/ru-en.arpa.en:8 
> -external-bin-dir /opt/moses/mgiza >& training.out &
>
> After ten days of waiting I again have a 20-byte phrase-table.gz! 
> What am I doing wrong?
>
> I have both the ru-en and en-ru A3.final.gz files, 
> aligned.grow-diag-final-and, lex.e2f and lex.f2e, all of reasonable 
> size, but the phrase table, extract.*.sorted.gz and the reordering 
> table are empty.
>
> I still have no idea what is going wrong, or why :(
>
> 2015-02-14 21:54 GMT+07:00 Kenneth Heafield <[email protected] 
> <mailto:[email protected]>>:
>
>     Sign my petition to add return code checking to train-model.perl.
>
>     On 02/14/2015 09:33 AM, Tom Hoar wrote:
>     > An empty phrase-table.gz file is usually the result of an
>     > ill-prepared training corpus. Make sure you run the final corpus
>     > through clean-corpus-n.perl.
>     >
>     >
>     >
>     > On 02/14/2015 09:19 PM, Александр Паньшин wrote:
>     >> Hello, everybody!
>     >>
>     >> I have a problem with Moses. I created a big parallel corpus by
>     >> concatenating a number of existing corpora from
>     >> http://opus.lingfil.uu.se. After that I cleaned up the result (the
>     >> tokenizer script reported some errors, so I deleted the error-prone
>     >> lines from both sides).
>     >>
>     >> Then I started training the translation model using mgiza with
>     >> this command:
>     >>
>     >> nohup nice /opt/moses/scripts/training/train-model.perl --parallel
>     >> -mgiza -mgiza-cpus 20 -cores 20 -root-dir train -corpus
>     >> ~/corpus/ru-en.clean -f ru -e en -alignment grow-diag-final-and
>     >> -reordering msd-bidirectional-fe -lm 0:3:$HOME/lm/ru-en.arpa.en:8
>     >> -external-bin-dir /opt/moses/mgiza >& training.out &
>     >>
>     >> After a week of running, this is the end of training.out:
>     >> (7) learn reordering model @ Sun Feb  8 15:30:35 MSK 2015
>     >> (7.1) [no factors] learn reordering model @ Sun Feb  8 15:30:35 MSK 2015
>     >> (7.2) building tables @ Sun Feb  8 15:30:35 MSK 2015
>     >> Executing: /opt/moses/scripts/../bin/lexical-reordering-score
>     >> /home/adminadmin/working/train/model/extract.o.sorted.gz 0.5
>     >> /home/adminadmin/working/train/model/reordering-table. --model "wbe
>     >> msd wbe-msd-bidirectional-fe"
>     >> Lexical Reordering Scorer
>     >> scores lexical reordering models of several types (hierarchical,
>     >> phrase-based and word-based-extraction
>     >> (8) learn generation model @ Sun Feb  8 15:30:35 MSK 2015
>     >>   no generation model requested, skipping step
>     >> (9) create moses.ini @ Sun Feb  8 15:30:35 MSK 2015
>     >>
>     >> There are a lot of files in the ~/working/train folder. Everything
>     >> looks OK except for one tiny problem: phrase-table.gz is only 20
>     >> bytes in size. And, of course, it's not usable at all!
>     >>
>     >> Can somebody help and point me in the right direction to dig?
>     >>
>     >>
>     >> _______________________________________________
>     >> Moses-support mailing list
>     >> [email protected] <mailto:[email protected]>
>     >> http://mailman.mit.edu/mailman/listinfo/moses-support


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
