Re: [Moses-support] My phrase-table.tgz is 20-bytes long

Barry Haddow Thu, 26 Feb 2015 13:31:54 -0800

Hi Alexander

From the error logs, it looks as though alignment went fine: the training pipeline reports 24860460 lines of aligned bitext. Since the extract files were empty, I'd suggest that extraction crashed, and the most likely cause is that it ran out of disk. I'm not sure what happened to the error messages.

For 25M sentence pairs, the final phrase table could easily be 30G and the intermediate files are larger. You probably need more like 500G to be safe.
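A quick way to check this before launching training is sketched below. It is only an illustration, not part of Moses: the helper names are made up, the 500 figure is the rough estimate above, and it assumes the working directory is the current directory and that the temp directory is whatever Python reports (usually /tmp).

```python
import shutil
import tempfile

# Rough requirement taken from the estimate above (an assumption, not a measured value).
REQUIRED_GIB = 500


def free_gib(path):
    """Return the free space on the filesystem holding `path`, in GiB."""
    return shutil.disk_usage(path).free / 2**30


def has_enough_space(path, required_gib=REQUIRED_GIB):
    """True if the filesystem holding `path` has at least `required_gib` GiB free."""
    return free_gib(path) >= required_gib


if __name__ == "__main__":
    # Check both the working directory and the temp directory,
    # since either one filling up can break extraction.
    for path in (".", tempfile.gettempdir()):
        status = "ok" if has_enough_space(path) else "probably too small"
        print(f"{path}: {free_gib(path):.1f} GiB free ({status})")
```

If the working directory and the temp directory sit on the same filesystem, the two numbers will be identical and the space is shared between them.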

I would follow Tom's advice and start with a much smaller corpus to see how the process works. Also, for the full corpus, you could look into fast_align (https://github.com/clab/fast_align) for alignment as it is much faster than mgiza (e.g. 2 days versus 2 weeks), and use EMS for large jobs since it's much easier to restart a failed step.
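One simple way to get a smaller corpus is to take the first N sentence pairs from both sides of the bitext, keeping the two files line-aligned. A minimal sketch, assuming one sentence per line in UTF-8; the function name and file names are hypothetical:

```python
from itertools import islice


def head_parallel(src_in, tgt_in, src_out, tgt_out, n):
    """Copy the first `n` sentence pairs to new files; return the number written."""
    count = 0
    with open(src_in, encoding="utf-8") as fs, \
         open(tgt_in, encoding="utf-8") as ft, \
         open(src_out, "w", encoding="utf-8") as out_s, \
         open(tgt_out, "w", encoding="utf-8") as out_t:
        # zip stops at the shorter file, so the outputs always stay parallel.
        for src_line, tgt_line in islice(zip(fs, ft), n):
            out_s.write(src_line)
            out_t.write(tgt_line)
            count += 1
    return count


# Example with hypothetical file names:
# head_parallel("corpus.ru", "corpus.en", "small.ru", "small.en", 100000)
```

Taking the head of the files is the cheapest option; if the corpus is ordered by source or domain, a random sample would be more representative.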


cheers - Barry

On 26/02/15 15:06, Александр Паньшин wrote:


Hi Barry!

Here you can download training.out: https://www.dropbox.com/s/d0f0n99x4wbw3mo/training.out.gz?dl=1


I have about 50 GB of free space in the working dir.

2015-02-25 17:19 GMT+07:00 Barry Haddow <[email protected]>:


    Hi Alexander,

    It looks like something went wrong at the extract stage. If you
    could make your training.out available then we can look for clues.

    Could the system have run out of disk space, either in the working
    directory or in /tmp? A lot of space is required to build the
    extract files and phrase tables.

    cheers - Barry

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
