Hi,

I'm training a Chinese-to-English phrase-based model on 33 million
sentence pairs. My phrase table is 90 GB gzipped, and the reordering table
is 27 GB gzipped. When I run processPhraseTableMin, it dies in pass 3 of 3
with the following error:

Intermezzo: Calculating Huffman code sets
        Creating Huffman codes for 1786817 target phrase symbols
        Creating Huffman codes for 871265 scores
        Creating Huffman codes for 18018117 scores
        Creating Huffman codes for 827039 scores
        Creating Huffman codes for 17861459 scores
        Creating Huffman codes for 50 alignment points

Pass 3/3: Compressing target phrases
..................................................[5000000]
..................................................[345000000]
............................................terminate called after throwing
an instance of 'util::Exception'
  what():  moses/TranslationModel/CompactPT/ListCoders.h:179 in static void
Moses::Simple9::EncodeSymbol(Moses::Simple9::uint&, InIt, InIt) [with InIt
= unsigned int*; Moses::Simple9::uint = unsigned int] threw util::Exception
because `*it > 268435455'.
You are trying to encode 436766721 with Simple9. Cannot encode numbers
larger than 268435455 (2^28-1)
Aborted (core dumped)
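
For reference, the limit in the message matches the usual Simple9 layout,
where each 32-bit word holds a 4-bit selector and 28 data bits. Below is a
minimal sketch of that arithmetic. It is my own check, not code from Moses,
and the names in it are made up for illustration. I don't know what the
value 436766721 stands for inside the compact phrase table, only that it
needs 29 bits and so cannot fit.

```python
# Assumption: standard Simple9 layout, one 32-bit word = 4 selector bits
# plus 28 data bits. The 2^28-1 limit itself is taken from the error message.
WORD_BITS = 32
SELECTOR_BITS = 4
DATA_BITS = WORD_BITS - SELECTOR_BITS   # 28
SIMPLE9_MAX = (1 << DATA_BITS) - 1      # largest encodable value


def fits_simple9(value: int) -> bool:
    """Return True if a single value fits in the 28 data bits."""
    return 0 <= value <= SIMPLE9_MAX


offending = 436766721                   # value reported in the error
print(SIMPLE9_MAX)                      # 268435455
print(offending.bit_length())           # 29
print(fits_simple9(offending))          # False
```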

Is my phrase table too big? Pruning seems to have only removed 0.1% of the
phrases. Is retraining using fewer pairs my only option?

-- 
Best regards,
He Shiming
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
