Hi,
Yes, a smaller phrase table should help. I wrote the compact phrase table 
code, but that was in 2012 and I no longer remember the details. I think 
limiting the number of target phrases per source phrase should help. 
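
For illustration only, here is a minimal Python sketch of capping the number 
of target phrases per source phrase before running processPhraseTableMin. It 
is not part of Moses. It assumes the usual text format 
"source ||| target ||| scores ||| ...", that the table is sorted by source 
phrase, and that the third score (index 2) is the direct translation 
probability. The names limit_targets, prune_file, max_targets and score_index 
are hypothetical, and I have not verified that this avoids the Simple9 error.

```python
import gzip
import heapq
import itertools
from typing import Iterable, Iterator

SEP = " ||| "  # field separator in Moses text phrase tables


def limit_targets(lines: Iterable[str], max_targets: int = 20,
                  score_index: int = 2) -> Iterator[str]:
    """Keep at most max_targets entries per source phrase.

    Assumes entries for the same source phrase are adjacent (sorted table)
    and ranks them by the score at score_index (assumed to be p(e|f)).
    """
    def source_of(line: str) -> str:
        return line.split(SEP, 1)[0]

    def score_of(line: str) -> float:
        return float(line.split(SEP)[2].split()[score_index])

    for _, group in itertools.groupby(lines, key=source_of):
        entries = list(group)
        # Indices of the best-scoring entries for this source phrase.
        keep = set(heapq.nlargest(max_targets, range(len(entries)),
                                  key=lambda i: score_of(entries[i])))
        # Emit survivors in their original order so the table stays sorted.
        for i, entry in enumerate(entries):
            if i in keep:
                yield entry


def prune_file(in_path: str, out_path: str, max_targets: int = 20) -> None:
    """Stream a gzipped phrase table through limit_targets."""
    with gzip.open(in_path, "rt", encoding="utf-8") as src, \
         gzip.open(out_path, "wt", encoding="utf-8") as dst:
        dst.writelines(limit_targets(src, max_targets))
```

Only one source phrase's entries are held in memory at a time, so this should 
stream through a 90GB table, though the right value for max_targets is 
something you would have to try out.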

From: He Shiming
Sent: Thursday, May 9, 2019 8:49 PM
To: [email protected]
Subject: [Moses-support] processPhraseTableMin Cannot encode numbers larger than 268435455

Hi,

I'm training a Chinese-to-English phrase-based model, using 33 million sentence 
pairs. My phrase table is 90GB gzipped, and the reordering table is 27GB 
gzipped. When running processPhraseTableMin, it dies in step 3 because of the 
following error:

Intermezzo: Calculating Huffman code sets
        Creating Huffman codes for 1786817 target phrase symbols
        Creating Huffman codes for 871265 scores
        Creating Huffman codes for 18018117 scores
        Creating Huffman codes for 827039 scores
        Creating Huffman codes for 17861459 scores
        Creating Huffman codes for 50 alignment points

Pass 3/3: Compressing target phrases
..................................................[5000000]
..................................................[345000000]
............................................terminate called after throwing an instance of 'util::Exception'
  what():  moses/TranslationModel/CompactPT/ListCoders.h:179 in static void Moses::Simple9::EncodeSymbol(Moses::Simple9::uint&, InIt, InIt) [with InIt = unsigned int*; Moses::Simple9::uint = unsigned int] threw util::Exception because `*it > 268435455'.
You are trying to encode 436766721 with Simple9. Cannot encode numbers larger than 268435455 (2^28-1)
Aborted (core dumped)

Is my phrase table too big? Pruning seems to have only removed 0.1% of the 
phrases. Is retraining using fewer pairs my only option?
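
To make the limit in the error message concrete, here is a small Python check 
of the arithmetic. It assumes the Simple9 scheme as it is commonly described: 
each 32-bit word spends 4 bits on a selector and has 28 bits left for data, 
which matches the 2^28-1 in the log. The log does not say which quantity the 
value 436766721 represents, so this only shows that it cannot fit. The names 
below are hypothetical, not taken from ListCoders.h.

```python
WORD_BITS = 32      # assumed size of one Simple9 word
SELECTOR_BITS = 4   # assumed bits reserved for the layout selector

# Largest single value that fits in the remaining data bits.
MAX_SIMPLE9_VALUE = (1 << (WORD_BITS - SELECTOR_BITS)) - 1


def fits_simple9(value: int) -> bool:
    """Return True if value fits in the 28 data bits of one word."""
    return 0 <= value <= MAX_SIMPLE9_VALUE


print(MAX_SIMPLE9_VALUE)        # limit quoted in the error message
print(fits_simple9(436766721))  # value from the log
```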

-- 
Best regards,
He Shiming

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
