Hi,

Yes, a smaller phrase table should help. I wrote that table implementation, but that was in 2012 and I no longer remember the details of what goes on in there. I think limiting the number of target phrases kept per source phrase should help.
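To illustrate what limiting target phrases per source phrase could look like, here is a minimal sketch, not a Moses tool. It assumes the usual text phrase-table layout (`source ||| target ||| scores ||| ...`), that all entries for a source phrase are adjacent in the file, and that the third score (index 2) is the direct translation probability p(e|f); check the score order of your own table. The names `prune_phrase_table`, `prune_file`, `max_targets` and `score_index` are made up for this example.

```python
import gzip
import heapq
from itertools import groupby


def parse_line(line):
    """Return (source phrase, list of float scores) for one phrase-table line."""
    fields = [f.strip() for f in line.rstrip("\n").split("|||")]
    return fields[0], [float(s) for s in fields[2].split()]


def prune_phrase_table(lines, max_targets=20, score_index=2):
    """Yield at most max_targets entries per source phrase.

    Assumes entries for the same source phrase are adjacent (sorted table).
    Entries are ranked by scores[score_index]; the kept lines are emitted
    in their original relative order.
    """
    parsed = ((parse_line(line), line) for line in lines)
    for _, group in groupby(parsed, key=lambda item: item[0][0]):
        entries = list(group)
        keep = heapq.nlargest(
            max_targets,
            range(len(entries)),
            key=lambda i: entries[i][0][1][score_index],
        )
        for i in sorted(keep):
            yield entries[i][1]


def prune_file(in_path, out_path, max_targets=20, score_index=2):
    """Stream a gzipped phrase table through prune_phrase_table."""
    with gzip.open(in_path, "rt", encoding="utf-8") as fin, \
         gzip.open(out_path, "wt", encoding="utf-8") as fout:
        fout.writelines(prune_phrase_table(fin, max_targets, score_index))
```

Because it only holds one source phrase's entries in memory at a time, this should be usable on a large table; for example `prune_file("phrase-table.gz", "phrase-table.pruned.gz", 20)`, where both file names are placeholders.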

From: He Shiming
Sent: Thursday, May 9, 2019 8:49 PM
To: [email protected]
Subject: [Moses-support] processPhraseTableMin Cannot encode numbers larger than 268435455

Hi,

I'm training a Chinese-to-English phrase-based model, using 33 million sentence pairs. My phrase table is 90GB gzipped, and the reordering table is 27GB gzipped. When running processPhraseTableMin, it dies in step 3 because of the following error:

Intermezzo: Calculating Huffman code sets
Creating Huffman codes for 1786817 target phrase symbols
Creating Huffman codes for 871265 scores
Creating Huffman codes for 18018117 scores
Creating Huffman codes for 827039 scores
Creating Huffman codes for 17861459 scores
Creating Huffman codes for 50 alignment points

Pass 3/3: Compressing target phrases
..................................................[5000000]
..................................................[345000000]
............................................terminate called after throwing an instance of 'util::Exception'
  what(): moses/TranslationModel/CompactPT/ListCoders.h:179 in static void Moses::Simple9::EncodeSymbol(Moses::Simple9::uint&, InIt, InIt) [with InIt = unsigned int*; Moses::Simple9::uint = unsigned int] threw util::Exception because `*it > 268435455'.
You are trying to encode 436766721 with Simple9. Cannot encode numbers larger than 268435455 (2^28-1)
Aborted (core dumped)

Is my phrase table too big? Pruning seems to have only removed 0.1% of the phrases. Is retraining using fewer pairs my only option?

--
Best regards,
He Shiming
