I've had processPhraseTableMin crash when the phrase table contains duplicate entries (can't remember if there was an unreasonable memory allocation involved). Is Marcin using the exact same phrase table? Can you check if the phrase table has duplicate entries?
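In case it helps with the check, here is a rough sketch in Python rather than a tested tool. It assumes the usual text phrase table layout (fields separated by "|||", source phrase first, target phrase second) and treats two lines as duplicates when both source and target match. The function names are made up for illustration. It keeps every phrase pair in memory, so a table of this size will need many GB of RAM.

```python
import gzip
import sys
from collections import Counter


def phrase_pair_key(line):
    """Return 'source ||| target' for one text phrase-table line."""
    fields = [f.strip() for f in line.split("|||")]
    if len(fields) < 3:
        raise ValueError("not a phrase-table line: %r" % line)
    return fields[0] + " ||| " + fields[1]


def find_duplicates(lines):
    """Map each phrase pair that occurs more than once to its count."""
    counts = Counter(phrase_pair_key(l) for l in lines if l.strip())
    return {key: n for key, n in counts.items() if n > 1}


def find_duplicates_in_file(path):
    """Same check, reading a gzipped text phrase table from disk."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return find_duplicates(handle)


if __name__ == "__main__" and len(sys.argv) > 1:
    duplicates = find_duplicates_in_file(sys.argv[1])
    for key, n in sorted(duplicates.items()):
        print("%d\t%s" % (n, key))
    print("%d duplicated phrase pairs" % len(duplicates), file=sys.stderr)
```

Saved under a name of your choice, it takes the gzipped table as its only argument (e.g. phrase-table.1.gz) and prints each duplicated pair with its count.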
To crash or not to crash could also depend on the OS and libraries used. You can get the versions of the libraries compiled into moses with moses --version.

I've had duplicate entries in the phrase table after running ptable-sigtest-filter, which is Marcin's implementation of Johnson et al.'s significance filtering that I pulled in from his WIPO branch; compile with --with-mm --with-mm-extras to get it built.

- Uli

On Wed, Feb 3, 2016 at 12:01 PM, Marcin Junczys-Dowmunt <[email protected]> wrote:

> Weird.
>
> Jeremy, I binarized your phrase-table a couple of times with different
> commits (also the most recent one), and I cannot reproduce the error.
> Try maybe -threads 10 or 12.
> I can make the binarized versions available for download.
>
> On 02.02.2016 at 18:21, Marcin Junczys-Dowmunt wrote:
> > Looks fine, I had no problems running it with 18 and more domain
> > indicators. Your machine is certainly more than suitable. Just one
> > remark, using more than 8-12 threads usually slows things down, but
> > should not cause crashes. Any chance to have a look at that table?
> >
> > On 02.02.2016 at 18:16, Jeremy Gwinnup wrote:
> >> Marcin,
> >>
> >> I was able to use -T with processLexicalTableMin successfully. I also tried processPhraseTableMin using a local tmp dir with 200G free and it still crashed at step 3 with the huge malloc message. Phrase table is nothing fancy - just standard 4 scores and 3 domain indicator features. Here’s a complete output with more info about the phrase table:
> >>
> >> Phrase table in question:
> >>
> >> -rw-rw-r-- 1 jgwinnup scream 2.2G Feb 1 23:58 phrase-table.1.gz
> >>
> >> Machine in question has 1TB RAM/32 cores - should be more than enough for the job
> >>
> >> Moses git-rev ends with: 80572b4 (Jan. 27)
> >>
> >> 1tqoct1:model> $MOSES/bin/processPhraseTableMin -in phrase-table.1.gz -out phrase-table.1 -threads all -nscores 7 -T /tmp_with_200G_free
> >> WARNING: You are using a nonstandard number of scores (7) with PREnc. Set the index of P(t|s) with -rankscore int if it is not 2.
> >> Used options:
> >> Text phrase table will be read from: phrase-table.1.gz
> >> Output phrase table will be written to: phrase-table.1.minphr
> >> Step size for source landmark phrases: 2^10=1024
> >> Source phrase fingerprint size: 16 bits / P(fp)=1.52588e-05
> >> Selected target phrase encoding: Huffman + PREnc
> >> Maxiumum allowed rank for PREnc: 100
> >> Number of score components in phrase table: 7
> >> Single Huffman code set for score components: no
> >> Using score quantization: no
> >> Explicitly included alignment information: yes
> >> Running with 32 threads
> >>
> >> Pass 1/3: Creating hash function for rank assignment
> >> ..................................................[5000000]
> >> ..................................................[10000000]
> >> ..................................................[15000000]
> >> ..................................................[20000000]
> >> ..................................................[25000000]
> >> ..................................................[30000000]
> >> ..................................................[35000000]
> >> ..................................................[40000000]
> >> ..................................................[45000000]
> >> ....
> >>
> >> Pass 2/3: Creating source phrase index + Encoding target phrases
> >> ..................................................[5000000]
> >> ..................................................[10000000]
> >> ..................................................[15000000]
> >> ..................................................[20000000]
> >> ..................................................[25000000]
> >> ..................................................[30000000]
> >> ..................................................[35000000]
> >> ..................................................[40000000]
> >> ..................................................[45000000]
> >> ....
> >>
> >> Intermezzo: Calculating Huffman code sets
> >> Creating Huffman codes for 471366 target phrase symbols
> >> tcmalloc: large alloc 13808820224 bytes == 0xb0592000 @
> >> tcmalloc: large alloc 27617640448 bytes == 0x3e86b0000 @
> >> tcmalloc: large alloc 5187358422106112 bytes == (nil) @
> >> terminate called after throwing an instance of 'std::bad_alloc'
> >> what(): std::bad_alloc
> >>
> >>> On Feb 2, 2016, at 10:21 AM, Jeremy Gwinnup <[email protected]> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I’m having a problem using processPhraseTableMin to compress a phrase table with 7 scores - the program consistently coredumps at step 3 - command and relevant output below. Is there anything I’m doing glaringly wrong?
> >>>
> >>> Thanks!
> >>> -Jeremy
> >>>
> >>> Command:
> >>>
> >>> 1tqoct1:model> $MOSES/bin/processPhraseTableMin -in phrase-table.1.gz -out phrase-table.1 -threads all -nscores 7
> >>>
> >>> Once we get to step 3:
> >>>
> >>> Intermezzo: Calculating Huffman code sets
> >>> Creating Huffman codes for 471366 target phrase symbols
> >>> tcmalloc: large alloc 13983629312 bytes == 0xb14ce000 @
> >>> tcmalloc: large alloc 27967250432 bytes == 0x3f3ca4000 @
> >>> tcmalloc: large alloc 15681406635450368 bytes == (nil) @
> >>> terminate called after throwing an instance of 'std::bad_alloc'
> >>> what(): std::bad_alloc
> >>>
> >>> Top looked like this when the program ran into trouble:
> >>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> >>> 27416 jgwinnup 20 0 45.9g 30g 4.0g R 10.6 3.0 1589:17 processPhraseTa

--
Ulrich Germann
Senior Researcher
School of Informatics
University of Edinburgh
_______________________________________________ Moses-support mailing list [email protected] http://mailman.mit.edu/mailman/listinfo/moses-support
