Hi Uli,

By now we think this particular error might be caused by Jeremy using the Intel compiler instead of g++.
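
In case it helps with the question about duplicate entries, here is a minimal Python sketch (not part of Moses) that scans a text phrase table for repeated source/target pairs. It assumes the usual `source ||| target ||| scores ...` line layout and that entries with the same source phrase are adjacent, as in a sorted phrase table. The function names are made up for illustration.

```python
import gzip
from typing import Iterable, Iterator, Tuple


def find_duplicates(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield each (source, target) pair that occurs more than once.

    Assumption: all entries for one source phrase are adjacent, so only
    the targets of the current source phrase are kept in memory.
    """
    current_source = None
    seen_targets = set()
    reported = set()
    for line in lines:
        fields = line.rstrip("\n").split(" ||| ")
        if len(fields) < 2:
            continue  # skip lines that do not look like phrase table entries
        source, target = fields[0], fields[1]
        if source != current_source:
            # New source phrase: start a fresh group.
            current_source = source
            seen_targets = set()
            reported = set()
        if target in seen_targets:
            if target not in reported:
                reported.add(target)  # report each duplicated pair once
                yield source, target
        else:
            seen_targets.add(target)


def find_duplicates_in_file(path: str) -> Iterator[Tuple[str, str]]:
    """Run the check on a gzipped text phrase table (hypothetical helper)."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        yield from find_duplicates(handle)
```

Looping over `find_duplicates_in_file("phrase-table.1.gz")` and printing each pair would list the offending entries; an empty result means no duplicate pairs were found under the assumptions above.
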
Duplicate entries will cause crashes because of the minimal perfect hash function used; in that case it should die with an error message about collisions. The duplicate entries coming from the sigtest-filter are another matter I should look into; that should not be happening either.

On 2016-02-04 16:03, Ulrich Germann wrote:
> I've had processPhraseTableMin crash when the phrase table contains duplicate
> entries (can't remember if there was an unreasonable memory allocation
> involved). Is Marcin using the exact same phrase table? Can you check if the
> phrase table has duplicate entries?
>
> To crash or not to crash could also depend on OS and libraries used. You can
> get the versions of libraries compiled into moses with
>
> moses --version
>
> I've had duplicate entries in the phrase table after running
> ptable-sigtest-filter, which is Marcin's implementation of Johnson et al.'s
> significance filtering that I pulled in from his WIPO branch; compile with
> --with-mm --with-mm-extras to get it compiled.
>
> - Uli
>
> On Wed, Feb 3, 2016 at 12:01 PM, Marcin Junczys-Dowmunt <[email protected]>
> wrote:
>
>> Weird.
>>
>> Jeremy, I binarized your phrase-table a couple of times with different
>> commits (also the most recent one), and I cannot reproduce the error.
>> Try maybe -threads 10 or 12.
>> I can make the binarized versions available for download.
>>
>> On 02.02.2016 [1] at 18:21, Marcin Junczys-Dowmunt wrote:
>>> Looks fine, I had no problems running it with 18 and more domain
>>> indicators. Your machine is certainly more than suitable. Just one
>>> remark, using more than 8-12 threads usually slows things down, but
>>> should not cause crashes. Any chance to have a look at that table?
>>>
>>> On 02.02.2016 [1] at 18:16, Jeremy Gwinnup wrote:
>>>> Marcin,
>>>>
>>>> I was able to use -T with processLexicalTableMin successfully. I also
>>>> tried processPhraseTableMin using a local tmp dir with 200G free and it
>>>> still crashed at step 3 with the huge malloc message. Phrase table is
>>>> nothing fancy - just standard 4 scores and 3 domain indicator features.
>>>> Here's a complete output with more info about the phrase table:
>>>>
>>>> Phrase table in question:
>>>>
>>>> -rw-rw-r-- 1 jgwinnup scream 2.2G Feb 1 23:58 phrase-table.1.gz
>>>>
>>>> Machine in question has 1TB RAM/32 cores - should be more than enough for
>>>> the job
>>>>
>>>> Moses git-rev ends with: 80572b4 (Jan. 27)
>>>>
>>>> 1tqoct1:model> $MOSES/bin/processPhraseTableMin -in phrase-table.1.gz -out
>>>> phrase-table.1 -threads all -nscores 7 -T /tmp_with_200G_free
>>>> WARNING: You are using a nonstandard number of scores (7) with PREnc. Set
>>>> the index of P(t|s) with -rankscore int if it is not 2.
>>>> Used options:
>>>> Text phrase table will be read from: phrase-table.1.gz
>>>> Output phrase table will be written to: phrase-table.1.minphr
>>>> Step size for source landmark phrases: 2^10=1024
>>>> Source phrase fingerprint size: 16 bits / P(fp)=1.52588e-05
>>>> Selected target phrase encoding: Huffman + PREnc
>>>> Maxiumum allowed rank for PREnc: 100
>>>> Number of score components in phrase table: 7
>>>> Single Huffman code set for score components: no
>>>> Using score quantization: no
>>>> Explicitly included alignment information: yes
>>>> Running with 32 threads
>>>>
>>>> Pass 1/3: Creating hash function for rank assignment
>>>> ..................................................[5000000]
>>>> ..................................................[10000000]
>>>> ..................................................[15000000]
>>>> ..................................................[20000000]
>>>> ..................................................[25000000]
>>>> ..................................................[30000000]
>>>> ..................................................[35000000]
>>>> ..................................................[40000000]
>>>> ..................................................[45000000]
>>>> ....
>>>>
>>>> Pass 2/3: Creating source phrase index + Encoding target phrases
>>>> ..................................................[5000000]
>>>> ..................................................[10000000]
>>>> ..................................................[15000000]
>>>> ..................................................[20000000]
>>>> ..................................................[25000000]
>>>> ..................................................[30000000]
>>>> ..................................................[35000000]
>>>> ..................................................[40000000]
>>>> ..................................................[45000000]
>>>> ....
>>>>
>>>> Intermezzo: Calculating Huffman code sets
>>>> Creating Huffman codes for 471366 target phrase symbols
>>>> tcmalloc: large alloc 13808820224 bytes == 0xb0592000 @
>>>> tcmalloc: large alloc 27617640448 bytes == 0x3e86b0000 @
>>>> tcmalloc: large alloc 5187358422106112 bytes == (nil) @
>>>> terminate called after throwing an instance of 'std::bad_alloc'
>>>> what(): std::bad_alloc
>>>>
>>>>> On Feb 2, 2016, at 10:21 AM, Jeremy Gwinnup <[email protected]> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I'm having a problem using processPhraseTableMin to compress a phrase
>>>>> table with 7 scores - the program consistently coredumps at step 3 -
>>>>> command and relevant output below. Is there anything I'm doing glaringly
>>>>> wrong?
>>>>>
>>>>> Thanks!
>>>>> -Jeremy
>>>>>
>>>>> Command:
>>>>>
>>>>> 1tqoct1:model> $MOSES/bin/processPhraseTableMin -in phrase-table.1.gz
>>>>> -out phrase-table.1 -threads all -nscores 7
>>>>>
>>>>> Once we get to step 3:
>>>>>
>>>>> Intermezzo: Calculating Huffman code sets
>>>>> Creating Huffman codes for 471366 target phrase symbols
>>>>> tcmalloc: large alloc 13983629312 bytes == 0xb14ce000 @
>>>>> tcmalloc: large alloc 27967250432 bytes == 0x3f3ca4000 @
>>>>> tcmalloc: large alloc 15681406635450368 bytes == (nil) @
>>>>> terminate called after throwing an instance of 'std::bad_alloc'
>>>>> what(): std::bad_alloc
>>>>>
>>>>> Top looked like this when the program ran into trouble:
>>>>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>>>>> 27416 jgwinnup 20 0 45.9g 30g 4.0g R 10.6 3.0 1589:17 processPhraseTa
>>>> _______________________________________________
>>>> Moses-support mailing list
>>>> [email protected]
>>>> http://mailman.mit.edu/mailman/listinfo/moses-support [2]
>
> --
>
> Ulrich Germann
> Senior Researcher
> School of Informatics
>
> University of Edinburgh

Links:
------
[1] tel:02.02.2016
[2] http://mailman.mit.edu/mailman/listinfo/moses-support
