Hi,
you do not have enough space in /tmp; see "No space left on device in /tmp/TuM5Ow". The "poison" message is just another echo of that. You can use the -T "path to more space" option to set a path where you have more space. You probably need something around 100-200 GB. (Is that 16 GB of compressed or uncompressed text? If compressed, then probably more.)

Best,
Marcin

On 2015-03-25 14:17, liling tan wrote:
> Dear Moses dev/users,
>
> Has anyone tried to build a language model from 16 GB of texts?
>
> What does "Last input should have been poison." mean?
>
> Does anyone know how to estimate the output size of the language model file
> given 16GB of texts with 8 grams? How about 5grams, how big will it get?
>
> We've tried to extract 8grams with 16GB of texts and we ended up with:
>
>> === 1/5 Counting and sorting n-grams ===
>> Reading /home/gillin/wmt15/corpus.truecase/train-lm.en
>> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
>> tcmalloc: large alloc 21621391360 bytes == 0x1de6000 @
>> tcmalloc: large alloc 86485549056 bytes == 0x50ba5a000 @
>> *****************************=== 1/5 Counting and sorting n-grams ===
>> Reading /home/gillin/wmt15/corpus.truecase/train-lm.en
>> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
>> tcmalloc: large alloc 14100905984 bytes == 0x2e6c000 @
>> tcmalloc: large alloc 94006026240 bytes == 0x34bec4000 @
>> ****************************************************************************************************
>> Unigram tokens 3038737446 types 5924314
>>
>> === 2/5 Calculating and sorting adjusted counts ===
>> Chain sizes: 1:71091768 2:3162479872 3:5929649664 4:9487439872 5:13835849728
>> 6:18974879744 7:24904527872 8:31624798208
>> tcmalloc: large alloc 31624798208 bytes == 0x34bec4000 @
>> tcmalloc: large alloc 3162480640 bytes == 0x2e6c000 @
>> tcmalloc: large alloc 5929656320 bytes == 0xbf666000 @
>> tcmalloc: large alloc 9487441920 bytes == 0xaa8e86000 @
>> tcmalloc: large alloc 13835853824 bytes == 0xcde674000 @
>> tcmalloc: large alloc 18974883840 bytes == 0x101715a000 @
>> tcmalloc: large alloc 24904531968 bytes == 0x1940db4000 @
>> Statistics:
>> 1 5924314 D1=0.709218 D2=1.04888 D3+=1.33462
>> 2 108520273 D1=0.723401 D2=1.06804 D3+=1.36804
>> 3 543892823 D1=0.788765 D2=1.11107 D3+=1.35713
>> 4 1204990660 D1=0.855434 D2=1.17274 D3+=1.36107
>> 5 1716616322 D1=0.907776 D2=1.25272 D3+=1.39455
>> 6 1966436508 D1=0.943121 D2=1.34991 D3+=1.45437
>> 7 2029467690 D1=0.96405 D2=1.44994 D3+=1.5283
>> 8 1997628560 D1=0.863904 D2=1.45784 D3+=1.59832
>> Memory estimate for binary LM:
>> type      GB
>> probing  202 assuming -p 1.5
>> probing  245 assuming -r models -p 1.5
>> trie     115 without quantization
>> trie      69 assuming -q 8 -b 8 quantization
>> trie      96 assuming -a 22 array pointer compression
>> trie      49 assuming -a 22 -q 8 -b 8 array pointer compression and quantization
>>
>> === 3/5 Calculating and sorting initial probabilities ===
>> tcmalloc: large alloc 10877861888 bytes == 0x72650000 @
>> tcmalloc: large alloc 28919783424 bytes == 0x34bec4000 @
>> tcmalloc: large alloc 48065257472 bytes == 0xa07ad2000 @
>> tcmalloc: large alloc 62925971456 bytes == 0x34bec4000 @
>> tcmalloc: large alloc 73060843520 bytes == 0x34bec4000 @
>> tcmalloc: large alloc 79905144832 bytes == 0x34bec4000 @
>> Chain sizes: 1:71091768 2:1736324368 3:6017972736 4:9628755968 5:14041935872
>> 6:19257511936 7:25275484160 8:32095852544
>> tcmalloc: large alloc 9628762112 bytes == 0x19349e6000 @
>> tcmalloc: large alloc 14041939968 bytes == 0x1b7289a000 @
>> tcmalloc: large alloc 19257516032 bytes == 0x34bec4000 @
>> tcmalloc: large alloc 25275490304 bytes == 0x7c7c2a000 @
>> tcmalloc: large alloc 32095854592 bytes == 0xdaa4c0000 @
>>
>> === 4/5 Calculating and writing order-interpolated probabilities ===
>> Chain sizes: 1:71091768 2:1736324368 3:5881222144 4:9409955840 5:13722852352
>> 6:18819911680 7:24701134848 8:31366518784
>> tcmalloc: large alloc 9409961984 bytes == 0x19349e6000 @
>> tcmalloc: large alloc 13722853376 bytes == 0x1b657f0000 @
>> tcmalloc: large alloc 18819915776 bytes == 0x34bec4000 @
>> tcmalloc: large alloc 24701140992 bytes == 0x7adad6000 @
>> tcmalloc: large alloc 31366520832 bytes == 0xd6dfae000 @
>> Last input should have been poison.
>> util/file.cc:274 in void util::ErsatzPWrite(int, const void*, std::size_t,
>> uint64_t) threw FDException'.
>> No space left on device in /tmp/TuM5Ow (deleted) while writing 13586550656
>> bytes at offset 49146486784
>
> Regards,
> Liling
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
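
For what it's worth, here is a minimal sketch of checking that a candidate temporary directory has enough free space before re-running with -T. It assumes the tool in the log above is KenLM's lmplz and that 200 GB (the upper end of the rough estimate above) is the target. The helper names `has_room` and `lmplz_command` and the path /data/tmp are made up for illustration.

```python
import shutil

GB = 1024 ** 3


def has_room(path, needed_gb):
    """Return True if the filesystem holding `path` has at least needed_gb GB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= needed_gb * GB


def lmplz_command(order, temp_dir):
    """Build an argument list that points the temporary files at temp_dir.

    Assumption: the binary is called "lmplz" and is on PATH; -o sets the
    n-gram order and -T sets the temporary-file location.
    """
    return ["lmplz", "-o", str(order), "-T", temp_dir]


if __name__ == "__main__":
    temp_dir = "/data/tmp"  # hypothetical path on a larger disk
    try:
        ok = has_room(temp_dir, 200)
    except OSError:
        ok = False  # the path does not exist on this machine
    if ok:
        print("would run:", " ".join(lmplz_command(8, temp_dir)))
    else:
        print("not enough free space under", temp_dir)
```

The script only prints the command. To actually build the model, the argument list would be passed to something like subprocess.run with the training text on stdin and the ARPA file on stdout.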
