Hi, 

You do not have enough space in /tmp; see "No space left on device in
/tmp/TuM5Ow" at the end of your log. The "poison" message is just a side
effect of that same failure. You can use the -T "path to more space"
option to put the temporary files somewhere you have more room. You
will probably need something around 100-200 GB. (Is that 16 GB of
compressed or uncompressed text? If compressed, then probably more.)
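
If it helps, here is a rough Python sketch for checking a candidate
directory before pointing -T at it. The function names and the 200 GB
threshold are my own assumptions (the threshold is just the upper end of
the guess above), and the system temp directory is used only as a
stand-in path. The last figure is plain arithmetic on the offset and
length reported in the error message.

```python
import shutil
import tempfile

# Upper end of the 100-200 GB guess above; an assumption, not a measurement.
NEEDED_GB = 200


def free_gb(path):
    """Free space, in GB (10**9 bytes), on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 1e9


def has_room(path, needed_gb=NEEDED_GB):
    """True if `path` has at least `needed_gb` GB free."""
    return free_gb(path) >= needed_gb


# Numbers copied from the error in the log: the failed write would have
# taken a single temporary file to (offset + length) bytes.
failed_write_end_gb = (49146486784 + 13586550656) / 1e9

# Stand-in candidate: the system temp directory. Replace with your own path.
candidate = tempfile.gettempdir()
print(f"{candidate}: {free_gb(candidate):.1f} GB free, "
      f"enough for {NEEDED_GB} GB: {has_room(candidate)}")
print(f"One temp file alone would have reached {failed_write_end_gb:.1f} GB")
```

If the check fails for /tmp, run it again on the directory you plan to
pass to -T.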

Best, 

Marcin 

On 2015-03-25 14:17, liling tan wrote:

> Dear Moses dev/users, 
> 
> Has anyone tried to build a language model from 16 GB of texts? 
> 
> What does "Last input should have been poison." mean? 
> 
> Does anyone know how to estimate the output size of the language model file 
> given 16GB of texts with 8 grams? How about 5grams, how big will it get? 
> 
> We've tried to extract 8grams with 16GB of texts and we ended up with: 
> 
>> === 1/5 Counting and sorting n-grams === 
>> 
>> Reading /home/gillin/wmt15/corpus.truecase/train-lm.en 
>> 
>> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
>>  
>> 
>> tcmalloc: large alloc 21621391360 bytes == 0x1de6000 @ 
>> 
>> tcmalloc: large alloc 86485549056 bytes == 0x50ba5a000 @ 
>> 
>> *****************************=== 1/5 Counting and sorting n-grams === 
>> 
>> Reading /home/gillin/wmt15/corpus.truecase/train-lm.en 
>> 
>> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
>>  
>> 
>> tcmalloc: large alloc 14100905984 bytes == 0x2e6c000 @ 
>> 
>> tcmalloc: large alloc 94006026240 bytes == 0x34bec4000 @ 
>> 
>> ****************************************************************************************************
>>  
>> 
>> Unigram tokens 3038737446 types 5924314 
>> 
>> === 2/5 Calculating and sorting adjusted counts === 
>> 
>> Chain sizes: 1:71091768 2:3162479872 3:5929649664 4:9487439872 5:13835849728 
>> 6:18974879744 7:24904527872 8:31624798208 
>> 
>> tcmalloc: large alloc 31624798208 bytes == 0x34bec4000 @ 
>> 
>> tcmalloc: large alloc 3162480640 bytes == 0x2e6c000 @ 
>> 
>> tcmalloc: large alloc 5929656320 bytes == 0xbf666000 @ 
>> 
>> tcmalloc: large alloc 9487441920 bytes == 0xaa8e86000 @ 
>> 
>> tcmalloc: large alloc 13835853824 bytes == 0xcde674000 @ 
>> 
>> tcmalloc: large alloc 18974883840 bytes == 0x101715a000 @ 
>> 
>> tcmalloc: large alloc 24904531968 bytes == 0x1940db4000 @ 
>> 
>> Statistics: 
>> 
>> 1 5924314 D1=0.709218 D2=1.04888 D3+=1.33462 
>> 
>> 2 108520273 D1=0.723401 D2=1.06804 D3+=1.36804 
>> 
>> 3 543892823 D1=0.788765 D2=1.11107 D3+=1.35713 
>> 
>> 4 1204990660 D1=0.855434 D2=1.17274 D3+=1.36107 
>> 
>> 5 1716616322 D1=0.907776 D2=1.25272 D3+=1.39455 
>> 
>> 6 1966436508 D1=0.943121 D2=1.34991 D3+=1.45437 
>> 
>> 7 2029467690 D1=0.96405 D2=1.44994 D3+=1.5283 
>> 
>> 8 1997628560 D1=0.863904 D2=1.45784 D3+=1.59832 
>> 
>> Memory estimate for binary LM: 
>> 
>> type GB 
>> 
>> probing 202 assuming -p 1.5 
>> 
>> probing 245 assuming -r models -p 1.5 
>> 
>> trie 115 without quantization 
>> 
>> trie 69 assuming -q 8 -b 8 quantization 
>> 
>> trie 96 assuming -a 22 array pointer compression 
>> 
>> trie 49 assuming -a 22 -q 8 -b 8 array pointer compression and quantization 
>> 
>> === 3/5 Calculating and sorting initial probabilities === 
>> 
>> tcmalloc: large alloc 10877861888 bytes == 0x72650000 @ 
>> 
>> tcmalloc: large alloc 28919783424 bytes == 0x34bec4000 @ 
>> 
>> tcmalloc: large alloc 48065257472 bytes == 0xa07ad2000 @ 
>> 
>> tcmalloc: large alloc 62925971456 bytes == 0x34bec4000 @ 
>> 
>> tcmalloc: large alloc 73060843520 bytes == 0x34bec4000 @ 
>> 
>> tcmalloc: large alloc 79905144832 bytes == 0x34bec4000 @ 
>> 
>> Chain sizes: 1:71091768 2:1736324368 3:6017972736 4:9628755968 5:14041935872 
>> 6:19257511936 7:25275484160 8:32095852544 
>> 
>> tcmalloc: large alloc 9628762112 bytes == 0x19349e6000 @ 
>> 
>> tcmalloc: large alloc 14041939968 bytes == 0x1b7289a000 @ 
>> 
>> tcmalloc: large alloc 19257516032 bytes == 0x34bec4000 @ 
>> 
>> tcmalloc: large alloc 25275490304 bytes == 0x7c7c2a000 @ 
>> 
>> tcmalloc: large alloc 32095854592 bytes == 0xdaa4c0000 @ 
>> 
>> === 4/5 Calculating and writing order-interpolated probabilities === 
>> 
>> Chain sizes: 1:71091768 2:1736324368 3:5881222144 4:9409955840 5:13722852352 
>> 6:18819911680 7:24701134848 8:31366518784 
>> 
>> tcmalloc: large alloc 9409961984 bytes == 0x19349e6000 @ 
>> 
>> tcmalloc: large alloc 13722853376 bytes == 0x1b657f0000 @ 
>> 
>> tcmalloc: large alloc 18819915776 bytes == 0x34bec4000 @ 
>> 
>> tcmalloc: large alloc 24701140992 bytes == 0x7adad6000 @ 
>> 
>> tcmalloc: large alloc 31366520832 bytes == 0xd6dfae000 @ 
>> 
>> Last input should have been poison. 
>> 
>> util/file.cc:274 in void util::ErsatzPWrite(int, const void*, std::size_t, 
>> uint64_t) threw FDException'. 
>> 
>> No space left on device in /tmp/TuM5Ow (deleted) while writing 13586550656 
>> bytes at offset 49146486784
> 
> Regards, 
> Liling 
> 
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support