Works for me.  I've received your ARPA file and successfully built a
trie model from it using the same -m 4096 setting, with vanilla Moses
trunk revision 3878 (though 3878 wasn't related to KenLM, and you
reported that it still failed with 3877).

While you specified -m 4096, the code determined it needed only 3436 MB
for sorting based on the number of n-grams of each order.  This sorting
memory is then freed at the end of the first progress bar.  In "Building
trie", a 4.9 GB model is mapped in memory.  It's simultaneously writing
at four addresses, one for each length 2 to 5 (the unigrams are handled
specially).  The 67% point, where your error occurs, is where the 5-gram
pointer exceeds 4 GB from the start of the memory mapped file.
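To make the 4 GB boundary concrete, here is a small Python sketch (not KenLM code) that computes, for one region of a single memory-mapped file, what fraction of that region's records have been written when its write pointer first passes the 2^32-byte offset. The region start offsets and record sizes are hypothetical inputs; the actual trie layout is not described in this thread.

```python
FOUR_GIB = 1 << 32  # 4,294,967,296: first byte offset that does not fit in 32 bits


def crossing_fraction(region_start, record_bytes, records):
    """Fraction of a region's records already written when its write
    pointer first reaches the 4 GiB offset.

    region_start: byte offset of the region inside the mapped file
    record_bytes: size of one record in bytes (hypothetical)
    records:      number of records in the region
    Returns None if the whole region lies below 4 GiB.
    """
    region_end = region_start + record_bytes * records
    if region_end <= FOUR_GIB:
        return None  # this pointer never passes 4 GiB
    if region_start >= FOUR_GIB:
        return 0.0  # the region already begins beyond 4 GiB
    return (FOUR_GIB - region_start) / (record_bytes * records)
```

If the regions for orders 2 to 5 sit one after another in the file (an assumption), only the pointer for the last region reaches past 4 GB, and it does so partway through its writes.  That is consistent with the failure appearing at 67% of "Building trie" rather than at the start.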

I think your processor/OS/ulimit settings are not allowing the 4 GB
offset, especially since SIGBUS means the CPU says the address cannot be
interpreted (as opposed to a segfault, where nothing is mapped at an
otherwise valid address).  If I changed the trie builder to write pieces
smaller than 4 GB, memory mapping would still fail at runtime.  So
there's not much I can do other than suggest you use a different machine
or smaller models, sorry.
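If the cause is the machine rather than KenLM, it should be reproducible without KenLM.  A minimal Python sketch, assuming a POSIX system whose filesystem supports sparse files: it maps a temporary file just large enough to contain a chosen offset, writes one byte there through the mapping and reads it back.  The function probe_mmap_write is hypothetical and not part of Moses or KenLM.

```python
import mmap
import tempfile


def probe_mmap_write(offset, directory=None, marker=0x5A):
    """Write one byte at `offset` through a shared memory mapping of a
    sparse temporary file and check that it reads back correctly.

    `directory` selects where the temporary file lives (e.g. the same
    filesystem the model is built on); None uses the system default.
    Only the page containing `offset` is touched, so probing past 4 GB
    needs very little real disk or RAM on a sparse-file filesystem.
    """
    size = offset + mmap.PAGESIZE  # keep at least one page after the byte
    with tempfile.TemporaryFile(dir=directory) as f:
        f.truncate(size)  # extend the file without writing any data
        with mmap.mmap(f.fileno(), size) as mm:
            mm[offset] = marker
            return mm[offset] == marker
```

Calling probe_mmap_write((1 << 32) + 4096) probes just past the 4 GB boundary.  An exception (for example from a file-size ulimit or a 32-bit Python) or a "Bus error" crash from this probe would point at the same kind of environment limit described above.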

Kenneth

On 02/10/11 20:54, Kenneth Heafield wrote:
> Please update to revision 3877 or above.  I've checked in a fix that
> probably addresses it.
> 
> Sorry,
> 
> Kenneth
> 
> On 02/10/11 01:07, Kenneth Heafield wrote:
>> What architecture are you on?  64-bit x86?  I'm assuming you compiled
>> it as 64-bit.
>>
>> Could you send me either the ARPA file or a tarball of the temporary
>> building directory, snapshotted by hitting Ctrl+C while the second
>> progress bar is running?  I've sent you off-list instructions on how
>> to transfer a file to me.
>>
>> On 02/09/11 04:14, Kārlis Goba wrote:
>>> I'm encountering a bus error while building a KenLM trie:
>>>
>>> $ build_binary -m 4096 trie lm-pruned.arpa lm-pruned.mmap
>>> Reading lm-pruned.arpa
>>> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
>>> ****************************************************************************************************
>>> Counting n-grams that should not have been pruned
>>> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
>>> ****************************************************************************************************
>>> Building trie
>>> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
>>> *******************************************************************Bus error
>>>
>>> The ARPA LM file is 13G in size with the following n-gram counts:
>>>
>>> ngram 1= 281277
>>> ngram 2= 24981586
>>> ngram 3= 116033104
>>> ngram 4= 146285904
>>> ngram 5= 94016017
>>>
>>> I'm using Moses revision 3873. The ARPA LM was estimated and pruned with
>>> IRSTLM 5.50.02. Similarly built LMs of comparable size (15G) were converted
>>> to KenLM without problems.
>>>
>>> Regards,
>>> Karlis
>>>
>>> _______________________________________________
>>> Moses-support mailing list
>>> [email protected]
>>> http://mailman.mit.edu/mailman/listinfo/moses-support