Works for me. I received your ARPA file and successfully built a trie model from it using the same -m 4096 setting. This was with vanilla Moses trunk at revision 3878 (though 3878 wasn't related to KenLM, and you reported it still failed with 3877).
While you specified -m 4096, the code determined it needed only 3436 MB for sorting, based on the number of n-grams of each order. This sorting memory is freed at the end of the first progress bar.

In "Building trie", a 4.9 GB model is mapped into memory. The builder writes at four addresses simultaneously, one for each n-gram length from 2 to 5 (the unigrams are handled specially). The 67% point, where your error occurs, is where the 5-gram pointer exceeds 4 GB from the start of the memory-mapped file.

I think your processor/OS/ulimit settings are not allowing the 4 GB offset, especially since SIGBUS means the CPU says the address cannot be interpreted (as opposed to a segfault, where there's nothing mapped at an otherwise valid address). If I changed the trie builder to write pieces smaller than 4 GB, memory mapping would still be broken at runtime. So there's not much I can do other than suggest you use a different machine or smaller models, sorry.

Kenneth

On 02/10/11 20:54, Kenneth Heafield wrote:
> Please update to revision 3877 or above. I've checked in fix that's
> probably it.
>
> Sorry,
>
> Kenneth
>
> On 02/10/11 01:07, Kenneth Heafield wrote:
>> What architecture are you on? 64-bit x86? I'm assuming you compiled
>> 64-bit.
>>
>> Could you send me either the ARPA or a tarball of the temporary building
>> directory snapshotted by hitting ctrl+c in while the second progress bar
>> is running? I've sent you off-list instructions on how transfer a file
>> to me.
>>
>> On 02/09/11 04:14, Kārlis Goba wrote:
>>> I'm encountering a bus error while building a trie KenLM:
>>>
>>> $ build_binary -m 4096 trie lm-pruned.arpa lm-pruned.mmap
>>> Reading lm-pruned.arpa
>>> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
>>> ****************************************************************************************************
>>> Counting n-grams that should not have been pruned
>>> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
>>> ****************************************************************************************************
>>> Building trie
>>> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
>>> *******************************************************************Bus error
>>>
>>> The ARPA LM file is 13G in size with the following n-gram counts:
>>>
>>> ngram 1= 281277
>>> ngram 2= 24981586
>>> ngram 3= 116033104
>>> ngram 4= 146285904
>>> ngram 5= 94016017
>>>
>>> I'm using Moses revision 3873. The ARPA LM was estimated and pruned with
>>> IRSTLM 5.50.02. Similarly built LMs with similar size (15G) were converted
>>> to KenLM without problems.
>>>
>>> Regards,
>>> Karlis
>>>
>>> _______________________________________________
>>> Moses-support mailing list
>>> [email protected]
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
