Hi,
In moses.ini, number 8 means prefault and number 9 means lazy mmap. It's a
loading option and orthogonal to the data structure, so either one works
with a model built using -q 8 -b 8.
Since the binary file is the in-memory representation, I do paranoid
checks to make sure your machine represents floats, 64-bit integers, and
such in the same way. For example, a 32-bit build will have different
alignment than a 64-bit build. This check is the one that is failing.
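In case it helps to rule out a 32-bit/64-bit mismatch, here is a rough sketch of a check, assuming Linux ELF executables. The function name elf_bits and the paths in the usage comment are made up for illustration and are not part of KenLM or Moses. It reads the ELF class byte at offset 4 of the file (1 means 32-bit, 2 means 64-bit).

```shell
# Print 32 or 64 for an ELF executable, or "unknown" for anything else.
# Assumption: the binaries are ELF files, as on Linux.
elf_bits() {
    # The first four bytes of an ELF file are 0x7f 'E' 'L' 'F'.
    magic=$(od -An -tx1 -N4 "$1" 2>/dev/null | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo unknown
        return
    fi
    # Byte at offset 4 is EI_CLASS: 1 = 32-bit, 2 = 64-bit.
    class=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')
    case "$class" in
        1) echo 32 ;;
        2) echo 64 ;;
        *) echo unknown ;;
    esac
}

# Hypothetical usage (substitute your own paths):
#   elf_bits /path/to/build_binary
#   elf_bits /path/to/moses
```

If the two binaries report different values, they did not come from the same build. Matching values do not prove the same compiler or build, so this only rules out the most obvious mismatch.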
Please try build_binary and moses from the same build. If that
doesn't work, please send me the first kilobyte of your binary file.
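A sketch of one way to grab that first kilobyte, assuming your head supports -c (the GNU and BSD versions do). The function name first_kb and the output file name in the usage comment are placeholders.

```shell
# Copy the first 1024 bytes of file $1 into file $2.
# If $1 is shorter than 1024 bytes, the whole file is copied.
first_kb() {
    head -c 1024 "$1" > "$2"
}

# Hypothetical usage:
#   first_kb irstLanguageModel.binary.lm lm_first_kb.bin
```

The resulting file is small enough to attach to an email.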
Also, if you have Boost, can you run cd kenlm && make clean && ./test.sh
and report any test failures?
Kenneth
On 10/10/11 17:31, marco turchi wrote:
> Hi Kenneth,
> which number should I use in moses.ini, 8 or 9, if I build my LM with
> the parameters -q 8 -b 8?
>
> I got this error when I ran moses:
> In LanguageModelKen::Load: nGramOrder = 5 will be ignored. Using
> whatever the file has.
> terminate called after throwing an instance of 'lm::FormatLoadException'
> what(): File looks like it should be loaded with mmap, but the test
> values don't match. Was it built on a different machine or with a
> different compiler?
>
> I have the feeling that my moses version needs to be updated!
>
> Thanks a lot
> Marco
>
> On Sat, Oct 8, 2011 at 1:02 PM, marco turchi <[email protected]> wrote:
>
> Thanks!
> I'm going to update my version.
>
> Cheers
> Marco
>
>
> On Sat, Oct 8, 2011 at 1:01 PM, Kenneth Heafield <[email protected]> wrote:
>
> Fixed in revision 4314. There's still an issue with some
> SRILM models failing to build that I'll get to soon.
>
> On 10/08/11 11:52, marco turchi wrote:
>> Hi,
>> thanks a lot for the answer.
>> Great, so I can use -m 2048 to build it. Do you think it is
>> enough?
>>
>> Thanks again
>> Marco
>>
>> On Sat, Oct 8, 2011 at 12:46 PM, Kenneth Heafield <[email protected]> wrote:
>>
>> Hi,
>>
>> This looks like a bug in the trie implementation due
>> to some recent changes I made for left state
>> minimization. I'll fix it soon. A workaround is to pass
>> a large -m option to build_binary.
>>
>> Sorry,
>>
>> Kenneth
>>
>>
>> On 10/08/11 11:34, marco turchi wrote:
>>> Dear All,
>>> I'm trying to build an LM from a large dataset (> 11 M
>>> sentences). I generated the ARPA file with IRSTLM
>>> and now I'd like to binarize it with KenLM.
>>>
>>> I called build_binary to estimate memory usage
>>> and got this:
>>>
>>> Memory estimate:
>>> type    MB
>>> probing 16129 assuming -p 1.5
>>> trie     7462 without quantization
>>> trie     4361 assuming -q 8 -b 8 quantization
>>> trie     6440 assuming -a 22 array pointer compression
>>> trie     3339 assuming -a 22 -q 8 -b 8 array pointer compression and quantization
>>>
>>> Then I ran the binarization like this:
>>>
>>> /nfs/staging/turchmo/moses/kenlmNew/build_binary -i -t /tmp/ -q 8 -b 8 trie irstLM.ARPA.txt irstLanguageModel.binary.lm
>>>
>>> but I got this error:
>>>
>>> lm/search_trie.cc:409 in void
>>> lm::ngram::trie::<unnamed>::SanityCheckCounts(const
>>> std::vector<long unsigned int, std::allocator<long
>>> unsigned int> >&, const std::vector<long unsigned int,
>>> std::allocator<long unsigned int> >&) threw
>>> util::Exception'.
>>> Longest count should be constant but it changed from
>>> 289546423 to 289546405 Byte: 37297517525
>>>
>>> I looked through the mailing list but did not
>>> find any post with the same error.
>>>
>>> Any ideas?
>>>
>>> Thanks a lot
>>> Marco
>>>
>>>
>>> _______________________________________________
>>> Moses-support mailing list
>>> [email protected] <mailto:[email protected]>
>>> http://mailman.mit.edu/mailman/listinfo/moses-support