Re: [Moses-support] Problem building KenLm

marco turchi Mon, 10 Oct 2011 10:23:59 -0700

I see, so there is no chance of translating without rebuilding everything :-(

Thanks a lot
Marco


On Mon, Oct 10, 2011 at 7:05 PM, Kenneth Heafield <[email protected]> wrote:

> Ah.  I've changed the version number to force people to rebuild their
> binary files when the format changes.  Old versions produced that error
> message when the version number changed.  Quantization wasn't even
> implemented in kenlm until some time in 2011, so you're not going to get a
> Moses from 2010 to work with it.
>
> As to the compiler errors on the tests, you're probably using an old
> version of Boost not supported by the tests.
>
> Kenneth
>
>
> On 10/10/11 17:57, marco turchi wrote:
>
> Hi,
> what I did was download the latest version of kenlm, build it, and run
> build_binary. Then I called a 2010 version of moses with the resulting lm.
>
> I ran ./test.sh and got these messages:
> util/bit_packing_test.cc:14: error: expected constructor, destructor, or
> type conversion before ‘(’ token
> util/bit_packing_test.cc:21: error: expected constructor, destructor, or
> type conversion before ‘(’ token
> util/bit_packing_test.cc:59: error: expected `}' at end of input
> util/bit_packing_test.cc:59: error: expected `}' at end of input
>
> I guess I need to reinstall the full moses+kenlm package and build
> everything together.
>
> Thanks a lot
> Marco
>
> On Mon, Oct 10, 2011 at 6:46 PM, Kenneth Heafield <[email protected]> wrote:
>
>>  Hi,
>>
>>     Number 8 means prefault and number 9 means lazy mmap.  It's an option
>> and orthogonal to the data structure.
>>
>>     Since the binary file is the in-memory representation, I do paranoid
>> checks to make sure your machine represents floats, 64-bit integers, and
>> such in the same way.  For example, a 32-bit build will have different
>> alignment than a 64-bit build.  This check is complaining.
>>
>>     Please try build_binary and moses from the same build.  If that
>> doesn't work, please send me the first kilobyte of your binary file.
>>
>>     Also, if you have Boost, can you cd kenlm && make clean && ./test.sh
>> and complain if there are any test failures?
>>
>> Kenneth
>>
>>
>> On 10/10/11 17:31, marco turchi wrote:
>>
>> Hi Kenneth,
>> which number should I use in moses.ini, 8 or 9, if I build my lm with
>> the parameters -q 8 -b 8?
>>
>> I got this error when I ran moses:
>> In LanguageModelKen::Load: nGramOrder = 5 will be ignored.  Using whatever
>> the file has.
>> terminate called after throwing an instance of 'lm::FormatLoadException'
>>   what():  File looks like it should be loaded with mmap, but the test
>> values don't match.  Was it built on a different machine or with a different
>> compiler?
>>
>> I have the feeling that my moses version needs to be updated!
>>
>> Thanks a lot
>> Marco
>>
>> On Sat, Oct 8, 2011 at 1:02 PM, marco turchi <[email protected]> wrote:
>>
>>> Thanks!
>>> I'm going to update my version.
>>>
>>> Cheers
>>> Marco
>>>
>>>
>>> On Sat, Oct 8, 2011 at 1:01 PM, Kenneth Heafield <[email protected]> wrote:
>>>
>>>>  Fixed in revision 4314.  There's still an issue with some SRILM models
>>>> failing to build that I'll get to soon.
>>>>
>>>> On 10/08/11 11:52, marco turchi wrote:
>>>>
>>>> Hi,
>>>> thanks a lot for the answer.
>>>> Great, so I can use -m 2048 to build it. Do you think it is enough?
>>>>
>>>> Thanks again
>>>> Marco
>>>>
>>>> On Sat, Oct 8, 2011 at 12:46 PM, Kenneth Heafield <[email protected]> wrote:
>>>>
>>>>>  Hi,
>>>>>
>>>>>     This looks like a bug in the trie implementation due to some recent
>>>>> changes I made for left state minimization.  I'll fix it soon.  A
>>>>> workaround is to pass a large -m option to build_binary.
>>>>>
>>>>> Sorry,
>>>>>
>>>>> Kenneth
>>>>>
>>>>>
>>>>> On 10/08/11 11:34, marco turchi wrote:
>>>>>
>>>>>  Dear All,
>>>>> I'm trying to build an lm using a large dataset (> 11 M sentences). I
>>>>> have generated the ARPA file with irstlm and now I'd like to binarize it
>>>>> using kenlm.
>>>>>
>>>>> I called build_binary to estimate memory usage and got this:
>>>>>
>>>>> Memory estimate:
>>>>> type       MB
>>>>> probing 16129 assuming -p 1.5
>>>>> trie     7462 without quantization
>>>>> trie     4361 assuming -q 8 -b 8 quantization
>>>>> trie     6440 assuming -a 22 array pointer compression
>>>>> trie     3339 assuming -a 22 -q 8 -b 8 array pointer compression and quantization
>>>>>
>>>>> Then I ran the binarization like this:
>>>>>
>>>>> /nfs/staging/turchmo/moses/kenlmNew/build_binary -i -t /tmp/ -q 8 -b 8 trie irstLM.ARPA.txt irstLanguageModel.binary.lm
>>>>>
>>>>> but I got this error:
>>>>>
>>>>> lm/search_trie.cc:409 in void
>>>>> lm::ngram::trie::<unnamed>::SanityCheckCounts(const std::vector<long
>>>>> unsigned int, std::allocator<long unsigned int> >&, const std::vector<long
>>>>> unsigned int, std::allocator<long unsigned int> >&) threw util::Exception'.
>>>>> Longest count should be constant but it changed from 289546423 to
>>>>> 289546405 Byte: 37297517525
>>>>>
>>>>> I have looked through the mailing list, but I did not find any post
>>>>> with the same error.
>>>>>
>>>>> Any ideas?
>>>>>
>>>>> Thanks a lot
>>>>> Marco
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Moses-support mailing list
>>>>> [email protected]
>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>
>
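
The two load failures discussed in this thread (a changed format version, and test values that don't match) can be illustrated with a minimal Python sketch of a load-time header check. Everything in it is an assumption made for illustration: the header layout, the names MAGIC_PREFIX, FORMAT_VERSION, make_header and check_header, and the version numbers are hypothetical and are not KenLM's actual file format or API.

```python
import struct

# Hypothetical layout for illustration only; this is NOT KenLM's real header.
MAGIC_PREFIX = b"sketch-lm-binary version "
FORMAT_VERSION = 5                 # assumed number; bumped when the format changes
TEST_FLOAT = 3.5                   # exactly representable as a 32-bit float
TEST_UINT64 = 0x0102030405060708   # every byte distinct, so byte order shows up
# "@" selects native byte order, sizes and alignment, so the packed bytes
# depend on how the writing machine and build lay these types out.
SANITY_FORMAT = "@fQ"


class FormatLoadError(Exception):
    """The file cannot be used as an in-memory image by this build."""


def magic_line(version):
    """Return the magic string that carries the given format version."""
    return MAGIC_PREFIX + str(version).encode("ascii") + b"\n"


def make_header(version=FORMAT_VERSION):
    """Return the bytes a writer would put at the start of the binary file."""
    return magic_line(version) + struct.pack(SANITY_FORMAT, TEST_FLOAT, TEST_UINT64)


def check_header(data, expected_version=FORMAT_VERSION):
    """Raise FormatLoadError unless data starts with a header this build accepts."""
    if not data.startswith(MAGIC_PREFIX):
        raise FormatLoadError("not a binary file in this format")
    magic = magic_line(expected_version)
    if not data.startswith(magic):
        raise FormatLoadError("format version differs; rebuild the binary file")
    size = struct.calcsize(SANITY_FORMAT)
    body = data[len(magic):len(magic) + size]
    if len(body) != size or struct.unpack(SANITY_FORMAT, body) != (TEST_FLOAT, TEST_UINT64):
        raise FormatLoadError("test values don't match; different machine or compiler?")


check_header(make_header())                # writer and reader from the same build
try:
    check_header(make_header(version=4))   # file written with an older format version
except FormatLoadError as err:
    print(err)
```

In the situation described above, the 2010 moses plays the role of a reader that expects an older version than the one the new build_binary wrote, which is why running build_binary and moses from the same build is the suggested fix.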

