Hi Marcin,

I ran it several times and the backtrace looked identical in all cases:

#0  0x000000000081fd18 in Moses::BitWrapper<std::string>::Seek (this=0x7ffc23a5c7f0, bitPos=18446744073708512633) at moses/TranslationModel/CompactPT/CanonicalHuffman.h:303
#1  0x000000000081dd78 in Moses::BitWrapper<std::string>::SeekFromEnd (this=0x7ffc23a5c7f0, bitPosFromEnd=1039935) at moses/TranslationModel/CompactPT/CanonicalHuffman.h:309
#2  0x000000000081a4ca in Moses::PhraseDecoder::CreateTargetPhraseCollection (this=0xbac1650, sourcePhrase=..., topLevel=true, eval=true) at moses/TranslationModel/CompactPT/PhraseDecoder.cpp:233

But the crash itself still happens at random points...

I'm afraid you can't access our machines anymore, sorry :( The account has
already been deleted.

Best,
Ales

On Wed, Apr 13, 2016 at 10:10 PM, Marcin Junczys-Dowmunt <[email protected]> wrote:

> Urghs, not good. Can I somehow get access to that machine? Is it
> deterministic?
>
> On 13.04.2016 at 21:06, Barry Haddow wrote:
>
> Hi Ales
>
> Well, bitPos=18446744073708512633 looks bogus. Marcin?
>
> cheers - Barry
>
> On 13/04/16 17:23, Aleš Tamchyna wrote:
>
> Hi all,
>
> sorry for the delay. I'm attaching the debug backtrace.
>
> Best,
> Ales
>
> On Wed, Apr 13, 2016 at 1:49 PM, Barry Haddow <[email protected]> wrote:
>
>> Hi
>>
>> The backtrace would be more informative if you run with a debug build
>> (add variant=debug to bjam). Sometimes this makes bugs go away, or new bugs
>> appear, but if not then it will give more information. You can run with
>> core files enabled (ulimit -c unlimited) to save having to run Moses inside
>> gdb.
>>
>> If the bug is random, but not thread related, then it could well be
>> memory corruption. Running Moses in valgrind can help track this down
>> (again, using a debug build is better). Note that the suffix arrays crash
>> valgrind (last time I checked), so don't build them in.
>>
>> cheers - Barry
>>
>>
>> On 13/04/16 11:25, Ales Tamchyna wrote:
>>
>> Hi,
>> Let me add some more information to this: when running Moses in gdb, I get 
>> the following backtrace:
>> #0  0x00000000006e3ba4 in Moses::PhraseDecoder::CreateTargetPhraseCollection(Moses::Phrase const&, bool, bool) ()
>> #1  0x00000000005cd2a7 in Moses::PhraseDictionaryCompact::GetTargetPhraseCollectionNonCacheLEGACY(Moses::Phrase const&) const ()
>> #2  0x000000000048efe4 in Moses::PhraseDictionary::GetTargetPhraseCollectionLEGACY(Moses::Phrase const&) const ()
>> #3  0x000000000048e6a0 in Moses::PhraseDictionary::GetTargetPhraseCollectionBatch(std::vector<Moses::InputPath*, std::allocator<Moses::InputPath*> > const&) const ()
>> #4  0x0000000000560948 in Moses::TranslationOptionCollection::GetTargetPhraseCollectionBatch() ()
>> #5  0x0000000000551a39 in Moses::TranslationOptionCollectionText::CreateTranslationOptions() ()
>> #6  0x00000000004bddfc in Moses::Manager::Decode() ()
>> #7  0x0000000000433bd4 in Moses::TranslationTask::Run() ()
>> #8  0x0000000000496088 in Moses::ThreadPool::Execute() ()
>> #9  0x00000000007cbdba in thread_proxy ()
>> #10 0x00007fffc210c182 in start_thread (arg=0x7ffc23a5d700) at pthread_create.c:312
>> #11 0x00007fffc1e3947d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
>> This suggests the problem is somewhere in loading phrase translations from
>> the compact phrase table.
>> I'm not sure why the LEGACY functions are called, but I assume they are
>> legacy only in a "future" sense and are in fact still used by the phrase
>> dictionary implementations (?).
>> Best,
>> Ales
>> From: Ondrej Bojar
>> Sent: Wednesday, 13 April 2016 12:19
>> To: [email protected]
>> Cc: Roman Sudarikov; Ales Tamchyna
>> Subject: Random segfaults with alternative decoding paths
>>
>>
>> Hi,
>>
>> we're experiencing random segfaults when we use two phrase tables in
>> alternative decoding paths. The exact Moses commit we use is
>> 6a06e7776a58b09e4ed5b1cf11eb64fbdd6b02a2, from April 1.
>>
>> We have test runs on exactly the same 200 input sentences, with exactly the
>> same moses.ini, on the very same machine, where one run succeeds and the
>> other dies after 45 sentences.
>>
>>
>> Does anyone have an idea what we should be chasing?
>>
>> - it doesn't seem to be thread-related (the segfault occurs with -threads 1
>> as well as -threads 8)
>> - not related to n-best list construction (we first hit the problem during
>> MERT tuning, so we isolated it)
>> - not related to multiple LMs (we first had several LMs in the setup, but we
>> get the crash with just one as well)
>> - not related to -search: the bug is there with -search set to 0, 1 or 4
>> - seems related to the data or data size: when we trained the first ttable on
>> just a very small corpus, we did not get the segfault (yet)
>> - not related to translation option caching: the bug is there even with
>> -no-cache
>> - not related to the output-factors specification: whether left unspecified
>> or set to 0<CR>1, the bug is there
>>
>>
>> Here is the moses.ini:
>>
>> [input-factors]
>> 0
>>
>> [mapping]
>> 0 T 0
>> 1 T 1
>>
>> [distortion-limit]
>> 6
>>
>> [feature]
>> Distortion
>> KENLM lazyken=0 name=LM0 factor=0 path=lm.1.trie.lm order=4
>> PhraseDictionaryCompact name=TranslationModel0 num-features=4 path=phrase-table.0-0,1.1.1 input-factor=0 output-factor=0,1 table-limit=100
>> PhraseDictionaryCompact name=TranslationModel1 num-features=4 path=phrase-table.0-0,1.2.1 input-factor=0 output-factor=0,1 table-limit=100
>> PhrasePenalty
>> UnknownWordPenalty
>> WordPenalty
>>
>> [weight]
>> Distortion0= 0.3
>> LM0= 0.5
>> PhrasePenalty0= 0.2
>> TranslationModel0= 0.2 0.2 0.2 0.2
>> TranslationModel1= 0.2 0.2 0.2 0.2
>> UnknownWordPenalty0= 1
>> WordPenalty0= -1
>>
>>
>> The large setup that shows these crashes uses these big files:
>>
>> -rw-r--r-- 1 bojar ufal 584M Apr 13 09:19 lm.1.trie.lm
>> -rw-r--r-- 1 bojar ufal 1.1G Apr 13 09:24 phrase-table.0-0,1.1.1.minphr
>> -rw-r--r-- 1 bojar ufal 5.7M Apr 13 09:24 phrase-table.0-0,1.2.1.minphr
>>
>>
>> Thanks,
>>   Ondrej.
>>
>>
>>
>>
>>
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>>
>>
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.
>>
>>
>