I'm not aware of any changes... I'll try to create a minimal example
(ttables + moses.ini) so that you can reproduce it on your machine
(hopefully it's reproducible :)).

Best,
Ales

On Thu, Apr 14, 2016 at 4:10 PM, Marcin Junczys-Dowmunt <[email protected]>
wrote:

> Hm, this is not something I can fix off the top of my head. It's actually
> weird that this is happening at all. Is there anything unusual about those
> machines? New compiler?
>
> On 14.04.2016 at 15:08, Aleš Tamchyna wrote:
>
> Hi Marcin,
>
> I ran it several times and the backtrace looked identical in all cases:
>
> #0  0x000000000081fd18 in Moses::BitWrapper<std::string>::Seek (this=0x7ffc23a5c7f0, bitPos=18446744073708512633)
>     at moses/TranslationModel/CompactPT/CanonicalHuffman.h:303
> #1  0x000000000081dd78 in Moses::BitWrapper<std::string>::SeekFromEnd (this=0x7ffc23a5c7f0, bitPosFromEnd=1039935)
>     at moses/TranslationModel/CompactPT/CanonicalHuffman.h:309
> #2  0x000000000081a4ca in Moses::PhraseDecoder::CreateTargetPhraseCollection (this=0xbac1650, sourcePhrase=..., topLevel=true, eval=true)
>     at moses/TranslationModel/CompactPT/PhraseDecoder.cpp:233
>
> But the crash itself happens completely randomly...
>
> I don't think you can access our machines anymore, sorry :( The account
> was already deleted.
>
> Best,
> Ales
>
> On Wed, Apr 13, 2016 at 10:10 PM, Marcin Junczys-Dowmunt
> <[email protected]> wrote:
>
>> Urghs, not good. Can I somehow get access to that machine? Is it
>> deterministic?
>>
>> On 13.04.2016 at 21:06, Barry Haddow wrote:
>>
>> Hi Ales
>>
>> Well, bitPos=18446744073708512633  looks bogus. Marcin?
>>
>> cheers - Barry
>>
>> On 13/04/16 17:23, Aleš Tamchyna wrote:
>>
>> Hi all,
>>
>> sorry for the delay. I'm attaching the debug backtrace.
>>
>> Best,
>> Ales
>>
>> On Wed, Apr 13, 2016 at 1:49 PM, Barry Haddow
>> <[email protected]> wrote:
>>
>>> Hi
>>>
>>> The backtrace would be more informative if you run with a debug build
>>> (add variant=debug to bjam). Sometimes this makes bugs go away, or new bugs
>>> appear, but if not then it will give more information. You can run with
>>> core files enabled (ulimit -c unlimited) to save having to run Moses inside
>>> gdb.
>>>
>>> If the bug is random, but not thread related, then it could well be
>>> memory corruption. Running Moses in valgrind can help track this down
>>> (again, using a debug build is better). Note that the suffix arrays crash
>>> valgrind (last time I checked), so don't build them in.
>>>
>>> cheers - Barry
>>>
>>>
>>> On 13/04/16 11:25, Ales Tamchyna wrote:
>>>
>>> Hi,
>>>
>>> Let me add some more information to this: when running Moses in gdb, I get
>>> the following backtrace:
>>>
>>> #0  0x00000000006e3ba4 in Moses::PhraseDecoder::CreateTargetPhraseCollection(Moses::Phrase const&, bool, bool) ()
>>> #1  0x00000000005cd2a7 in Moses::PhraseDictionaryCompact::GetTargetPhraseCollectionNonCacheLEGACY(Moses::Phrase const&) const ()
>>> #2  0x000000000048efe4 in Moses::PhraseDictionary::GetTargetPhraseCollectionLEGACY(Moses::Phrase const&) const ()
>>> #3  0x000000000048e6a0 in Moses::PhraseDictionary::GetTargetPhraseCollectionBatch(std::vector<Moses::InputPath*, std::allocator<Moses::InputPath*> > const&) const ()
>>> #4  0x0000000000560948 in Moses::TranslationOptionCollection::GetTargetPhraseCollectionBatch() ()
>>> #5  0x0000000000551a39 in Moses::TranslationOptionCollectionText::CreateTranslationOptions() ()
>>> #6  0x00000000004bddfc in Moses::Manager::Decode() ()
>>> #7  0x0000000000433bd4 in Moses::TranslationTask::Run() ()
>>> #8  0x0000000000496088 in Moses::ThreadPool::Execute() ()
>>> #9  0x00000000007cbdba in thread_proxy ()
>>> #10 0x00007fffc210c182 in start_thread (arg=0x7ffc23a5d700) at pthread_create.c:312
>>> #11 0x00007fffc1e3947d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
>>>
>>> This suggests the problem is somewhere in loading phrase translations from
>>> the compact phrase table.
>>>
>>> I’m not sure why the LEGACY functions are called but I’m assuming that
>>> these are “future” legacy methods and that they are in fact still used by
>>> phrase dictionary implementations (?).
>>>
>>> Best,
>>> Ales
>>>
>>> From: Ondrej Bojar
>>> Sent: Wednesday, April 13, 2016 12:19
>>> To: [email protected]
>>> Cc: Roman Sudarikov; Ales Tamchyna
>>> Subject: Random segfaults with alternative decoding paths
>>>
>>>
>>> Hi,
>>>
>>> we're experiencing random segfaults when we use two phrase tables in 
>>> alternative decoding paths. The exact commit of moses we use is 
>>> 6a06e7776a58b09e4ed5b1cf11eb64fbdd6b02a2, from April 1.
>>>
>>> We do have test runs on the exact same 200 input sentences, exact same 
>>> moses.ini, on the very same machine, where one of the runs succeeds and the 
>>> other dies after 45 sentences.
>>>
>>>
>>> Would anyone have any idea what we should be chasing?
>>>
>>> - it doesn't seem to be thread-related (segfault experienced with -threads 
>>> 1 as well as -threads 8)
>>> - not related to nbest-list construction (we first had this problem in mert 
>>> tuning so we isolated this)
>>> - not related to more LMs (we first had several LMs in the setup, we get 
>>> the crash with just one as well)
>>> - not related to -search, the bug is there with -search set to 0, 1 or 4
>>> - seems related to data or data size: when we trained the first ttable on 
>>> just a very small corpus, we did not get the segfault (yet)
>>> - not related to translation options caching, the bug is there even with 
>>> -no-cache
>>> - not related to the specification of output-factors; left unspecified or 
>>> set to 0<CR>1, the bug is there
>>>
>>>
>>> Here is the moses.ini:
>>>
>>> [input-factors]
>>> 0
>>>
>>> [mapping]
>>> 0 T 0
>>> 1 T 1
>>>
>>> [distortion-limit]
>>> 6
>>>
>>> [feature]
>>> Distortion
>>> KENLM lazyken=0 name=LM0 factor=0 path=lm.1.trie.lm order=4
>>> PhraseDictionaryCompact name=TranslationModel0 num-features=4 path=phrase-table.0-0,1.1.1 input-factor=0 output-factor=0,1 table-limit=100
>>> PhraseDictionaryCompact name=TranslationModel1 num-features=4 path=phrase-table.0-0,1.2.1 input-factor=0 output-factor=0,1 table-limit=100
>>> PhrasePenalty
>>> UnknownWordPenalty
>>> WordPenalty
>>>
>>> [weight]
>>> Distortion0= 0.3
>>> LM0= 0.5
>>> PhrasePenalty0= 0.2
>>> TranslationModel0= 0.2 0.2 0.2 0.2
>>> TranslationModel1= 0.2 0.2 0.2 0.2
>>> UnknownWordPenalty0= 1
>>> WordPenalty0= -1
>>>
>>>
>>> The large setup that shows these crashes uses these big files:
>>>
>>> -rw-r--r-- 1 bojar ufal 584M Apr 13 09:19 lm.1.trie.lm
>>> -rw-r--r-- 1 bojar ufal 1.1G Apr 13 09:24 phrase-table.0-0,1.1.1.minphr
>>> -rw-r--r-- 1 bojar ufal 5.7M Apr 13 09:24 phrase-table.0-0,1.2.1.minphr
>>>
>>>
>>> Thanks,
>>>   Ondrej.
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Moses-support mailing list
>>> [email protected]
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>
>>>
>>>
>>> The University of Edinburgh is a charitable body, registered in
>>> Scotland, with registration number SC005336.
>
>
