I'm not aware of any changes... I'll try to create a minimal example (ttables + moses.ini) so that you can reproduce it on your machine (hopefully it's reproducible :)).
Best,
Ales

On Thu, Apr 14, 2016 at 4:10 PM, Marcin Junczys-Dowmunt <[email protected]> wrote:

> Hm, this is not something I can fix out of my head. It's actually weird
> this is happening at all, is there anything unusual with those machines?
> New compiler?
>
> On 14.04.2016 at 15:08, Aleš Tamchyna wrote:
>
> Hi Marcin,
>
> I ran it several times and the backtrace looked identical in all cases:
>
> #0 0x000000000081fd18 in Moses::BitWrapper<std::string>::Seek
> (this=0x7ffc23a5c7f0,
> bitPos=18446744073708512633) at
> moses/TranslationModel/CompactPT/CanonicalHuffman.h:303
> #1 0x000000000081dd78 in Moses::BitWrapper<std::string>::SeekFromEnd
> (this=0x7ffc23a5c7f0,
> bitPosFromEnd=1039935) at
> moses/TranslationModel/CompactPT/CanonicalHuffman.h:309
> #2 0x000000000081a4ca in
> Moses::PhraseDecoder::CreateTargetPhraseCollection (
> this=0xbac1650, sourcePhrase=..., topLevel=true, eval=true)
> at moses/TranslationModel/CompactPT/PhraseDecoder.cpp:233
>
> But the crash itself happens completely randomly...
>
> I don't think you can access our machines anymore, sorry :( The account
> was already deleted.
>
> Best,
> Ales
>
> On Wed, Apr 13, 2016 at 10:10 PM, Marcin Junczys-Dowmunt <[email protected]> wrote:
>
>> Urghs, not good. Can I somehow get access to that machine? Is it
>> deterministic?
>>
>> On 13.04.2016 at 21:06, Barry Haddow wrote:
>>
>> Hi Ales
>>
>> Well, bitPos=18446744073708512633 looks bogus. Marcin?
>>
>> cheers - Barry
>>
>> On 13/04/16 17:23, Aleš Tamchyna wrote:
>>
>> Hi all,
>>
>> sorry for the delay. I'm attaching the debug backtrace.
>>
>> Best,
>> Ales
>>
>> On Wed, Apr 13, 2016 at 1:49 PM, Barry Haddow <[email protected]> wrote:
>>
>>> Hi
>>>
>>> The backtrace would be more informative if you run with a debug build
>>> (add variant=debug to bjam). Sometimes this makes bugs go away, or new bugs
>>> appear, but if not then it will give more information.
>>> You can run with core files enabled (ulimit -c unlimited) to save having
>>> to run Moses inside gdb.
>>>
>>> If the bug is random, but not thread related, then it could well be
>>> memory corruption. Running Moses in valgrind can help track this down
>>> (again, using a debug build is better). Note that the suffix arrays crash
>>> valgrind (last time I checked) so don't build them in.
>>>
>>> cheers - Barry
>>>
>>>
>>> On 13/04/16 11:25, Ales Tamchyna wrote:
>>>
>>> Hi,
>>> Let me add some more information to this: when running Moses in gdb, I get
>>> the following backtrace:
>>> #0 0x00000000006e3ba4 in
>>> Moses::PhraseDecoder::CreateTargetPhraseCollection(Moses::Phrase const&,
>>> bool, bool) ()
>>> #1 0x00000000005cd2a7 in
>>> Moses::PhraseDictionaryCompact::GetTargetPhraseCollectionNonCacheLEGACY(Moses::Phrase
>>> const&) const ()
>>> #2 0x000000000048efe4 in
>>> Moses::PhraseDictionary::GetTargetPhraseCollectionLEGACY(Moses::Phrase
>>> const&) const ()
>>> #3 0x000000000048e6a0 in
>>> Moses::PhraseDictionary::GetTargetPhraseCollectionBatch(std::vector<Moses::InputPath*,
>>> std::allocator<Moses::InputPath*> > const&) const ()
>>> #4 0x0000000000560948 in
>>> Moses::TranslationOptionCollection::GetTargetPhraseCollectionBatch() ()
>>> #5 0x0000000000551a39 in
>>> Moses::TranslationOptionCollectionText::CreateTranslationOptions() ()
>>> #6 0x00000000004bddfc in Moses::Manager::Decode() ()
>>> #7 0x0000000000433bd4 in Moses::TranslationTask::Run() ()
>>> #8 0x0000000000496088 in Moses::ThreadPool::Execute() ()
>>> #9 0x00000000007cbdba in thread_proxy ()
>>> #10 0x00007fffc210c182 in start_thread (arg=0x7ffc23a5d700) at
>>> pthread_create.c:312
>>> #11 0x00007fffc1e3947d in clone () at
>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
>>> This suggests the problem is somewhere in loading phrase translations from
>>> the compact phrase table.
>>> I'm not sure why the LEGACY functions are called but I'm assuming that
>>> these are "future" legacy methods and that they are in fact still used by
>>> phrase dictionary implementations (?).
>>> Best,
>>> Ales
>>> From: Ondrej Bojar
>>> Sent: Wednesday, 13 April 2016 12:19
>>> To: [email protected]
>>> Cc: Roman Sudarikov; Ales Tamchyna
>>> Subject: Random segfaults with alternative decoding paths
>>>
>>>
>>> Hi,
>>>
>>> we're experiencing random segfaults when we use two phrase tables in
>>> alternative decoding paths. The exact commit of moses we use is
>>> 6a06e7776a58b09e4ed5b1cf11eb64fbdd6b02a2, from April 1.
>>>
>>> We do have test runs on the exact same 200 input sentences, exact same
>>> moses.ini, on the very same machine, where one of the runs succeeds and the
>>> other dies after 45 sentences.
>>>
>>>
>>> Would anyone have any idea what we should be chasing?
>>>
>>> - it doesn't seem to be thread-related (segfault experienced with -threads
>>> 1 as well as -threads 8)
>>> - not related to nbest-list construction (we first had this problem in mert
>>> tuning so we isolated this)
>>> - not related to more LMs (we first had several LMs in the setup, we get
>>> the crash with just one as well)
>>> - not related to -search, the bug is there with -search set to 0, 1 or 4
>>> - seems related to data or data size: when we trained the first ttable on
>>> just a very small corpus, we did not get the segfault (yet)
>>> - not related to translation options caching, the bug is there even with
>>> -no-cache
>>> - not related to the specification of output-factors; left unspecified or
>>> set to 0<CR>1, the bug is there
>>>
>>>
>>> Here is the moses.ini:
>>>
>>> [input-factors]
>>> 0
>>>
>>> [mapping]
>>> 0 T 0
>>> 1 T 1
>>>
>>> [distortion-limit]
>>> 6
>>>
>>> [feature]
>>> Distortion
>>> KENLM lazyken=0 name=LM0 factor=0 path=lm.1.trie.lm order=4
>>> PhraseDictionaryCompact name=TranslationModel0 num-features=4
>>> path=phrase-table.0-0,1.1.1 input-factor=0
>>> output-factor=0,1 table-limit=100
>>> PhraseDictionaryCompact name=TranslationModel1 num-features=4
>>> path=phrase-table.0-0,1.2.1 input-factor=0 output-factor=0,1 table-limit=100
>>> PhrasePenalty
>>> UnknownWordPenalty
>>> WordPenalty
>>>
>>> [weight]
>>> Distortion0= 0.3
>>> LM0= 0.5
>>> PhrasePenalty0= 0.2
>>> TranslationModel0= 0.2 0.2 0.2 0.2
>>> TranslationModel1= 0.2 0.2 0.2 0.2
>>> UnknownWordPenalty0= 1
>>> WordPenalty0= -1
>>>
>>>
>>> The large setup that shows these crashes uses these big files:
>>>
>>> -rw-r--r-- 1 bojar ufal 584M Apr 13 09:19 lm.1.trie.lm
>>> -rw-r--r-- 1 bojar ufal 1.1G Apr 13 09:24 phrase-table.0-0,1.1.1.minphr
>>> -rw-r--r-- 1 bojar ufal 5.7M Apr 13 09:24 phrase-table.0-0,1.2.1.minphr
>>>
>>>
>>> Thanks,
>>> Ondrej.
>>>
>>> The University of Edinburgh is a charitable body, registered in
>>> Scotland, with registration number SC005336.
_______________________________________________ Moses-support mailing list [email protected] http://mailman.mit.edu/mailman/listinfo/moses-support
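
A note on the "bogus" bitPos in the debug backtrace: the value is consistent with an unsigned wrap-around. The sketch below assumes (this is not confirmed against the Moses source) that SeekFromEnd computes the absolute position as size-in-bits minus bitPosFromEnd using an unsigned 64-bit type. Under that assumption, the two numbers in frames #0 and #1 would imply a buffer of only 952 bits (119 bytes) being asked to seek 1039935 bits back from its end. The function names WrappedBitPos and FitsInBuffer are hypothetical and exist only for this illustration.

```cpp
#include <cstdint>

// Hypothetical stand-in for the seek-from-end arithmetic (NOT the Moses code).
// Unsigned subtraction wraps modulo 2^64 when bitPosFromEnd exceeds the size.
constexpr std::uint64_t WrappedBitPos(std::uint64_t sizeInBytes,
                                      std::uint64_t bitPosFromEnd) {
  return sizeInBytes * 8 - bitPosFromEnd;
}

// Hypothetical guard: true only if the requested offset lies inside the buffer.
constexpr bool FitsInBuffer(std::uint64_t sizeInBytes,
                            std::uint64_t bitPosFromEnd) {
  return bitPosFromEnd <= sizeInBytes * 8;
}

// Values taken from the backtrace: bitPosFromEnd=1039935 (frame #1) and
// bitPos=18446744073708512633 (frame #0). A 119-byte buffer reproduces them.
static_assert(WrappedBitPos(119, 1039935) == 18446744073708512633ULL,
              "a 119-byte buffer reproduces the bitPos seen in frame #0");
static_assert(!FitsInBuffer(119, 1039935),
              "the requested offset is larger than the assumed buffer");
```

If the assumption holds, the offset and the buffer it is applied to do not belong together. That would point at corrupted or mismatched data reaching the decoder, not at the seek itself. A check like FitsInBuffer before seeking would turn the segfault into a detectable error, but it would not explain where the inconsistent values come from.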
