Hm, this is not something I can fix off the top of my head. It's actually weird that this is happening at all. Is there anything unusual about those machines? A new compiler?

On 14.04.2016 at 15:08, Aleš Tamchyna wrote:
Hi Marcin,

I ran it several times and the backtrace looked identical in all cases:

#0 0x000000000081fd18 in Moses::BitWrapper<std::string>::Seek (this=0x7ffc23a5c7f0, bitPos=18446744073708512633)
    at moses/TranslationModel/CompactPT/CanonicalHuffman.h:303
#1 0x000000000081dd78 in Moses::BitWrapper<std::string>::SeekFromEnd (this=0x7ffc23a5c7f0, bitPosFromEnd=1039935)
    at moses/TranslationModel/CompactPT/CanonicalHuffman.h:309
#2 0x000000000081a4ca in Moses::PhraseDecoder::CreateTargetPhraseCollection (this=0xbac1650, sourcePhrase=..., topLevel=true, eval=true)
    at moses/TranslationModel/CompactPT/PhraseDecoder.cpp:233

But the crash itself happens completely randomly...
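
A note on the two numbers in frames #0 and #1: they look consistent with an unsigned wraparound, which the sketch below illustrates. It is not the Moses source. The function names are hypothetical, and it assumes that the seek-from-end step computes "size in bits minus offset from end" with unsigned 64-bit arithmetic. Under that assumption, a buffer of 119 bytes (952 bits) reproduces exactly the reported bitPos; the 119-byte figure is inferred from the two values in the backtrace, not measured.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the seek-from-end arithmetic (an assumption,
// not copied from CanonicalHuffman.h). Returns the absolute bit position
// for an offset counted back from the end of the buffer.
std::uint64_t uncheckedBitPosFromEnd(std::uint64_t sizeInBytes,
                                     std::uint64_t bitPosFromEnd) {
    // Unsigned subtraction wraps modulo 2^64 when the offset is larger
    // than the buffer, producing a huge "bogus" position.
    return sizeInBytes * 8 - bitPosFromEnd;
}

// Guarded variant: refuses offsets that point before the start of the
// buffer. Returns false and leaves `out` untouched in that case.
bool checkedBitPosFromEnd(std::uint64_t sizeInBytes,
                          std::uint64_t bitPosFromEnd,
                          std::uint64_t& out) {
    const std::uint64_t sizeInBits = sizeInBytes * 8;
    if (bitPosFromEnd > sizeInBits) {
        return false;  // offset exceeds the data: likely a corrupt length
    }
    out = sizeInBits - bitPosFromEnd;
    return true;
}
```

If this reading is right, the seek itself is only where the crash surfaces; the real question is why bitPosFromEnd (1039935) is so much larger than the string being decoded. That would fit a corrupted or misread length rather than a bug in the seek.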

I don't think you can access our machines anymore, sorry :( The account was already deleted.

Best,
Ales

On Wed, Apr 13, 2016 at 10:10 PM, Marcin Junczys-Dowmunt wrote:

    Urghs, not good. Can I somehow get access to that machine? Is it
    deterministic?

    On 13.04.2016 at 21:06, Barry Haddow wrote:
    Hi Ales

    Well, bitPos=18446744073708512633  looks bogus. Marcin?

    cheers - Barry

    On 13/04/16 17:23, Aleš Tamchyna wrote:
    Hi all,

    sorry for the delay. I'm attaching the debug backtrace.

    Best,
    Ales

    On Wed, Apr 13, 2016 at 1:49 PM, Barry Haddow wrote:

        Hi

        The backtrace would be more informative if you run with a
        debug build (add variant=debug to bjam). Sometimes this
        makes bugs go away, or new bugs appear, but if not then it
        will give more information. You can run with core files
        enabled (ulimit -c unlimited) to save having to run Moses
        inside gdb.

        If the bug is random, but not thread related, then it could
        well be memory corruption. Running Moses in valgrind can
        help track this down (again, using a debug build is better).
        Note that the suffix arrays crash valgrind (last time I
        checked), so don't build them in.

        cheers - Barry


        On 13/04/16 11:25, Ales Tamchyna wrote:
        Hi,
        Let me add some more information to this: when running Moses in gdb, I get the following backtrace:
        #0  0x00000000006e3ba4 in Moses::PhraseDecoder::CreateTargetPhraseCollection(Moses::Phrase const&, bool, bool) ()
        #1  0x00000000005cd2a7 in Moses::PhraseDictionaryCompact::GetTargetPhraseCollectionNonCacheLEGACY(Moses::Phrase const&) const ()
        #2  0x000000000048efe4 in Moses::PhraseDictionary::GetTargetPhraseCollectionLEGACY(Moses::Phrase const&) const ()
        #3  0x000000000048e6a0 in Moses::PhraseDictionary::GetTargetPhraseCollectionBatch(std::vector<Moses::InputPath*, std::allocator<Moses::InputPath*> > const&) const ()
        #4  0x0000000000560948 in Moses::TranslationOptionCollection::GetTargetPhraseCollectionBatch() ()
        #5  0x0000000000551a39 in Moses::TranslationOptionCollectionText::CreateTranslationOptions() ()
        #6  0x00000000004bddfc in Moses::Manager::Decode() ()
        #7  0x0000000000433bd4 in Moses::TranslationTask::Run() ()
        #8  0x0000000000496088 in Moses::ThreadPool::Execute() ()
        #9  0x00000000007cbdba in thread_proxy ()
        #10 0x00007fffc210c182 in start_thread (arg=0x7ffc23a5d700) at pthread_create.c:312
        #11 0x00007fffc1e3947d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
        This suggests the problem is somewhere in loading phrase translations from the compact phrase table.
        I’m not sure why the LEGACY functions are called, but I’m assuming that these are “future” legacy methods and that they are in fact still used by phrase dictionary implementations (?).
        Best,
        Ales
        From: Ondrej Bojar
        Sent: Wednesday, 13 April 2016 12:19
        To:[email protected] <mailto:[email protected]>
        Cc: Roman Sudarikov; Ales Tamchyna
        Subject: Random segfaults with alternative decoding paths


        Hi,

        we're experiencing random segfaults when we use two phrase tables in alternative decoding paths. The exact commit of moses we use is 6a06e7776a58b09e4ed5b1cf11eb64fbdd6b02a2, from April 1.

        We do have test runs on the exact same 200 input sentences, exact same moses.ini, on the very same machine, where one of the runs succeeds and the other dies after 45 sentences.


        Would anyone have any idea what we should be chasing?

        - it doesn't seem to be thread-related (segfault experienced with -threads 1 as well as -threads 8)
        - not related to nbest-list construction (we first had this problem in mert tuning, so we isolated this)
        - not related to multiple LMs (we first had several LMs in the setup; we get the crash with just one as well)
        - not related to -search; the bug is there with -search set to 0, 1 or 4
        - seems related to data or data size: when we trained the first ttable on just a very small corpus, we did not get the segfault (yet)
        - not related to translation options caching; the bug is there even with -no-cache
        - not related to the specification of output-factors; left unspecified or set to 0<CR>1, the bug is there


        Here is the moses.ini:

        [input-factors]
        0

        [mapping]
        0 T 0
        1 T 1

        [distortion-limit]
        6

        [feature]
        Distortion
        KENLM lazyken=0 name=LM0 factor=0 path=lm.1.trie.lm order=4
        PhraseDictionaryCompact name=TranslationModel0 num-features=4 path=phrase-table.0-0,1.1.1 input-factor=0 output-factor=0,1 table-limit=100
        PhraseDictionaryCompact name=TranslationModel1 num-features=4 path=phrase-table.0-0,1.2.1 input-factor=0 output-factor=0,1 table-limit=100
        PhrasePenalty
        UnknownWordPenalty
        WordPenalty

        [weight]
        Distortion0= 0.3
        LM0= 0.5
        PhrasePenalty0= 0.2
        TranslationModel0= 0.2 0.2 0.2 0.2
        TranslationModel1= 0.2 0.2 0.2 0.2
        UnknownWordPenalty0= 1
        WordPenalty0= -1


        The large setup that shows these crashes uses these big files:

        -rw-r--r-- 1 bojar ufal 584M Apr 13 09:19 lm.1.trie.lm
        -rw-r--r-- 1 bojar ufal 1.1G Apr 13 09:24 phrase-table.0-0,1.1.1.minphr
        -rw-r--r-- 1 bojar ufal 5.7M Apr 13 09:24 phrase-table.0-0,1.2.1.minphr


        Thanks,
           Ondrej.




        _______________________________________________
        Moses-support mailing list
        [email protected] <mailto:[email protected]>
        http://mailman.mit.edu/mailman/listinfo/moses-support


        The University of Edinburgh is a charitable body, registered in
        Scotland, with registration number SC005336.

