Hi,

Now with a backtrace.  This is the third time it has failed with 16 cores on
the same phrase table.  All runs already used "-encoding None".

#0  0x0000000000421c5d in
Moses::Simple9::Encode<__gnu_cxx::__normal_iterator<unsigned int*,
std::vector<unsigned int, std::allocator<unsigned int> > >,
std::back_insert_iterator<std::vector<unsigned int,
std::allocator<unsigned int> > > > (it=..., end=..., outIt=...,
outIt@entry=...) at moses/TranslationModel/CompactPT/ListCoders.h:339
#1  0x00000000004222b4 in Moses::MonotonicVector<unsigned long, unsigned
int, 32ul, std::allocator>::push_back (this=this@entry=0xbebe3258,
i=3540308603) at moses/TranslationModel/CompactPT/MonotonicVector.h:109
#2  0x000000000042d344 in Moses::StringVector<unsigned char, unsigned
long, Moses::MmapAllocator>::push_back<std::string> (this=0xbebe3240,
s=...) at moses/TranslationModel/CompactPT/StringVector.h:386
#3  0x00000000004179a2 in FlushCompressedQueue (force=false,
this=0x7fffffffc550) at
moses/TranslationModel/CompactPT/PhraseTableCreator.cpp:986
#4  Moses::CompressionTask::operator() (this=0xbebe5378) at
moses/TranslationModel/CompactPT/PhraseTableCreator.cpp:1230
#5  0x00000000004678ea in thread_proxy ()
#6  0x0000003a03007851 in start_thread () from /lib64/libpthread.so.0
#7  0x0000003a024e890d in clone () from /lib64/libc.so.6


Looking at the code:


      double log2 = log(2);
      while(j < 9 && lastpos < 28 && (i+lastpos) < end) {
        if(lastpos >= parts[j])
          j++;

        buffer[lastpos] = *(i + lastpos);

        uint reqbit = ceil(log(buffer[lastpos]+1)/log2);
        assert(reqbit <= 28);

        // CRASH HERE
        uint bit = 28/floor(28/reqbit);
        if(lastbit < bit)
          lastbit = bit;

        if(parts[j] > 28/lastbit)
          break;
        else if(lastpos == parts[j]-1)
          lastyes = lastpos;

        lastpos++;
      }

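reqbit is 0, so 28/reqbit triggers an integer divide by zero.  Yes,
"Floating point exception" is a misnomer: SIGFPE covers both integer and
floating-point arithmetic errors, but floating-point exceptions are
normally masked (division by zero quietly yields infinity, invalid
operations yield quiet NaNs), so in practice it usually means an integer
divide by zero.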
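The distinction can be seen in a small C++ sketch, assuming a typical x86-64 Linux/glibc build with GCC or Clang, no -ffast-math, and no floating-point traps enabled (e.g. via feenableexcept). A floating-point division by zero raises no signal; it only sets the sticky FE_DIVBYZERO status flag and returns infinity. The function name FloatDivideByZeroSetsFlag is hypothetical.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Divides 28.0 by a zero that is only known at run time and reports whether
// the FE_DIVBYZERO status flag was raised. No signal is delivered because
// floating-point traps are masked by default; the result is +infinity.
// 'volatile' stops the compiler from folding the division at compile time.
bool FloatDivideByZeroSetsFlag(double* result) {
  std::feclearexcept(FE_ALL_EXCEPT);
  volatile double zero = 0.0;
  *result = 28.0 / zero;
  assert(std::isinf(*result));  // IEEE 754: positive / +0.0 is +infinity
  return std::fetestexcept(FE_DIVBYZERO) != 0;
}

// Integer division by zero is undefined behaviour in C++ and is deliberately
// not executed here: on x86 the divide instruction traps and Linux reports it
// as SIGFPE, i.e. the "Floating point exception" seen in this crash.
```

Calling FloatDivideByZeroSetsFlag returns true and leaves +inf in the output argument while the process keeps running; the same program computing 28u / 0u would be killed.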

What is the problematic line "uint bit = 28/floor(28/reqbit);" trying to
do?  Currently it:

1. Performs the integer division 28/reqbit, returning an integer.
2. Converts that integer to a double.
3. Calls floor, which does nothing to a value that is already integral.
4. Performs a floating-point division of 28.0 by the result.
5. Converts back to an integer, truncating.  If the floating-point
division is imprecise, you'll get something lower than 28/(28/reqbit).
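For comparison, here is a sketch of the same computation done purely in integers (the function names are hypothetical). For 1 <= reqbit <= 28 the expression rounds reqbit up to one of the nine Simple-9 slot widths (1, 2, 3, 4, 5, 7, 9, 14 or 28 bits), and for these small operands the floating-point version happens to give identical results. Neither version can accept reqbit == 0, which is exactly the crashing case, so both assert it away.

```cpp
#include <cassert>
#include <cmath>

// Integer-only equivalent of "uint bit = 28/floor(28/reqbit);".
// Rounds reqbit up to the slot width of a Simple-9 layout.
unsigned SlotWidthInteger(unsigned reqbit) {
  assert(reqbit >= 1 && reqbit <= 28);  // reqbit == 0 would divide by zero
  return 28 / (28 / reqbit);
}

// The original expression, kept for comparison (same precondition).
unsigned SlotWidthFloating(unsigned reqbit) {
  assert(reqbit >= 1 && reqbit <= 28);
  return static_cast<unsigned>(
      28 / std::floor(static_cast<double>(28 / reqbit)));
}
```

For example, SlotWidthInteger(6) is 28 / 4 = 7 and SlotWidthInteger(15) is 28 / 1 = 28.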

Moreover, it looks like floating-point arithmetic is being used to
compute an integer log2:

uint reqbit = ceil(log(buffer[lastpos]+1)/log2);

How about GCC's builtin, which typically compiles to one or two
instructions (if GCC is the compiler)?  Note that its result is undefined
for x == 0.

int __builtin_clz (unsigned int x)
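A minimal sketch of ceil(log2(x + 1)) without floating point, assuming GCC or Clang (both provide __builtin_clz); RequiredBits is a hypothetical name. Because __builtin_clz(0) is undefined, zero is special-cased and returns 0 bits, the same value the current log2 code produces. With C++20, std::bit_width from <bit> computes the same value portably.

```cpp
#include <cassert>
#include <climits>

// Number of bits needed to store x, i.e. ceil(log2(x + 1)).
// __builtin_clz counts leading zero bits and is undefined for 0,
// so 0 is handled explicitly (it needs 0 bits).
unsigned RequiredBits(unsigned x) {
  if (x == 0) return 0;
  const unsigned width = static_cast<unsigned>(sizeof(unsigned) * CHAR_BIT);
  return width - static_cast<unsigned>(__builtin_clz(x));
}
```

This removes the floating-point rounding question, but on its own it does not fix the crash: a zero still yields reqbit == 0.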

But in any case, buffer[lastpos] == 0, so the integer log2 code above is
correctly returning 0 == log2(0 + 1).

Tracing back a bit more, the function is attempting to encode a vector
containing the following integers: 0 118 128 72 63 71 64 114 41 74 46
375 374 425 112 502 496 485 474 493 106 110 104 110 115 296 287 105 113
0 0.  It's barfing on the 0th entry in that vector, which is a zero.

Simple-9 implementations sometimes don't expect 0s, since the scheme is
often used to compress deltas (d-gaps) in posting lists etc., which are
always positive.  Is the bug that 0s are being passed, or that the
encoding scheme isn't handling this case?
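As a packing format, Simple-9 can store a zero: it needs no bits and fits in a slot of any width. The sketch below is a hypothetical, simplified greedy layout selector (names Simple9Layout, kSimple9Layouts, BitsNeeded and ChooseLayout are invented for illustration; it is not the parts/lastbit loop from ListCoders.h). It compares each value's bit length against the slot width instead of dividing by the bit length, so zeros need no special case and the vector above is accepted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One Simple-9 layout: 'count' values of 'width' bits in 28 data bits.
struct Simple9Layout {
  unsigned count;
  unsigned width;
};

// Ordered from densest (most values per word) to sparsest.
const Simple9Layout kSimple9Layouts[9] = {
    {28, 1}, {14, 2}, {9, 3}, {7, 4}, {5, 5}, {4, 7}, {3, 9}, {2, 14}, {1, 28}};

// Bits needed to store x; 0 needs no bits, so it fits in any slot.
unsigned BitsNeeded(unsigned x) {
  unsigned bits = 0;
  while (x != 0) {
    ++bits;
    x >>= 1;
  }
  return bits;
}

// Returns the index of the densest layout whose slots can all be filled by
// the values starting at v[begin], or 9 if v[begin] needs more than 28 bits.
std::size_t ChooseLayout(const std::vector<unsigned>& v, std::size_t begin) {
  assert(begin < v.size());
  const std::size_t remaining = v.size() - begin;
  for (std::size_t s = 0; s < 9; ++s) {
    const Simple9Layout& layout = kSimple9Layouts[s];
    if (layout.count > remaining) continue;  // cannot fill every slot
    bool fits = true;
    for (std::size_t k = 0; k < layout.count; ++k) {
      if (BitsNeeded(v[begin + k]) > layout.width) {
        fits = false;
        break;
      }
    }
    if (fits) return s;
  }
  return 9;
}
```

On the vector from the crash, the first word would use the 3 x 9-bit layout (0, 118 and 128 all fit in 9 bits), and the trailing 0 0 would use the 2 x 14-bit layout.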

Kenneth


On 01/13/2015 02:25 AM, Marcin Junczys-Dowmunt wrote:
> Hi Kenneth.
> Recently I have been encountering an increased number of crashes, too.
> I guess there are some heisenbugs in the binarization that manifest,
> maybe due to a new Boost version or something. A workaround is usually
> to use fewer threads, only one or up to 4 (it's actually not much faster
> with 16 anyway). If it still crashes, try -encoding None. I am planning
> to write a new binarization tool from scratch; this one is giving me too
> much of a headache.
> 
> On 13.01.2015 at 04:20, Kenneth Heafield wrote:
>> Dear Moses/Marcin,
>>
>>      I'm getting a Floating point exception in processPhraseTableMin from
>> Moses d0807c.
>>
>> Arguments, minus the absolute paths, are:
>>
>> processPhraseTableMin -in phrase-table.gz -out phrase-table -nscores 4
>> -threads 16 -T /tmp -encoding None
>>
>> The phrase table is rather large and it runs for several hours before
>> crashing.  Log output is below.
>>
>> Used options:
>>          Text phrase table will be read from: phrase-table.gz
>>          Output phrase table will be written to: phrase-table.minphr
>>          Step size for source landmark phrases: 2^10=1024
>>          Source phrase fingerprint size: 16 bits / P(fp)=1.52588e-05
>>          Selected target phrase encoding: Huffman
>>          Number of score components in phrase table: 4
>>          Single Huffman code set for score components: no
>>          Using score quantization: no
>>          Explicitly included alignment information: yes
>>          Running with 16 threads
>>
>> Pass 1/2: Creating source phrase index + Encoding target phrases
>> ..................................................[5000000]
>> ..................................................[10000000]
>> ..................................................[15000000]
>> ..................................................[20000000]
>> ..................................................[25000000]
>> ..................................................[30000000]
>> ..................................................[35000000]
>> ..................................................[40000000]
>> ..................................................[45000000]
>> ..................................................[50000000]
>> ..................................................[55000000]
>> ..................................................[60000000]
>> ..................................................[65000000]
>> ..................................................[70000000]
>> ..................................................[75000000]
>> ..................................................[80000000]
>> ..................................................[85000000]
>> ..................................................[90000000]
>> ..................................................[95000000]
>> ..................................................[100000000]
>> ..................................................[105000000]
>> ..................................................[110000000]
>> ..................................................[115000000]
>> ..................................................[120000000]
>> ..................................................[125000000]
>> ..................................................[130000000]
>> ..................................................[135000000]
>> ..................................................[140000000]
>> ..................................................[145000000]
>> ..................................................[150000000]
>> ..................................................[155000000]
>> ..................................................[160000000]
>> ..................................................[165000000]
>> ..................................................[170000000]
>> ..................................................[175000000]
>> ..................................................[180000000]
>> ..............................................
>>
>> Intermezzo: Calculating Huffman code sets
>>          Creating Huffman codes for 624564 target phrase symbols
>>          Creating Huffman codes for 551381 scores
>>          Creating Huffman codes for 15296482 scores
>>          Creating Huffman codes for 582875 scores
>>          Creating Huffman codes for 15806633 scores
>>          Creating Huffman codes for 50 alignment points
>>
>> Pass 2/2: Compressing target phrases
>> ..................................................[5000000]
>> ..................................................[10000000]
>>
>> Kenneth
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support