Urgh. That's new. I will sit down this evening and see if I understand
my own code. Currently you seem to have a better idea what is going on
there than me :) I will answer later today. 

Marcin 

On 2015-01-14 04:37, Kenneth Heafield wrote:

> Hi,
> 
> Now with a backtrace. Third time it's failed with 16 cores on the same
> phrase table. All runs already had "-encoding None."
> 
> #0 0x0000000000421c5d in
> Moses::Simple9::Encode<__gnu_cxx::__normal_iterator<unsigned int*,
> std::vector<unsigned int, std::allocator<unsigned int> > >,
> std::back_insert_iterator<std::vector<unsigned int,
> std::allocator<unsigned int> > > > (it=..., end=..., outIt=...,
> outIt@entry=...) at moses/TranslationModel/CompactPT/ListCoders.h:339
> #1 0x00000000004222b4 in Moses::MonotonicVector<unsigned long, unsigned
> int, 32ul, std::allocator>::push_back (this=this@entry=0xbebe3258,
> i=3540308603) at moses/TranslationModel/CompactPT/MonotonicVector.h:109
> #2 0x000000000042d344 in Moses::StringVector<unsigned char, unsigned
> long, Moses::MmapAllocator>::push_back<std::string> (this=0xbebe3240,
> s=...) at moses/TranslationModel/CompactPT/StringVector.h:386
> #3 0x00000000004179a2 in FlushCompressedQueue (force=false,
> this=0x7fffffffc550) at
> moses/TranslationModel/CompactPT/PhraseTableCreator.cpp:986
> #4 Moses::CompressionTask::operator() (this=0xbebe5378) at
> moses/TranslationModel/CompactPT/PhraseTableCreator.cpp:1230
> #5 0x00000000004678ea in thread_proxy ()
> #6 0x0000003a03007851 in start_thread () from /lib64/libpthread.so.0
> #7 0x0000003a024e890d in clone () from /lib64/libc.so.6
> 
> Looking at the code:
> 
> double log2 = log(2);
> while(j < 9 && lastpos < 28 && (i+lastpos) < end) {
>   if(lastpos >= parts[j])
>     j++;
> 
>   buffer[lastpos] = *(i + lastpos);
> 
>   uint reqbit = ceil(log(buffer[lastpos]+1)/log2);
>   assert(reqbit <= 28);
> 
>   // CRASH HERE
>   uint bit = 28/floor(28/reqbit);
>   if(lastbit < bit)
>     lastbit = bit;
> 
>   if(parts[j] > 28/lastbit)
>     break;
>   else if(lastpos == parts[j]-1)
>     lastyes = lastpos;
> 
>   lastpos++;
> }
> 
> reqbit is 0 and 28/reqbit is triggering an integer divide by zero. Yes,
> floating point exception is a misnomer and usually means integer divide
> by zero, since it covers both types but NaNs are usually set to
> non-signaling.
> 
> What is the problematic line "uint bit = 28/floor(28/reqbit);" trying to
> do? Currently:
> 
> 1. Integer division 28/reqbit, returning an integer.
> 2. Cast that integer to a float.
> 3. Call floor which should do nothing at this small scale.
> 4. Floating point divide 28.0 by the result.
> 5. Convert to integer, rounding down. If the floating-point operation
> is imprecise, you'll get something lower than 28/(28/reqbit).
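
For illustration, the five steps above collapse to plain integer arithmetic. This is only a minimal sketch: `slot_width` is a hypothetical helper, not a function in ListCoders.h, and treating a value that needs 0 bits as occupying a 1-bit slot is an assumption, not a confirmed fix.

```cpp
#include <cassert>

// Hypothetical helper: width of each slot when a 28-bit payload is split
// into as many equal slots as possible that can each hold `reqbit` bits.
// Assumption: a value needing 0 bits is stored in a 1-bit slot.
inline unsigned slot_width(unsigned reqbit) {
    if (reqbit == 0) reqbit = 1;   // avoids the integer divide by zero
    assert(reqbit <= 28);
    unsigned slots = 28 / reqbit;  // integer division already rounds down
    return 28 / slots;             // no round trip through floating point
}
```

With this version `slot_width(8)` gives 9 (three 9-bit slots), and a zero input no longer traps.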
> 
> Moreover, it looks like there's some floating-point arithmetic to do
> integer log2.
> 
> uint reqbit = ceil(log(buffer[lastpos]+1)/log2);
> 
> How about gcc's builtin, which is one asm instruction (if gcc is the
> compiler)?
> 
> int __builtin_clz (unsigned int x)
> 
> But anyway buffer[lastpos] == 0 so the above integer log2 code is
> correctly returning 0 == log2(0 + 1)
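
As a sketch of the builtin-based alternative, the bit count ceil(log2(x + 1)) can be computed without floating point. `bits_needed` is a hypothetical name, and the code assumes gcc or a compatible compiler such as clang. Note that `__builtin_clz(0)` is undefined, so zero needs its own branch.

```cpp
#include <climits>

// Hypothetical helper: number of bits needed to represent x, which equals
// ceil(log2(x + 1)) for unsigned x. Assumes a gcc-compatible compiler.
inline unsigned bits_needed(unsigned x) {
    if (x == 0) return 0;  // __builtin_clz(0) is undefined, handle it here
    const unsigned total_bits =
        static_cast<unsigned>(sizeof(unsigned) * CHAR_BIT);
    // Total bits minus leading zeros is the position of the highest set bit.
    return total_bits - static_cast<unsigned>(__builtin_clz(x));
}
```

This still returns 0 for an input of 0, so the caller has to handle that case before dividing by the result.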
> 
> Tracing back a bit more, the function is attempting to encode a vector
> containing the following integers: 0 118 128 72 63 71 64 114 41 74 46
> 375 374 425 112 502 496 485 474 493 106 110 104 110 115 296 287 105 113
> 0 0 . It's barfing on the 0th entry in that vector, which is a zero.
> 
> Sometimes Simple-9 doesn't expect 0s since it's delta encoding for
> posting lists etc. Is the bug that 0s are being passed or that the
> encoding scheme isn't handling this case?
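
If the answer is that the encoder should accept zeros, layout selection can be written so that a zero is just another small value. The sketch below is not the code in ListCoders.h; `simple9_count` is a hypothetical function that uses the nine standard Simple-9 layouts and assumes zeros are legal input.

```cpp
#include <cstddef>
#include <vector>

// Sketch of Simple-9 layout selection: returns how many values of `v`,
// starting at `pos`, can share one 28-bit payload.
// Assumption: zeros are legal input and occupy a slot like any small value.
inline std::size_t simple9_count(const std::vector<unsigned>& v,
                                 std::size_t pos) {
    // Slots per word for the nine layouts; bits per slot is 28 / slots.
    static const unsigned kSlots[9] = {28, 14, 9, 7, 5, 4, 3, 2, 1};
    for (unsigned slots : kSlots) {
        if (pos + slots > v.size()) continue;  // not enough values left
        const unsigned width = 28 / slots;
        bool fits = true;
        for (std::size_t k = 0; k < slots; ++k) {
            if ((v[pos + k] >> width) != 0) {  // value too wide for slot
                fits = false;
                break;
            }
        }
        if (fits) return slots;
    }
    return 0;  // nothing left, or the next value exceeds 28 bits
}
```

For the start of the vector above (0 118 128 ...), this picks the three-slot layout, because 128 needs 8 bits and does not fit the four-slot layout's 7-bit slots.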
> 
> Kenneth
> 
> On 01/13/2015 02:25 AM, Marcin Junczys-Dowmunt wrote:
> > Hi Kenneth.
> > 
> > Recently I am encountering an increased number of crashes, too. I
> > guess there are some heisenbugs in the binarization that manifest maybe
> > due to a new boost version or something. A workaround is usually to use
> > fewer threads, only one or up to 4 (it's actually not much faster with
> > 16 anyway). If it still crashes try -encoding None. I am planning to
> > write a new binarization tool from scratch, this one is giving me too
> > much headache.
> > 
> > On 13.01.2015 at 04:20, Kenneth Heafield wrote:
> > > Dear Moses/Marcin,
> > > 
> > > I'm getting a Floating point exception in processPhraseTableMin from
> > > Moses d0807c. Arguments, minus the absolute paths, are:
> > > 
> > > processPhraseTableMin -in phrase-table.gz -out phrase-table -nscores 4
> > > -threads 16 -T /tmp -encoding None
> > > 
> > > The phrase table is rather large and it runs for several hours before
> > > crashing. Log output is below.
> > > 
> > > Used options:
> > > Text phrase table will be read from: phrase-table.gz
> > > Output phrase table will be written to: phrase-table.minphr
> > > Step size for source landmark phrases: 2^10=1024
> > > Source phrase fingerprint size: 16 bits / P(fp)=1.52588e-05
> > > Selected target phrase encoding: Huffman
> > > Number of score components in phrase table: 4
> > > Single Huffman code set for score components: no
> > > Using score quantization: no
> > > Explicitly included alignment information: yes
> > > Running with 16 threads
> > > Pass 1/2: Creating source phrase index + Encoding target phrases
> > > ..................................................[5000000]
> > > ..................................................[10000000]
> > > ..................................................[15000000]
> > > ..................................................[20000000]
> > > ..................................................[25000000]
> > > ..................................................[30000000]
> > > ..................................................[35000000]
> > > ..................................................[40000000]
> > > ..................................................[45000000]
> > > ..................................................[50000000]
> > > ..................................................[55000000]
> > > ..................................................[60000000]
> > > ..................................................[65000000]
> > > ..................................................[70000000]
> > > ..................................................[75000000]
> > > ..................................................[80000000]
> > > ..................................................[85000000]
> > > ..................................................[90000000]
> > > ..................................................[95000000]
> > > ..................................................[100000000]
> > > ..................................................[105000000]
> > > ..................................................[110000000]
> > > ..................................................[115000000]
> > > ..................................................[120000000]
> > > ..................................................[125000000]
> > > ..................................................[130000000]
> > > ..................................................[135000000]
> > > ..................................................[140000000]
> > > ..................................................[145000000]
> > > ..................................................[150000000]
> > > ..................................................[155000000]
> > > ..................................................[160000000]
> > > ..................................................[165000000]
> > > ..................................................[170000000]
> > > ..................................................[175000000]
> > > ..................................................[180000000]
> > > ..............................................
> > > Intermezzo: Calculating Huffman code sets
> > > Creating Huffman codes for 624564 target phrase symbols
> > > Creating Huffman codes for 551381 scores
> > > Creating Huffman codes for 15296482 scores
> > > Creating Huffman codes for 582875 scores
> > > Creating Huffman codes for 15806633 scores
> > > Creating Huffman codes for 50 alignment points
> > > Pass 2/2: Compressing target phrases
> > > ..................................................[5000000]
> > > ..................................................[10000000]
> > > 
> > > Kenneth
> > > _______________________________________________
> > > Moses-support mailing list
> > > [email protected]
> > > http://mailman.mit.edu/mailman/listinfo/moses-support [1]


 

Links:
------
[1] http://mailman.mit.edu/mailman/listinfo/moses-support