Re: [Felix-language] [felix] weird bug

john skaller Thu, 04 Jul 2013 05:01:57 -0700

On 04/07/2013, at 9:39 AM, john skaller wrote:

> I must be missing something but I have this code using JudySLIns:




Oooo.. a nasty bug indeed! I have found it at the C++ level. Here's the C++:

  { 
   ....
      if (::std::strlen(_tmp65801) >= JUDY_SL_MAXLEN) throw "JudySLIns 
strlen>10000";
      *(Word_t**)_tmp65803=(Word_t*)JudySLIns((PTF index),(unsigned 
char*)_tmp65801,_tmp65802);
    
      }
      *slot  = (void**)new(*PTF gcp, ::flx::rtl::_address_ptr_map, true) 
void* (FLX_VNR(1, new(*PTF gcp, _tt65465_ptr_map, true)
 _tt65465 (_tt65465(PTF data, 
_i52574_f52574_get_dflt__apos_2(FLX_FPAR_PASS this)
      .apply(_tt65466(PTF sym, (void* /*nullptr*/ )0)))))); //assign


No, you do not need to understand this code at all to see the bug!!

Here's some diagnostics I inserted:

~/felix>build/release/host/bin/flx --test=build/release --nofelix --force 
src/tools/flx_libindex
__init__.flx
address.flx
(address.flx, 17, isNULL, fun isNULL: address -> bool)
Insert: Slot address = 0x1001321d0, value = 0x0
Get default apply method key = isNULL
Calling JudySLGet
Result is slot address 0x1001321d0
Shell terminated by signal SIGSEGV


Here's the Felix:

        index.add sym (Cons (data,index.get_dflt(sym,Empty[data_t])));

Now, the data is symbol/position/definition record for the documentation
index. The Judy array maps the symbol name to the position which is converted to
an HTML link.

Since the same name can be defined many times in code, the JudySL array data
is a list of the definitions. What this function is supposed to do is store 
into an
the Judy array under the symbol name key, the data consisting of the old data,
which is a list, with the new definition pushed onto the head of the list.

The 'add' function returns a pointer to the data slot in the array so you can
store data there. If the slot exists, it returns its address, otherwise it 
creates
a new entry in the array and returns its address.

So what we want to put in that slot is the old data with the new data Cons'ed
on the front. If there is no old data because this is a first entry for the 
symbol,
we just set the old data to an empty list. The get_dflt function does that:
it does a fetch and returns the stored value if there is one, or the default
given if there isn't.

It's a common trick to maintain a list under a key. And there's
nothing wrong with that Felix code, its correct.

But NOW you can see the problem in the C++. Look again:

INSERT comes first.
THEN the get default.

Unfortunately, the insert created a slot with a NULL pointer in it,
and so the get fetched it, thinking it was a list pointer, and then
the Cons caused a segfault (or something).

Obviously we have to calculate the new value first BEFORE
doing the insert.

So is this a bug in the compiler??

Here's the add function:

   proc add (x:strdict) (key:string) (value: T) {
     var err: JError_t;
     var slot : && T;
     JudySLIns (_repr_ x, key.cstr, &err, C_hack::cast[&&word] (&slot));
     slot <- new value;
   }

Do you see the problem now? 

Hint: add is inlined. And value is lazily evaluated. The problem is our
evaluation of value does a lookup on the 

   proc add (x:strdict) (var key:string) (var value: T) { 
     var err: JError_t;
     var slot : && T; 
     JudySLIns (_repr_ x, key.cstr, &err, C_hack::cast[&&word] (&slot));
     slot <- new value;
   }

This fixes it. However it is NOT clear why the programmer should have to
write this. The formula for the value is defined as a function (not a 
generator).

  fun get_dflt (x:strdict) (key:string, dflt:T) => 
    match get x key with
    | Some ?v => v
    | None => dflt
    endmatch
  ;

The problem is that whilst this is an "accessor"  that depends on state
(namely the state of the Judy array) it doesn't actually have any side effects.

Never the less the call needed to be lifted out or eagerly evaluated.
Making get_dflt a generator would have fixed it too.

This is, in fact, a subtle problem in the semantics of the kind Dobes 
and Srean worried about.

Just to be clear this is not a "failure to understand" the evaluation strategy
and use it properly. The problem is that there's no reason to make
get_dflt a generator, or any reason the add procedure shouldn't use
lazy evaluation: neither function could predict its argument would
interact via an alias (namely the Judy array).

What we actually needed was to observe:

        (a) the argument depends on mutable state of "index"
        (b) the function modifies the state of "index"

and so the ordering matters. We need some kind of barrier.

Now, before you think "Oh another bug with indeterminate evaluation"
I want to disabuse you of that notion. It's a universal problem with
ALL procedural languages. Here's a variant which impacts both
Ocaml and C++:

        Hashtbl.iter (fun (key, value) -> Hashtbl.insert H (key+1) value) H;

and you can see a similar problem with STL iterators: if the closure
function modifies the data structure being scanned all hell breaks loose.
It happens for the same reason as in Felix: callbacks are intrinsically
a representation of lazy evaluation.

in these cases, it is NOT clear to me at least who is responsible.
The programmer writing the callback and the one writing the HOF
didn't realise the HOF might scan the same data structure the callback
is modifying.

So the responsibility MUST be with the programmer that puts them
together.

Which means this line of code I wrote:

        index.add sym (Cons (data,index.get_dflt(sym,Empty[data_t])));

is the bug. Neither add nor get_dflt is wrong, nor is the lazy evaluation
wrong. the fix, forcing the order like this:

        var old_data =index.get_dflt(sym,Empty[data_t]);
        index.add sym (Cons (data, old_data));

not only works, its the correct fix because there's no reason to 
fix either the get_dflt or add functions. Another way is:

        index.add sym (var Cons (data,index.get_dflt(sym,Empty[data_t])));

[Note the "var" there]

In C land we would say, functions are *entitled* to assume that
pointer arguments aren't aliases**. Otherwise a lot of functions
simply couldn't work (or couldn't work efficiently). Strcpy for example
would be slow (because you'd have to find the string length just
to check if there's any overlap).

** Note this is not the case for many functions which is why C is slower
than Fortran, which does make that requirement. ISO C99 introduced
"restrict" keyword to fix this.

I think this particular bug has a name "the unexpected coupling problem".

--
john skaller
skal...@users.sourceforge.net
http://felix-lang.org




------------------------------------------------------------------------------
This SF.net email is sponsored by Windows:

Build for Windows Store.

http://p.sf.net/sfu/windows-dev2dev
_______________________________________________
Felix-language mailing list
Felix-language@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/felix-language

Re: [Felix-language] [felix] weird bug

Reply via email to