On 04/07/2013, at 9:39 AM, john skaller wrote: > I must be missing something but I have this code using JudySLIns:
Oooo.. a nasty bug indeed! I have found it at the C++ level. Here's the C++: { .... if (::std::strlen(_tmp65801) >= JUDY_SL_MAXLEN) throw "JudySLIns strlen>10000"; *(Word_t**)_tmp65803=(Word_t*)JudySLIns((PTF index),(unsigned char*)_tmp65801,_tmp65802); } *slot = (void**)new(*PTF gcp, ::flx::rtl::_address_ptr_map, true) void* (FLX_VNR(1, new(*PTF gcp, _tt65465_ptr_map, true) _tt65465 (_tt65465(PTF data, _i52574_f52574_get_dflt__apos_2(FLX_FPAR_PASS this) .apply(_tt65466(PTF sym, (void* /*nullptr*/ )0)))))); //assign No, you do not need to understand this code at all to see the bug!! Here's some diagnostics I inserted: ~/felix>build/release/host/bin/flx --test=build/release --nofelix --force src/tools/flx_libindex __init__.flx address.flx (address.flx, 17, isNULL, fun isNULL: address -> bool) Insert: Slot address = 0x1001321d0, value = 0x0 Get default apply method key = isNULL Calling JudySLGet Result is slot address 0x1001321d0 Shell terminated by signal SIGSEGV Here's the Felix: index.add sym (Cons (data,index.get_dflt(sym,Empty[data_t]))); Now, the data is symbol/position/definition record for the documentation index. The Judy array maps the symbol name to the position which is converted to an HTML link. Since the same name can be defined many times in code, the JudySL array data is a list of the definitions. What this function is supposed to do is store into an the Judy array under the symbol name key, the data consisting of the old data, which is a list, with the new definition pushed onto the head of the list. The 'add' function returns a pointer to the data slot in the array so you can store data there. If the slot exists, it returns its address, otherwise it creates a new entry in the array and returns its address. So what we want to put in that slot is the old data with the new data Cons'ed on the front. If there is no old data because this is a first entry for the symbol, we just set the old data to an empty list. The get_dflt function does that: it does a fetch and returns the stored value if there is one, or the default given if there isn't. It's a common trick to maintain a list under a key. And there's nothing wrong with that Felix code, its correct. But NOW you can see the problem in the C++. Look again: INSERT comes first. THEN the get default. Unfortunately, the insert created a slot with a NULL pointer in it, and so the get fetched it, thinking it was a list pointer, and then the Cons caused a segfault (or something). Obviously we have to calculate the new value first BEFORE doing the insert. So is this a bug in the compiler?? Here's the add function: proc add (x:strdict) (key:string) (value: T) { var err: JError_t; var slot : && T; JudySLIns (_repr_ x, key.cstr, &err, C_hack::cast[&&word] (&slot)); slot <- new value; } Do you see the problem now? Hint: add is inlined. And value is lazily evaluated. The problem is our evaluation of value does a lookup on the proc add (x:strdict) (var key:string) (var value: T) { var err: JError_t; var slot : && T; JudySLIns (_repr_ x, key.cstr, &err, C_hack::cast[&&word] (&slot)); slot <- new value; } This fixes it. However it is NOT clear why the programmer should have to write this. The formula for the value is defined as a function (not a generator). fun get_dflt (x:strdict) (key:string, dflt:T) => match get x key with | Some ?v => v | None => dflt endmatch ; The problem is that whilst this is an "accessor" that depends on state (namely the state of the Judy array) it doesn't actually have any side effects. Never the less the call needed to be lifted out or eagerly evaluated. Making get_dflt a generator would have fixed it too. This is, in fact, a subtle problem in the semantics of the kind Dobes and Srean worried about. Just to be clear this is not a "failure to understand" the evaluation strategy and use it properly. The problem is that there's no reason to make get_dflt a generator, or any reason the add procedure shouldn't use lazy evaluation: neither function could predict its argument would interact via an alias (namely the Judy array). What we actually needed was to observe: (a) the argument depends on mutable state of "index" (b) the function modifies the state of "index" and so the ordering matters. We need some kind of barrier. Now, before you think "Oh another bug with indeterminate evaluation" I want to disabuse you of that notion. It's a universal problem with ALL procedural languages. Here's a variant which impacts both Ocaml and C++: Hashtbl.iter (fun (key, value) -> Hashtbl.insert H (key+1) value) H; and you can see a similar problem with STL iterators: if the closure function modifies the data structure being scanned all hell breaks loose. It happens for the same reason as in Felix: callbacks are intrinsically a representation of lazy evaluation. in these cases, it is NOT clear to me at least who is responsible. The programmer writing the callback and the one writing the HOF didn't realise the HOF might scan the same data structure the callback is modifying. So the responsibility MUST be with the programmer that puts them together. Which means this line of code I wrote: index.add sym (Cons (data,index.get_dflt(sym,Empty[data_t]))); is the bug. Neither add nor get_dflt is wrong, nor is the lazy evaluation wrong. the fix, forcing the order like this: var old_data =index.get_dflt(sym,Empty[data_t]); index.add sym (Cons (data, old_data)); not only works, its the correct fix because there's no reason to fix either the get_dflt or add functions. Another way is: index.add sym (var Cons (data,index.get_dflt(sym,Empty[data_t]))); [Note the "var" there] In C land we would say, functions are *entitled* to assume that pointer arguments aren't aliases**. Otherwise a lot of functions simply couldn't work (or couldn't work efficiently). Strcpy for example would be slow (because you'd have to find the string length just to check if there's any overlap). ** Note this is not the case for many functions which is why C is slower than Fortran, which does make that requirement. ISO C99 introduced "restrict" keyword to fix this. I think this particular bug has a name "the unexpected coupling problem". -- john skaller skal...@users.sourceforge.net http://felix-lang.org ------------------------------------------------------------------------------ This SF.net email is sponsored by Windows: Build for Windows Store. http://p.sf.net/sfu/windows-dev2dev _______________________________________________ Felix-language mailing list Felix-language@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/felix-language