The failure of the strdict-01 regression test on Windows has
again highlighted the design fault in strings, in both Felix and C++.

When you call

        char const *p = string("hello").c_str();

in C++ you have a dangling pointer p. The call compiles because
string("hello") is a non-const rvalue, and C++ has traditionally had
no mechanism for a method to require an lvalue object: you can insist
that a function argument is an lvalue, but not the object of a non-static
member function. (C++11 ref-qualifiers can now express this, but c_str()
is not declared with one.)

Consequently the string("hello") temporary may evaporate immediately
after p is set to point at its internal buffer: the destructor deletes
the storage, leaving p dangling.
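
To make the lifetime visible without actually dereferencing a dead
pointer, here is a small sketch. Tracked and observe_temporary are
made-up names for illustration, and Tracked::inner() merely stands in
for c_str(). It assumes a conforming C++11 compiler, where a temporary
is destroyed at the end of the full expression that created it.

```cpp
#include <string>
#include <vector>

// Hypothetical helper type: records construction and destruction in a log,
// so the lifetime of a temporary can be observed without touching freed memory.
struct Tracked {
    std::vector<std::string> *log;
    int value = 42;

    explicit Tracked(std::vector<std::string> *l) : log(l) { log->push_back("ctor"); }
    ~Tracked() { log->push_back("dtor"); }

    // Stands in for c_str(): returns a pointer into the object itself.
    int const *inner() const { return &value; }
};

// Returns the sequence of events produced by taking a pointer into a temporary.
std::vector<std::string> observe_temporary() {
    std::vector<std::string> log;
    int const *p = Tracked(&log).inner();  // the temporary dies at the end of this statement
    log.push_back("after statement");      // p is already dangling here
    (void)p;                               // never dereferenced: that would be undefined behaviour
    return log;
}
```

On a conforming compiler the log reads ctor, dtor, after statement:
the object is gone before the next statement runs.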

In Felix, this means calling

        "Hello".cstr

is more or less guaranteed to be invalid. It is not even valid
to do this:

        fun f(x:string) = {
                g (strdup (x.cstr));
                ...

because "x" is a val, and vals can be either eagerly
or lazily evaluated. If the function above is inlined,
a call such as

        f("hello")

can reduce to

        g ( strdup ("hello".cstr) )

and the pointer can be left dangling even before strdup
is called.

Note this problem is not limited to cstr -- it applies to STL
begin() and end() too. Anything that gets a pointer into any
data structure can have the pointer invalidated if the
data structure is destroyed, and if the data structure is
a temporary that can happen very quickly.
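
The iterator case can be sketched the same way. make_values and
sum_values are hypothetical names; the point is only that the
container gets a name, so that it outlives the iterators taken
from it.

```cpp
#include <numeric>
#include <vector>

// Hypothetical producer that returns a container by value.
std::vector<int> make_values() {
    return std::vector<int>{1, 2, 3};
}

// WRONG (shown only as a comment, never compiled):
//   std::accumulate(make_values().begin(), make_values().end(), 0);
// The two iterators refer to two different temporaries, so the range is
// invalid, and either iterator would dangle if kept past the statement.

// Safer: give the container a name so it owns the storage for long enough.
int sum_values() {
    std::vector<int> const values = make_values();
    return std::accumulate(values.begin(), values.end(), 0);
}
```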

It turns out the Microsoft compiler MSVC++ is much more
aggressive about this than gcc or clang, which presumably
are dumbed down to cope with dumb end user legacy code:
gcc and clang appear to keep temporaries until the end of the
containing statement, instead of removing them immediately after use.

Of course the converse may be possible too: gcc aggressively
elides copies whereas MS doesn't. This would lead to the
same situation, because gcc would then be keeping a single value
around, with the lifetime of the original extended to that of the
copy it didn't make.

In any case, use of STL iterators or cstr in Felix is affected by this
misdesign of strings in C++. I have to say here IT'S MY FAULT.
I voted in favour of making strings STL containers. Only Pete Becker
stood against this and he was right.

The correct way to handle this is to copy the string buffer immediately:

        char const *get_c_string (string const &x) {
                char const *p = x.c_str();
                // heap copy: the caller owns it and must free() it
                return strdup (p);
        }

The reason this must work is that any temporary used as an argument
may not be destroyed until after the function returns. By that time
p is dangling, but we've duplicated the buffer.

Of course in C++ this causes a memory leak. In Felix we can use
a varray instead, which is basically a garbage collected array
the contents of which are always on the heap.
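
On the C++ side the leak can be plugged with an owning pointer. This
is only a sketch, assuming C++11: owned_cstr, FreeDeleter and
copy_c_string are names invented here, and malloc plus memcpy is used
in place of strdup because strdup is POSIX rather than standard C++.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <string>

// Deleter matching the malloc-family allocation used below.
struct FreeDeleter {
    void operator()(char *p) const { std::free(p); }
};

// Hypothetical owning handle for a heap copy of a C string.
typedef std::unique_ptr<char, FreeDeleter> owned_cstr;

// Copies the buffer while the argument is still alive; the copy is
// released automatically when the returned handle goes out of scope.
owned_cstr copy_c_string(std::string const &x) {
    std::size_t const n = x.size() + 1;  // include the terminating NUL
    char *buf = static_cast<char *>(std::malloc(n));
    if (buf != nullptr) std::memcpy(buf, x.c_str(), n);
    return owned_cstr(buf);  // holds a null pointer if allocation failed
}
```

The handle has to be kept in a named variable for as long as the
char * is in use, which is the same discipline as before, but at
least nothing leaks.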

Another idea is to change the Felix cstr implementation to use a function
that only accepts an lvalue:

        char const *get_c_string (string &x) {
                return x.c_str();
        }

Felix will never know you called the function wrongly with
an rvalue, but the C++ compiler will barf. The delayed error
message is a bad thing .. the improved performance is a good thing.
The problem is that this isn't safe either. Even an lvalue can evaporate.
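
For what it's worth, C++11 lets the rejection be spelled out with a
deleted overload, which also works for const strings and does not
depend on how strictly the compiler treats non-const references (MSVC
has historically accepted binding a temporary to one as an
extension). A sketch, assuming C++11; accepts_arg is a made-up trait
used only to show the effect at compile time:

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Accepts named (lvalue) strings only; the pointer is borrowed, not owned.
inline char const *get_c_string(std::string const &x) { return x.c_str(); }

// Temporaries select this overload, and using it is a compile-time error.
char const *get_c_string(std::string &&) = delete;

// Hypothetical detection trait: true if get_c_string can be called with a T.
template <typename T, typename = void>
struct accepts_arg : std::false_type {};

template <typename T>
struct accepts_arg<T, decltype(void(get_c_string(std::declval<T>())))>
    : std::true_type {};

static_assert(accepts_arg<std::string &>::value, "lvalues are accepted");
static_assert(!accepts_arg<std::string>::value, "temporaries are rejected");
```

This only moves the error earlier. It does nothing about an lvalue
that goes out of scope, or is modified, while the pointer is still
in use.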

Another, safe, idea is to change the implementation of Felix strings.
Much as I'd love to design my own string class, C++ compatibility
dictates that it's sane to stick with C++ strings, even if they're broken.

But, we can have our cake and eat it too: we can use a POINTER
to a C++ string, with the string on the heap. That makes strings
first class Objects in Felix instead of values though!
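
In plain C++ terms the idea looks roughly like this. It is only an
analogy: string_handle, make_string and cstr are invented names, and
std::shared_ptr reference counting stands in for whatever the Felix
garbage collector would actually do.

```cpp
#include <memory>
#include <string>

// Hypothetical handle: the C++ string lives on the heap and is shared.
typedef std::shared_ptr<std::string const> string_handle;

string_handle make_string(char const *text) {
    return std::make_shared<std::string>(text);
}

// The returned pointer stays valid for as long as at least one handle
// to the same heap string is still alive.
char const *cstr(string_handle const &h) { return h->c_str(); }
```

Copying a handle copies a pointer, not the characters, which is
exactly the change from value to object semantics. A temporary handle
that nobody keeps would still die at the end of the statement, so the
handle itself has to be held somewhere.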

Alternatively we might trick the Felix compiler into assuring that
C++ strings derived from literals are universally assigned to variables
so they can't evaporate. Of course this only fixes the problems for
literals, and not string expressions in general ;(

The Right Way (TM) to do this is:

        void get_c_string (string const &x, char *p, size_t len) {
                if (len == 0) return;
                strncpy (p, x.c_str(), len - 1);
                p[len - 1] = 0;
        }

That is, copy the string contents into a buffer supplied by the caller.
What I mean is that in C++ this function should be added and c_str()
should be removed.
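
A slightly fuller sketch of the buffer-copy idea, with a return value
so that the caller can detect truncation. copy_to_buffer is a
hypothetical name, not an existing Felix or C++ library function.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Copies at most len - 1 characters of x into p and always NUL-terminates
// (when len > 0). Returns the full length of x, so a result >= len means
// the copy was truncated.
std::size_t copy_to_buffer(std::string const &x, char *p, std::size_t len) {
    if (len > 0) {
        std::size_t const n = x.size() < len - 1 ? x.size() : len - 1;
        std::memcpy(p, x.data(), n);
        p[n] = '\0';
    }
    return x.size();
}
```

Passing a temporary is harmless here, because the copy is finished
before the function returns and nothing points into the string
afterwards.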

Well, I don't know what to do. The universally safe solution,
using varray, incurs a horrible penalty: a heap allocation
every time we need a char * instead of a C++ string, which
is every call to a C function. Of course I could provide both,
"safe_cstr" and "__unsafe_cstr".

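Expressed in C++ the pair might look something like this. It is a
sketch only: std::vector<char> stands in for a Felix varray, and the
unsafe variant is spelled unsafe_cstr here because identifiers
containing a double underscore are reserved in C++.

```cpp
#include <string>
#include <vector>

// Safe: a heap copy including the terminating NUL. Costs one allocation
// per call, but does not depend on the lifetime of x afterwards.
std::vector<char> safe_cstr(std::string const &x) {
    return std::vector<char>(x.c_str(), x.c_str() + x.size() + 1);
}

// Unsafe: a borrowed pointer, valid only while x is alive and unmodified.
char const *unsafe_cstr(std::string const &x) { return x.c_str(); }
```

The result of safe_cstr must itself be held in a variable while its
data() pointer is handed to C code; the difference is that the caller
now decides the lifetime explicitly.
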

--
john skaller
skal...@users.sourceforge.net
http://felix-lang.org




_______________________________________________
Felix-language mailing list
Felix-language@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/felix-language
