[lucy-dev] Proposal for implementation of immutable strings

Nick Wellnhofer Thu, 25 Apr 2013 09:01:36 -0700

Hello lucy-dev,

I played around with a few approaches to implementing immutable strings to see 
how they would work out. I didn't build any real prototypes, but I've always 
found that writing some tentative code helps a lot in grasping a problem. 
So here's what I came up with.


The main idea is to use a single class for normal strings, substrings, and 
"zombie" (stack-allocated) strings. This string class has the following member 
vars:

    class Clownfish::String cnick Str inherits Clownfish::Obj {
        const char *ptr;
        size_t      size;
        String     *origin;
    }

'ptr' and 'size' work like in the CharBuf implementation (we don't need a 'cap' 
field since strings are immutable). 'origin' tells what kind of string we're 
dealing with:

    * For normal strings, 'origin' is set to 'self'. This means that
      the string owns the character buffer pointed to by 'ptr'.
    * For substrings, 'origin' points to the original string that owns
      the character buffer. The refcount of 'origin' is increased on
      initialization.
    * For zombie strings, 'origin' is NULL.
    * For substrings of zombie strings, 'origin' is set to a distinct
      sentinel value (SUBSTRING_OF_ZOMBIE in the code below).
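
As a rough illustration of how the four cases above could be told apart, here
is a minimal sketch in plain C. It is not Clownfish code: the struct stands in
for the proposed class, and StrKind, Str_kind, and the static marker object
behind SUBSTRING_OF_ZOMBIE are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical plain-C stand-in for the proposed Clownfish::String. */
typedef struct String String;
struct String {
    const char *ptr;
    size_t      size;
    String     *origin;
};

/* Assumption: the sentinel is the address of a private static object,
 * so it can never equal NULL or the address of a real string. */
static String substring_of_zombie_marker;
#define SUBSTRING_OF_ZOMBIE (&substring_of_zombie_marker)

typedef enum {
    STR_KIND_NORMAL,              /* origin == self: owns the buffer */
    STR_KIND_SUBSTRING,           /* origin == owning string         */
    STR_KIND_ZOMBIE,              /* origin == NULL                  */
    STR_KIND_SUBSTRING_OF_ZOMBIE  /* origin == sentinel              */
} StrKind;

/* Classify a string purely from its 'origin' field. */
static StrKind
Str_kind(const String *self) {
    assert(self != NULL);
    if (self->origin == NULL)                { return STR_KIND_ZOMBIE; }
    if (self->origin == SUBSTRING_OF_ZOMBIE) { return STR_KIND_SUBSTRING_OF_ZOMBIE; }
    if (self->origin == self)                { return STR_KIND_NORMAL; }
    return STR_KIND_SUBSTRING;
}
```

Using the address of a static object as the sentinel is only one option; any
pointer value that cannot collide with a live String would do.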

So Str_Destroy would look something like this:

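    void
    Str_destroy(String *self) {
        if (self->origin == NULL) {
            // Zombie: lives on the stack and must never be destroyed.
            THROW(ERR, "Can't destroy a stack-allocated String");
        }
        if (self->origin == self) {
            // Normal string: owns the buffer. Cast away const to free it.
            FREEMEM((char*)self->ptr);
        }
        else if (self->origin != SUBSTRING_OF_ZOMBIE) {
            // Substring: release the reference taken on initialization.
            DECREF(self->origin);
        }
        SUPER_DESTROY(self, STRING);
    }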

Zombie strings are assumed not to care about the lifetime of the character 
buffer. That leaves two cases uncovered: stack-allocated strings that own a 
buffer, and stack-allocated substrings of a normal string. Both would have to 
be DECREFed and handled in the destructor, but I don't think we need them.
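
To make the lifetime rules for heap-allocated strings and substrings concrete,
here is a small self-contained model in plain C. It is only a sketch under
stated assumptions: the manual 'refcount' field stands in for Clownfish::Obj
refcounting, Str_new, Str_substring, and Str_decref are hypothetical helpers,
offsets are in bytes, and zombie strings are left out entirely. Pointing a
substring of a substring at the root owner is a choice made for this example,
not something the proposal specifies.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct Str Str;
struct Str {
    unsigned    refcount;  /* stand-in for Clownfish::Obj refcounting */
    const char *ptr;
    size_t      size;
    Str        *origin;
};

static int live_buffers = 0;  /* instrumentation for this example only */

/* Normal string: copies the bytes and owns the buffer (origin == self). */
static Str*
Str_new(const char *bytes, size_t size) {
    Str  *self = malloc(sizeof(Str));
    char *buf  = malloc(size ? size : 1);
    assert(self != NULL && buf != NULL);
    memcpy(buf, bytes, size);
    self->refcount = 1;
    self->ptr      = buf;
    self->size     = size;
    self->origin   = self;
    live_buffers++;
    return self;
}

/* Substring: shares the buffer and takes a reference on its owner. */
static Str*
Str_substring(Str *source, size_t offset, size_t len) {
    assert(offset <= source->size && len <= source->size - offset);
    Str *self = malloc(sizeof(Str));
    assert(self != NULL);
    self->refcount = 1;
    self->ptr      = source->ptr + offset;
    self->size     = len;
    self->origin   = source->origin;  /* always the buffer's owner */
    self->origin->refcount++;
    return self;
}

/* Drop one reference; destroy the string when the count reaches zero. */
static void
Str_decref(Str *self) {
    if (--self->refcount > 0) { return; }
    if (self->origin == self) {
        free((char*)self->ptr);       /* owner frees the buffer */
        live_buffers--;
    }
    else {
        Str_decref(self->origin);     /* substring releases its owner */
    }
    free(self);
}
```

In this model, releasing the original string while a substring is still alive
leaves the buffer intact; it is freed only when the last substring goes away.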

Then another unrelated question turned up. Originally, I planned to make 
Clownfish::String abstract, and implement different encodings in 
Clownfish::UTF8String, etc. But it's also possible to implement the UTF-8 
encoding directly in Clownfish::String. This might make sense because UTF-8 
will be used in all but a few cases.
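
As an example of what encoding-specific logic living directly in the string
class might look like, here is a sketch of counting code points in a UTF-8
buffer. The function name utf8_code_point_count is hypothetical, and the code
assumes the buffer already holds valid UTF-8 (it does no validation).

```c
#include <assert.h>
#include <stddef.h>

/* Count code points by counting every byte that is not a UTF-8
 * continuation byte (continuation bytes have the bit pattern 10xxxxxx).
 * Assumption: the input is valid UTF-8. */
static size_t
utf8_code_point_count(const char *ptr, size_t size) {
    assert(ptr != NULL || size == 0);
    size_t count = 0;
    for (size_t i = 0; i < size; i++) {
        if (((unsigned char)ptr[i] & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}
```

With an abstract base class, a routine like this would instead sit behind a
method overridden per encoding.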

Nick
