Hello lucy-dev,
I played around with some approaches to implementing immutable strings to see how
they work out. I didn't create any real prototypes but I always found that it
helps a lot to start writing some tentative code in order to grasp a problem.
So here's what I came up with.
The main idea is to use a single class for normal strings, substrings, and
"zombie" (stack-allocated) strings. This string class has the following member
vars:
    class Clownfish::String cnick Str inherits Clownfish::Obj {
        const char *ptr;
        size_t      size;
        String     *origin;
    }
'ptr' and 'size' work like in the CharBuf implementation (we don't need a 'cap'
field since strings are immutable). 'origin' tells what kind of string we're
dealing with:
  * For normal strings, 'origin' is set to 'self'. This means that
    the string owns the character buffer pointed to by 'ptr'.
  * For substrings, 'origin' points to the original string that owns
    the character buffer. The refcount of 'origin' is incremented when
    the substring is initialized.
  * For zombie strings, 'origin' is NULL.
  * For substrings of zombie strings, a distinct sentinel value can be
    used (SUBSTRING_OF_ZOMBIE in the code below).
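To make the four cases concrete, here's a rough sketch in plain C that classifies a string from 'origin' alone. It lives outside the Clownfish object system: the struct is a stand-in without the Obj header, and StrKind, str_kind, and the way SUBSTRING_OF_ZOMBIE is defined (the address of a private dummy object) are assumptions made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for Clownfish::String without the Obj header (assumption). */
typedef struct String String;
struct String {
    const char *ptr;
    size_t      size;
    String     *origin;
};

/* Hypothetical sentinel: the address of a private dummy object is
 * guaranteed to differ from NULL and from any real string. */
static String substring_of_zombie_marker;
#define SUBSTRING_OF_ZOMBIE (&substring_of_zombie_marker)

/* Hypothetical enum naming the four cases from the list above. */
typedef enum {
    STR_KIND_NORMAL,
    STR_KIND_SUBSTRING,
    STR_KIND_ZOMBIE,
    STR_KIND_ZOMBIE_SUBSTRING
} StrKind;

/* Decide which kind of string we have by looking only at 'origin'. */
static StrKind
str_kind(const String *self) {
    assert(self != NULL);
    if (self->origin == NULL)                { return STR_KIND_ZOMBIE; }
    if (self->origin == self)                { return STR_KIND_NORMAL; }
    if (self->origin == SUBSTRING_OF_ZOMBIE) { return STR_KIND_ZOMBIE_SUBSTRING; }
    return STR_KIND_SUBSTRING;
}
```

Using the address of a static object as the sentinel is just one option; any pointer value that can never be a real string would do.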
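So Str_destroy would look something like this:

    void
    Str_destroy(String *self) {
        if (self->origin == NULL) {
            // Zombie string: lives on the stack and must never be destroyed.
            THROW(ERR, "Can't destroy a stack-allocated String");
        }
        if (self->origin == self) {
            // Normal string: we own the buffer. Cast away const to free it.
            FREEMEM((char*)self->ptr);
        }
        else if (self->origin != SUBSTRING_OF_ZOMBIE) {
            // Substring of a normal string: release the reference to the owner.
            DECREF(self->origin);
        }
        SUPER_DESTROY(self, STRING);
    }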
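The substring case is the one where lifetimes matter, so here's a minimal standalone sketch of it. It only covers heap-allocated strings (no zombies), and everything in it is an assumption for illustration: the explicit 'refcount' field and the helpers str_new, str_substring, and str_decref stand in for the real Obj refcounting, and 'live_buffers' exists only so the effect can be observed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct String String;
struct String {
    unsigned    refcount;   /* stand-in for the Obj refcount (assumption) */
    const char *ptr;
    size_t      size;
    String     *origin;
};

static int live_buffers = 0;  /* number of heap buffers alive, demo only */

/* Normal string: copies the bytes and owns the copy (origin == self). */
static String*
str_new(const char *utf8, size_t size) {
    String *self = malloc(sizeof(String));
    char   *buf  = malloc(size ? size : 1);
    assert(self != NULL && buf != NULL);
    memcpy(buf, utf8, size);
    self->refcount = 1;
    self->ptr      = buf;
    self->size     = size;
    self->origin   = self;
    live_buffers++;
    return self;
}

/* Substring of a heap string: shares the buffer and pins its owner. */
static String*
str_substring(String *base, size_t offset, size_t len) {
    assert(base->origin != NULL);  /* zombies are not handled here */
    assert(offset <= base->size && len <= base->size - offset);
    String *self = malloc(sizeof(String));
    assert(self != NULL);
    self->refcount = 1;
    self->ptr      = base->ptr + offset;
    self->size     = len;
    self->origin   = base->origin;  /* always the string owning the buffer */
    self->origin->refcount++;
    return self;
}

/* Drop one reference; on the last one, release the buffer or the owner. */
static void
str_decref(String *self) {
    if (--self->refcount > 0) { return; }
    if (self->origin == self) {
        free((char*)self->ptr);
        live_buffers--;
    }
    else {
        str_decref(self->origin);
    }
    free(self);
}
```

With this, a substring keeps the buffer alive even after the caller has released the original string; the buffer is only freed once the last substring goes away. Pointing 'origin' at base->origin rather than at base means a substring of a substring still references the buffer owner directly.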
Zombie strings are assumed not to care about the lifetime of their character
buffer. That leaves two cases uncovered: stack-allocated strings that own a
buffer, and stack-allocated substrings of a normal string. Both of these would
need to be DECREFed and handled in the destructor, but I don't think we need
them.
Then another unrelated question turned up. Originally, I planned to make
Clownfish::String abstract, and implement different encodings in
Clownfish::UTF8String, etc. But it's also possible to implement the UTF-8
encoding directly in Clownfish::String. This might make sense because UTF-8
will be used in all but a few cases.
Nick