Re: [lucy-dev] Proposal for implementation of immutable strings

Marvin Humphrey Mon, 06 May 2013 20:03:44 -0700

On Sat, May 4, 2013 at 4:43 AM, Nick Wellnhofer <[email protected]> wrote:
> On May 4, 2013, at 01:14 , Marvin Humphrey <[email protected]> wrote:
>> OK, cool... Since it's primarily a naming change but disruptive, I'd
>> suggest the following order of actions:
>>
>> 1.  Grep for `INCREF` and `Inc_RefCount` and make sure all invocations
>>     capture the returned reference.
>> 2.  Implement immutable String.
>> 3.  Make the naming change.
>
> As a side note, step 3 is quite a bit more than a naming change.


I was unclear.  The "naming change" that I was referring to was INCREF/DECREF
to CAPTURE/RELEASE.  The mutable-CharBuf-to-immutable-String change is, as you
point out, more involved than a renaming.

> All usages of CB_Nip must be converted to string iterators since CB_Nip
> mutates the string. Additionally, all the places where we construct new
> strings have to be identified. In this case we have to keep using a CharBuf
> followed by CB_Yield_String after the construction is complete.

+1

We might want to introduce String as a subclass of CharBuf with all of the
mutability disabled.  That makes it easy to break up monolithic interconnected
webs of CharBuf usage into bite-sized chunks when switching over.  Once the
transition is complete, we can remove the inheritance.

>> Hmm.  Well, the most incremental strategy is to hard-code UTF-8 into String
>> for now and the Python bindings can just forego the stack-allocated-string
>> optimization until we make up our minds later.
>
> It just occurred to me that there's another problem with the INCREF
> approach. Since string iterators INCREF the source string, every zombie
> string would be copied as soon as  it's iterated. The String methods using
> zombie iterators wouldn't be affected, but it would still result in many
> unneccessary copies of host strings.

That's true, but I still imagine stack wrappers are a win in the aggregate.
(I wish I could quantify that assertion.)  It's very common to pass simple
strings: field names, hash keys, query strings, file names and file paths,
user names, URIs, etc.

> A possible solution would be to do away with stack allocation but to keep
> using the host string buffer directly. Before returning to the host
> language, we could check whether the refcount of the string is greater than
> one and copy the string only then. This scheme wouldn't require changing
> INCREF semantics.

I think that exceptions may longjmp out of Clownfish code past the host
wrapper, though.  Right now if that happens we can leak memory -- a newly
allocated Clownfish object needed for calling into Clownfish code may not get
decremented.  If we delay copying we might start getting memory read errors,
though.

Marvin Humphrey

Re: [lucy-dev] Proposal for implementation of immutable strings

Reply via email to