On Sat, May 4, 2013 at 4:43 AM, Nick Wellnhofer <[email protected]> wrote:
> On May 4, 2013, at 01:14 , Marvin Humphrey <[email protected]> wrote:
>> OK, cool... Since it's primarily a naming change but disruptive, I'd
>> suggest the following order of actions:
>>
>> 1.  Grep for `INCREF` and `Inc_RefCount` and make sure all invocations
>>     capture the returned reference.
>> 2.  Implement immutable String.
>> 3.  Make the naming change.
>
> As a side note, step 3 is quite a bit more than a naming change.

I was unclear.  The "naming change" that I was referring to was INCREF/DECREF
to CAPTURE/RELEASE.  The mutable-CharBuf-to-immutable-String change is, as you
point out, more involved than a renaming.

> All usages of CB_Nip must be converted to string iterators since CB_Nip
> mutates the string. Additionally, all the places where we construct new
> strings have to be identified. In this case we have to keep using a CharBuf
> followed by CB_Yield_String after the construction is complete.

+1

We might want to introduce String as a subclass of CharBuf with all of the
mutability disabled.  That makes it easy to break up monolithic interconnected
webs of CharBuf usage into bite-sized chunks when switching over.  Once the
transition is complete, we can remove the inheritance.

>> Hmm.  Well, the most incremental strategy is to hard-code UTF-8 into String
>> for now and the Python bindings can just forego the stack-allocated-string
>> optimization until we make up our minds later.
>
> It just occurred to me that there's another problem with the INCREF
> approach. Since string iterators INCREF the source string, every zombie
> string would be copied as soon as  it's iterated. The String methods using
> zombie iterators wouldn't be affected, but it would still result in many
> unneccessary copies of host strings.

That's true, but I still imagine stack wrappers are a win in the aggregate.
(I wish I could quantify that assertion.)  It's very common to pass simple
strings: field names, hash keys, query strings, file names and file paths,
user names, URIs, etc.

> A possible solution would be to do away with stack allocation but to keep
> using the host string buffer directly. Before returning to the host
> language, we could check whether the refcount of the string is greater than
> one and copy the string only then. This scheme wouldn't require changing
> INCREF semantics.

I think that exceptions may longjmp out of Clownfish code past the host
wrapper, though.  Right now if that happens we can leak memory -- a newly
allocated Clownfish object needed for calling into Clownfish code may not get
decremented.  If we delay copying we might start getting memory read errors,
though.

Marvin Humphrey

Reply via email to