On Sat, May 4, 2013 at 4:43 AM, Nick Wellnhofer <[email protected]> wrote: > On May 4, 2013, at 01:14 , Marvin Humphrey <[email protected]> wrote: >> OK, cool... Since it's primarily a naming change but disruptive, I'd >> suggest the following order of actions: >> >> 1. Grep for `INCREF` and `Inc_RefCount` and make sure all invocations >> capture the returned reference. >> 2. Implement immutable String. >> 3. Make the naming change. > > As a side note, step 3 is quite a bit more than a naming change.
I was unclear. The "naming change" that I was referring to was INCREF/DECREF to CAPTURE/RELEASE. The mutable-CharBuf-to-immutable-String change is, as you point out, more involved than a renaming. > All usages of CB_Nip must be converted to string iterators since CB_Nip > mutates the string. Additionally, all the places where we construct new > strings have to be identified. In this case we have to keep using a CharBuf > followed by CB_Yield_String after the construction is complete. +1 We might want to introduce String as a subclass of CharBuf with all of the mutability disabled. That makes it easy to break up monolithic interconnected webs of CharBuf usage into bite-sized chunks when switching over. Once the transition is complete, we can remove the inheritance. >> Hmm. Well, the most incremental strategy is to hard-code UTF-8 into String >> for now and the Python bindings can just forego the stack-allocated-string >> optimization until we make up our minds later. > > It just occurred to me that there's another problem with the INCREF > approach. Since string iterators INCREF the source string, every zombie > string would be copied as soon as it's iterated. The String methods using > zombie iterators wouldn't be affected, but it would still result in many > unneccessary copies of host strings. That's true, but I still imagine stack wrappers are a win in the aggregate. (I wish I could quantify that assertion.) It's very common to pass simple strings: field names, hash keys, query strings, file names and file paths, user names, URIs, etc. > A possible solution would be to do away with stack allocation but to keep > using the host string buffer directly. Before returning to the host > language, we could check whether the refcount of the string is greater than > one and copy the string only then. This scheme wouldn't require changing > INCREF semantics. I think that exceptions may longjmp out of Clownfish code past the host wrapper, though. Right now if that happens we can leak memory -- a newly allocated Clownfish object needed for calling into Clownfish code may not get decremented. If we delay copying we might start getting memory read errors, though. Marvin Humphrey
