Re: [lucy-dev] Proposal for implementation of immutable strings

Nick Wellnhofer Sat, 04 May 2013 04:44:24 -0700

On May 4, 2013, at 01:14 , Marvin Humphrey <[email protected]> wrote:
> OK, cool... Since it's primarily a naming change but disruptive, I'd suggest
> the following order of actions:
> 
> 1.  Grep for `INCREF` and `Inc_RefCount` and make sure all invocations capture
>    the returned reference.
> 2.  Implement immutable String.
> 3.  Make the naming change.


As a side note, step 3 is quite a bit more than a naming change. All usages of
CB_Nip must be converted to string iterators, since CB_Nip mutates the string.
Additionally, we have to identify all the places where new strings are
constructed. In those places we have to keep using a CharBuf and call
CB_Yield_String once construction is complete.

> Hmm.  Well, the most incremental strategy is to hard-code UTF-8 into String
> for now and the Python bindings can just forego the stack-allocated-string
> optimization until we make up our minds later.

It just occurred to me that there's another problem with the INCREF approach.
Since string iterators INCREF the source string, every zombie string would be
copied as soon as it's iterated. The String methods that use zombie iterators
wouldn't be affected, but it would still result in many unnecessary copies of
host strings.
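A hypothetical model of the problem, assuming the proposed semantics where
INCREF on a zombie string returns a heap copy instead of bumping a refcount.
The struct layout and function names are illustrative only, not Lucy's. It
also shows why every INCREF call site has to capture the returned reference.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model, not the actual Clownfish implementation. A "zombie"
 * wraps a buffer it does not own (e.g. host-language memory), so it cannot
 * simply hand out a reference that might outlive that buffer. */
typedef struct {
    const char *ptr;
    size_t      size;
    size_t      refcount;
    int         is_zombie;
} String;

static size_t num_copies = 0;   /* instrumentation for the example */

static String *
Str_new_from_copy(const char *ptr, size_t size) {
    String *str = malloc(sizeof(String));
    char   *buf = malloc(size + 1);
    if (str == NULL || buf == NULL) { free(str); free(buf); return NULL; }
    memcpy(buf, ptr, size);
    buf[size] = '\0';
    str->ptr       = buf;
    str->size      = size;
    str->refcount  = 1;
    str->is_zombie = 0;
    num_copies++;
    return str;
}

/* INCREF under the proposed semantics: a zombie is copied to the heap and
 * the copy is returned, so the caller MUST use the return value. */
static String *
Str_incref(String *str) {
    if (str->is_zombie) {
        return Str_new_from_copy(str->ptr, str->size);
    }
    str->refcount++;
    return str;
}

static void
Str_decref(String *str) {
    assert(!str->is_zombie);  /* zombies are not refcounted in this model */
    if (--str->refcount == 0) {
        free((void*)str->ptr);
        free(str);
    }
}

typedef struct {
    String *string;           /* owned reference to the source */
    size_t  byte_offset;
} StringIterator;

/* Creating an iterator takes a reference to its source, which for a zombie
 * means a full copy of the string. */
static StringIterator
StrIter_new(String *source) {
    StringIterator iter;
    iter.string      = Str_incref(source);
    iter.byte_offset = 0;
    assert(iter.string != NULL);
    return iter;
}

static void
StrIter_destroy(StringIterator *iter) {
    Str_decref(iter->string);
}
```

Each iterator over the zombie triggers its own copy, while an iterator over a
heap string only increments the refcount.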

A possible solution would be to do away with stack allocation but to keep using 
the host string buffer directly. Before returning to the host language, we 
could check whether the refcount of the string is greater than one and copy the 
string only then. This scheme wouldn't require changing INCREF semantics.
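A sketch of this scheme, again with hypothetical names and a simplified
layout: the String struct is heap-allocated but borrows the host buffer, INCREF
stays an ordinary refcount increment, and a check on the way back to the host
decides whether a copy is needed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model, not the actual Lucy/Clownfish code. The String struct
 * lives on the heap but initially borrows the host string's buffer. */
typedef struct {
    const char *ptr;
    size_t      size;
    size_t      refcount;
    int         owns_buffer;  /* 0 while borrowing host memory */
} String;

static String *
Str_wrap_host_buffer(const char *host_ptr, size_t size) {
    String *str = malloc(sizeof(String));
    if (str == NULL) { return NULL; }
    str->ptr         = host_ptr;
    str->size        = size;
    str->refcount    = 1;
    str->owns_buffer = 0;
    return str;
}

/* Ordinary INCREF: no copying, so INCREF semantics stay unchanged. */
static String *
Str_incref(String *str) {
    str->refcount++;
    return str;
}

static void
Str_decref(String *str) {
    if (--str->refcount == 0) {
        if (str->owns_buffer) { free((void*)str->ptr); }
        free(str);
    }
}

/* Called just before control returns to the host language, i.e. before the
 * host buffer may go away. If nothing else holds the String, just release
 * it. Otherwise copy the bytes into memory the String owns, so remaining
 * holders never see a dangling host buffer. Returns 0 on allocation failure. */
static int
Str_release_host_buffer(String *str) {
    assert(str->refcount >= 1);
    if (str->refcount > 1 && !str->owns_buffer) {
        char *copy = malloc(str->size + 1);
        if (copy == NULL) { return 0; }
        memcpy(copy, str->ptr, str->size);
        copy[str->size] = '\0';
        str->ptr         = copy;
        str->owns_buffer = 1;
    }
    Str_decref(str);  /* drop the binding layer's own reference */
    return 1;
}
```

Copying in place swaps only the pointer, not the bytes other holders observe,
so the String still looks immutable to them. Strings that are never retained
are released without any copy.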

Nick
