Re: COW strings

Peter Gibbs Tue, 02 Apr 2002 02:55:08 -0800

>> 2) COW must survive garbage collection
>
> COW can in certain cases, *not* survive garbage collection, specifically


The simplest possible implementation of COW strings would be to let the
garbage collector 'undo' the COW nature, i.e. make multiple copies of all
shared buffers. All I meant was "don't do this", i.e. the GC must understand
COWs and handle them appropriately.

>
> > offset. This impacts on all code using bufstart as the start of the
> > string.
>
> Yeah, but I believe this is fine. The impact on string size is just 4

The extra field is unavoidable; it just means that the patch updates files
all over the place, changing 'bufstart' to (in my case) 'strstart'. If COW
is going to be implemented, I think that the header change should be made
sooner rather than later, as otherwise there will just be more code to
change. However, I am not sure what impact the Unicode implementation will
have on strings in general.

>
> > just used the byte after buflen. To make sure I always have space for
> > the
>
> Ahh, nice. In my few minutes of thinking about it, I got stuck wondering
> how to store information that belongs to the buffer data (the refcount,
> for example).
>
> This kind of hidden data needs to be documented. Perhaps we should define
> a buffer-header struct (oh, wait, that name's taken) that we can put at
> the head or tail of a buffer for this kind of data.

I suggested some time ago that we needed a string buffer header as part of
the buffer, as distinct from the string header; this was turned down by Dan.
My thoughts were that buffer-related information (possibly including things
like charset and encoding) could be stored once with the buffer. With COW
strings, that would buy us a little bit of memory; on the other hand, it is
more data to be copied around during garbage collection. So for my version
of COW, I cheated and used a hidden trailer. I used the COW flag that
already exists in the string header; moving the COW flag to the buffer
itself may simplify the GC process, but I need to think about that.
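
A minimal sketch of the hidden trailer idea, assuming a much simplified
string header. The struct layout, the flag names and the helper functions
are illustrative assumptions, not the actual definitions in resources.c;
malloc stands in for the real pool allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical values for the hidden trailer byte. */
enum { BUF_UNSEEN = 0, BUF_SEEN = 1 };

/* Hypothetical, simplified string header; field names follow the thread. */
typedef struct {
    char  *bufstart;  /* start of the (possibly shared) buffer          */
    char  *strstart;  /* start of this string within the buffer         */
    size_t buflen;    /* usable bytes in the buffer, excluding trailer  */
    size_t bufused;   /* bytes used by this string                      */
    int    is_cow;    /* the COW flag stays in the header               */
} StrHeader;

/* Allocate buflen usable bytes plus one hidden trailer byte.
 * The byte just past buflen is the flag, born as "unseen". */
static char *alloc_buffer(size_t buflen)
{
    char *buf = malloc(buflen + 1);
    if (buf != NULL)
        buf[buflen] = BUF_UNSEEN;
    return buf;
}

/* Locate the hidden trailer byte of a header's buffer. */
static char *trailer_of(const StrHeader *s)
{
    assert(s->bufstart != NULL);
    return s->bufstart + s->buflen;
}
```

Because the flag sits past buflen, code that only looks at the header never
sees it, which is also why it needs documenting.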

>
> Can you clarify how yours works a bit more, please? Does yours do anything
> in do_dod_run, or is it all in do_collect ("GC pass" was unspecific, at
> least to me). I don't see where the two pointers come from, or why you
> need three values in your flag byte, and thus I have no idea how yours
> works. :)

Let me first describe how my current implementation works, without the logic
to un-flag ex-COWs:

Parrot_allocate sets the hidden flag byte to zero (unseen) when a new buffer
is born (this could be done in string_make, but this way nobody outside
resources.c needs to know; however, this does mean that non-string buffers
get the same treatment).

All the other work is done in go_collect. During the copying process, if a
header is flagged as COW, and the (old) buffer is flagged as unseen, the
address of the new buffer is stored in the old buffer, and the old buffer is
flagged as seen. Note that the string start address has to be recalculated
to handle substrings (this could be avoided by using an offset, but this
would then have to be added to bufstart whenever it was used). The new buffer
always has its flag byte set to zero, ready for next time around. If a
header is COW and the old buffer has been seen already, then no copying is
performed; the header's bufstart and strstart are simply updated to point to
the new location of the buffer.

Thus, we need one pointer in the buffer to hold the relocation address.
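
The copying step described above could be sketched roughly as follows. This
is not the real go_collect: the header struct, the flag names, collect_one
and the pool_top bump pointer are all assumptions for illustration. It also
assumes every header sharing a buffer carries the same bufstart and buflen,
and that buflen is at least the size of a pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { BUF_UNSEEN = 0, BUF_SEEN = 1 };   /* hypothetical trailer values */

typedef struct {                         /* hypothetical, simplified header */
    char  *bufstart;
    char  *strstart;
    size_t buflen;     /* usable bytes; the flag byte sits just after */
    size_t bufused;
    int    is_cow;
} StrHeader;

/* Relocate one header's buffer into the new pool at *pool_top. */
static void collect_one(StrHeader *s, char **pool_top)
{
    char  *old    = s->bufstart;
    size_t offset = (size_t)(s->strstart - old);
    char  *flag   = old + s->buflen;
    char  *fresh;

    assert(s->buflen >= sizeof(char *));  /* room for the relocation address */

    if (s->is_cow && *flag == BUF_SEEN) {
        /* Already moved via another header: read the relocation address
         * that was left behind in the old buffer; no copying. */
        memcpy(&fresh, old, sizeof fresh);
    } else {
        fresh = *pool_top;
        memcpy(fresh, old, s->buflen);
        fresh[s->buflen] = BUF_UNSEEN;    /* new buffer ready for next time */
        *pool_top += s->buflen + 1;
        if (s->is_cow) {
            memcpy(old, &fresh, sizeof fresh);  /* store relocation address */
            *flag = BUF_SEEN;
        }
    }
    s->bufstart = fresh;
    s->strstart = fresh + offset;         /* recalculated for substrings */
}
```

Only the old buffer is ever marked as seen, so nothing needs resetting
afterwards.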

My current idea for the ex-COW flagging, which I will try to implement
today, is as follows. This is slightly revised following your comments, so
the use of two pointers has been removed!

The first time a buffer is seen, store the header address in the old buffer
(I originally thought this would be a second pointer; but it can replace the
relocation address since the header already knows that), set the 'seen' flag
in the old buffer, and clear the header's COW flag.

The second time a buffer is seen, use the pointer to the first header to get
the new address of the buffer for updating the second header. Also, set the
first header's COW flag (the second header will already be a COW).

The three-valued flag is also not required; it does no harm to simply set
the first header's COW flag every time.

The additional loop you mention is avoided by setting the flag to zero when
a buffer is born or copy-collected. Note that the flag we change is the one
on the OLD buffer, which is being thrown away as soon as the collection is
complete.
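
As a rough sketch of the revised ex-COW scheme, under the same assumptions
as before (simplified header, hypothetical names such as collect_one and
copy_buffer, shared headers agreeing on bufstart and buflen, buflen at
least pointer-sized), the old buffer holds the address of the first header
rather than the relocation address:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { BUF_UNSEEN = 0, BUF_SEEN = 1 };   /* hypothetical trailer values */

typedef struct {                         /* hypothetical, simplified header */
    char  *bufstart;
    char  *strstart;
    size_t buflen;
    size_t bufused;
    int    is_cow;
} StrHeader;

/* Copy a buffer into the new pool; the copy is born "unseen". */
static char *copy_buffer(const StrHeader *s, char **pool_top)
{
    char *fresh = *pool_top;
    memcpy(fresh, s->bufstart, s->buflen);
    fresh[s->buflen] = BUF_UNSEEN;
    *pool_top += s->buflen + 1;
    return fresh;
}

static void collect_one(StrHeader *s, char **pool_top)
{
    char  *old    = s->bufstart;
    size_t offset = (size_t)(s->strstart - old);
    char  *flag   = old + s->buflen;
    char  *fresh;

    assert(s->buflen >= sizeof(StrHeader *));

    if (s->is_cow && *flag == BUF_SEEN) {
        /* Second or later sighting: the first header knows the new
         * address, and the buffer is confirmed as still shared. */
        StrHeader *first;
        memcpy(&first, old, sizeof first);
        fresh = first->bufstart;
        first->is_cow = 1;                /* harmless to set every time */
    } else {
        fresh = copy_buffer(s, pool_top);
        if (s->is_cow) {
            memcpy(old, &s, sizeof s);    /* remember the first header */
            *flag = BUF_SEEN;
            s->is_cow = 0;                /* ex-COW until proven shared */
        }
    }
    s->bufstart = fresh;
    s->strstart = fresh + offset;
}
```

A header whose sharing partners have all died is never seen a second time,
so it simply comes out of the collection with its COW flag cleared.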

> Well, did you make the necessary changes to the various strings?

I changed string_copy and string_substr to make COWs.

string_concat was changed to work in-place if the first buffer is not COWed
and is large enough; the change to string_make to round up buflen means this
happens quite often, and is actually a worthwhile optimisation even without
COWs. If either of the inputs to _concat is null, it already invokes
string_copy and therefore COWs.

Strangely, string_chopn did not need to change, as it only changes the
string header (bufused and strlen) and not the buffer itself (thanks to the
earlier decision to remove null termination from Parrot strings).

I did not change string_repeat; this could be modified to do a simple
string_copy if the repeat count was one, and to act in-place if possible.

Since then, string_replace has been added; for now, I will just change this
to do a straight un-cow (i.e. make a copy of the buffer if it is marked as
COW).
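
To illustrate the string-level side, here is a hedged sketch of how copy,
substr, chopn and un-cow might look on a simplified header. The function
names (cow_copy, cow_substr, chopn, uncow) and the struct are illustrative
only, lengths are in bytes, and malloc stands in for the real allocator;
string_concat and string_repeat are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct {                         /* hypothetical, simplified header */
    char  *bufstart;
    char  *strstart;
    size_t buflen;
    size_t bufused;
    int    is_cow;
} StrHeader;

/* string_copy analogue: share the buffer and flag both headers as COW. */
static StrHeader cow_copy(StrHeader *src)
{
    src->is_cow = 1;
    return *src;
}

/* string_substr analogue: same buffer, different start and length. */
static StrHeader cow_substr(StrHeader *src, size_t offset, size_t length)
{
    StrHeader sub;
    assert(offset + length <= src->bufused);
    sub = cow_copy(src);
    sub.strstart = src->strstart + offset;
    sub.bufused  = length;
    return sub;
}

/* string_chopn analogue: touches only the header, so no un-cow needed. */
static void chopn(StrHeader *s, size_t n)
{
    s->bufused = n < s->bufused ? s->bufused - n : 0;
}

/* Straight un-cow: take a private copy before writing to the buffer.
 * Returns 1 on success, 0 if allocation fails. */
static int uncow(StrHeader *s)
{
    char *priv;
    if (!s->is_cow)
        return 1;
    priv = malloc(s->buflen + 1);        /* +1 for the hidden flag byte */
    if (priv == NULL)
        return 0;
    memcpy(priv, s->strstart, s->bufused);
    priv[s->buflen] = 0;                 /* flag byte starts as unseen */
    s->bufstart = s->strstart = priv;
    s->is_cow = 0;
    return 1;
}
```

Any operation that writes to the buffer, such as string_replace, would call
something like uncow first; the other sharers keep the original untouched.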

> Mike Lambert
>

--
Peter Gibbs
EmKel Systems
