>> 2) COW must survive garbage collection
>
> COW can in certain cases, *not* survive garbage collection, specifically

The simplest possible implementation of COW strings would let the garbage
collector 'undo' the COW nature, i.e. make a separate copy of every shared
buffer. All I meant was "don't do this": the GC must understand COWs and
handle them appropriately.

> > offset. This impacts on all code using bufstart as the start of the string.
>
> Yeah, but I believe this is fine. The impact on string size is just 4

The extra field is unavoidable; it just means that the patch updates files
all over the place, changing 'bufstart' to (in my case) 'strstart'. If COW
is going to be implemented, I think the header change should be made sooner
rather than later, as otherwise there will just be more code to change.
However, I am not sure what impact the Unicode implementation will have on
strings in general.

> > just used the byte after buflen. To make sure I always have space for the
>
> Ahh, nice. In my few minutes of thinking about it, I got stuck wondering
> how to store information that belongs to the buffer data (the refcount,
> for example).
>
> This kind of hidden data needs to be documented. Perhaps we should define
> a buffer-header struct (oh, wait, that name's taken) that we can put at
> the head or tail of a buffer for this kind of data.

I suggested some time ago that we needed a string buffer header as part of
the buffer, as distinct from the string header; this was turned down by Dan.
My thoughts were that buffer-related information (possibly including things
like charset and encoding) could be stored once with the buffer. With COW
strings, that would save us a little memory; on the other hand, it is more
data to be copied around during garbage collection.

So for my version of COW, I cheated and used a hidden trailer. I used the
COW flag that already exists in the string header; moving the COW flag to
the buffer itself may simplify the GC process, but I need to think about that.

> Can you clarify how yours works a bit more, please?
> Does yours do anything in do_dod_run, or is it all in do_collect ("GC pass"
> was unspecific, at least to me). I don't see where the two pointers come
> from, or why you need three values in your flag byte, and thus I have no
> idea how yours works. :)

Let me first describe how my current implementation works, without the
logic to un-flag ex-COWs.

Parrot_allocate sets the hidden flag byte to zero (unseen) when a new buffer
is born. (This could be done in string_make, but this way nobody outside
resources.c needs to know; however, it does mean that non-string buffers get
the same treatment.) All the other work is done in go_collect.

During the copying process, if a header is flagged as COW and the (old)
buffer is flagged as unseen, the address of the new buffer is stored in the
old buffer, and the old buffer is flagged as seen. Note that the string
start address has to be recalculated to handle substrings (this could be
avoided by using an offset, but the offset would then have to be added to
bufstart whenever it was used). The new buffer always has its flag byte set
to zero, ready for next time around.

If a header is COW and the old buffer has already been seen, no copying is
performed; the header's bufstart and strstart are simply updated to point to
the new location of the buffer. Thus, we need one pointer in the buffer to
hold the relocation address.

My current idea for the ex-COW flagging, which I will try to implement
today, is as follows. This is slightly revised following your comments, so
the use of two pointers has been removed!

The first time a buffer is seen, store the header address in the old buffer
(I originally thought this would be a second pointer, but it can replace the
relocation address, since the header already knows that), set the 'seen'
flag in the old buffer, and clear the header's COW flag. The second time a
buffer is seen, use the pointer to the first header to get the new address
of the buffer for updating the second header.
Also, set the first header's COW flag (the second header will already be a
COW). The three-valued flag is also not required; it does no harm to simply
set the first header's COW flag every time.

The additional loop you mention is avoided by setting the flag to zero when
a buffer is born or copy-collected. Note that the flag we change is the one
on the OLD buffer, which is thrown away as soon as the collection is
complete.

> Well, did you make the necessary changes to the various strings?

I changed string_copy and string_substr to make COWs.

string_concat was changed to work in-place if the first buffer is not COWed
and is large enough; the change to string_make to round up buflen means this
happens quite often, and it is actually a worthwhile optimisation even
without COWs. If either of the inputs to _concat is null, it already invokes
string_copy and therefore COWs.

Strangely, string_chopn did not need to change, as it only changes the
string header (bufused and strlen) and not the buffer itself (thanks to the
earlier decision to remove null termination from parrot strings).

I did not change string_repeat; this could be modified to do a simple
string_copy if the repeat count is one, and to act in-place if possible.

Since then, string_replace has been added; for now, I will just change this
to do a straight un-COW (i.e. make a copy of the buffer if it is marked as
COW).

> Mike Lambert

--
Peter Gibbs
EmKel Systems
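
For readers who want something concrete, the collection pass described above
(the current scheme, without the ex-COW un-flagging) can be sketched roughly
as follows. This is only an illustration, not the code in resources.c: the
Header struct, alloc_buf and collect_all are hypothetical names, the flag
byte is assumed to sit at bufstart[buflen], and buffers are assumed to be at
least pointer-sized so the relocation address fits in the old buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { UNSEEN = 0, SEEN = 1 };

/* Hypothetical string header; only the fields this sketch needs. */
typedef struct {
    char  *bufstart;  /* start of the (possibly shared) buffer */
    char  *strstart;  /* start of this string inside the buffer */
    size_t buflen;    /* usable bytes; hidden flag byte is bufstart[buflen] */
    size_t bufused;   /* bytes used by this string */
    int    cow;       /* header-level COW flag */
} Header;

/* Stand-in for the allocator: every new buffer is born "unseen". */
static char *alloc_buf(size_t buflen)
{
    char *buf;
    assert(buflen >= sizeof(char *));  /* assumption: room for one pointer */
    buf = calloc(1, buflen + 1);
    assert(buf != NULL);
    buf[buflen] = UNSEEN;
    return buf;
}

/* Copy-collect every header; returns how many buffers were really copied. */
static size_t collect_all(Header **hdrs, size_t n)
{
    size_t copies = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        Header *h = hdrs[i];
        char *old = h->bufstart;
        size_t offset = (size_t)(h->strstart - old);  /* for substrings */
        char *fresh;

        if (h->cow && old[h->buflen] == SEEN) {
            /* Already moved: read the relocation address, copy nothing. */
            memcpy(&fresh, old, sizeof fresh);
        } else {
            fresh = alloc_buf(h->buflen);
            memcpy(fresh, old, h->buflen);
            copies++;
            if (h->cow) {
                /* Leave the new address in the OLD buffer and mark it seen. */
                memcpy(old, &fresh, sizeof fresh);
                old[h->buflen] = SEEN;
            }
        }
        h->bufstart = fresh;
        h->strstart = fresh + offset;  /* recalculate the string start */
    }
    /* Old buffers are not freed here; a copying collector drops them wholesale. */
    return copies;
}
```

With two COW headers sharing one 16-byte buffer (a full copy and a
substring), collect_all should make a single copy and leave both headers
pointing into it, with the substring's strstart at the same offset as before.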