On Apr 13, 2013, at 04:01 , Marvin Humphrey <[email protected]> wrote:
> * The next patch, "Convert CB_Trim_{Top|Tail} to use iterators", looks
> perfect as well. No need to implement in concrete subclasses, I see. :)
Yes, that's the idea. If the switch to iterators is complete, adding a new
string encoding should be really easy:
* Implement the iterator class for the encoding.
* Derive a new string class from CharBuf which only has to implement
the iterator factory methods and Cat_Char.
> * Should CharBufIterator#Next return 0 when it reaches the end of a string
> instead of throwing an exception? That would make it possible to iterate
> through text values which are not expected to contain NUL without having
> to call Has_Next() before each call to Next().
This seems like a worthwhile optimization. Here's how both versions would look
like:
CharBufIterator *iter = CB_Top(string);
while (CBIter_Has_Next(iter)) {
uint32_t code_point = CBIter_Next(iter);
...
}
CharBufIterator *iter = CB_Top(string);
uint32_t code_point;
while (0 != (code_point = CBIter_Next(iter))) {
...
}
I find the Has_Next() version more readable but the second variant isn't too
bad. The only thing I don't like is the choice of 0 as sentinel value. I'd
prefer -1 or any positive value beyond the Unicode planes.
After reaching the end of the string, I can see two options which both have
their pros and cons:
* Change to a new iterator state ("beyond string boundary") and throw
an exception on every subsequent access (like in your example below).
* Keep returning the sentinel value on calls to Next() but allow to
move backward via Prev(). This can result in an infinite loop in
faulty code.
> * I can imagine some possibilities for optimizing the internals, but those
> are implementation details (which I imagine you left out on purpose).
Yes, the Prev() and Next() methods should decode characters directly without
calling StrHelp_decode_utf8_char.
> It seems to me that the factory methods Top() and Tail() should be public,
> while ZTop() and ZTail() should have parcel exposure. More generally, I've
> moved towards the position that while we want to have the core allocate
> objects on the stack for certain internal tasks, it would be unwise at this
> point for Clownfish to support stack object allocation as a public API.
> I'm actually hoping that as we overhaul Clownfish string handling that
> ZombieCharBuf can go away. But what you've presented makes sense and fits
> perfectly within the current context.
I think the string iterators are good candidates for "zombie" objects. They're
small and typically only used within a single function. But +1 for not exposing
ZombieIterators publicly.
> How about something like this, which I think should have the same cost?
>
> - if (self->byte_offset >= char_buf->size) {
> - THROW(ERR, "Iteration past end of string");
> - }
> + if (self->byte_offset > char_buf->size) {
> + THROW(ERR, "Iteration past end of string");
> + }
> + else if (self->byte_offset == char_buf->size) {
Something like ++self->byte_offset would be needed here.
> + return 0;
> + }
See discussion above.
>> +ZombieUTF8Iterator*
>> +ZUTF8Iter_new(void *allocation, CharBuf *char_buf, size_t byte_offset) {
>> + ZombieUTF8Iterator *self = (ZombieUTF8Iterator*)allocation;
>> + self->ref.count = 1;
>
> Might want to go through VTable_Init_Obj(), no? ;)
Sure ;)
Nick