On Aug 25, 2013, at 05:50, Marvin Humphrey <[email protected]> wrote:
> On Fri, Aug 23, 2013 at 3:33 PM, Nick Wellnhofer <[email protected]> wrote:
>> I just saw that you started to work on an immutable string class. It's a
>> good idea to get this done before the first Clownfish release. Some
>> implementation details have already been discussed on lucy-dev and I have an
>> unpublished, local branch where I continued to flesh out the design of
>> string iterators.
>
> Before getting started, I had a look at the published branch
> "string-iterator-wip1":
>
> https://git-wip-us.apache.org/repos/asf?p=lucy.git;a=shortlog;h=refs/heads/string-iterator-wip1
>
> It seems as though source code churn from changes like "_IMP" has made the
> branch impractical to update, so the commits will have to be recreated one by
> one. Nevertheless, the concepts remain just as applicable since the main
> code base has changed only superficially.

This was just an early experiment anyway. My current sketch for the
StringIterator class looks like this:

class Clownfish::StringIterator cnick StrIter
    inherits Clownfish::Obj {

    String *string;
    size_t  byte_offset;

    inert incremented StringIterator* new(String *string, size_t byte_offset);

    /** Return the substring between the top and tail iterators.
     * @param top Iterator marking the start of the substring.
     * @param tail Iterator marking the end of the substring.
     */
    inert incremented String* substring(StringIterator *top, StringIterator *tail);

    public incremented Obj* Clone(StringIterator *self);

    public void Assign(StringIterator *self, StringIterator *other);

    /** Return true if the iterator is not at the end of the string. */
    public bool Has_Next(StringIterator *self);

    /** Return true if the iterator is not at the start of the string. */
    public bool Has_Prev(StringIterator *self);

    /** Return the code point after the current position and advance the
     * iterator. Return CFISH_STRITER_DONE at the end of the string.
     */
    public uint32_t Next(StringIterator *self);

    /** Return the code point before the current position and go one step back.
     * Return CFISH_STRITER_DONE at the start of the string.
     */
    public uint32_t Prev(StringIterator *self);

    /** Skip code points.
     * @param num The number of code points to skip.
     * @return the number of code points actually skipped. This can be less
     * than the requested number if the end of the string is reached.
     */
    public size_t Advance(StringIterator *self, size_t num);

    /** Skip code points backward.
     * @param num The number of code points to skip.
     * @return the number of code points actually skipped. This can be less
     * than the requested number if the start of the string is reached.
     */
    public size_t Recede(StringIterator *self, size_t num);

    /** Skip whitespace.
     * @return the number of code points skipped.
     */
    public size_t Skip_Next_Whitespace(StringIterator *self);

    /** Skip whitespace backward.
     * @return the number of code points skipped.
     */
    public size_t Skip_Prev_Whitespace(StringIterator *self);

    /** Test whether the content after the iterator starts with the content
     * of a string.
     */
    bool Starts_With(StringIterator *self, String *prefix);

    public void Destroy(StringIterator *self);
}

Some other things that might be useful:

- Peek_Next/Peek_Prev (could replace Has_Next/Has_Prev)
- Keeping track of the character (code point) offset. This is problematic
  with string iterators starting from the end of a string unless we add a
  field containing the total number of characters to the String class.

> There's a task which isn't on your list which is to my mind perhaps the most
> important: vet the emerging Clownfish String design against existing
> implementations from several other popular programming languages.
>
> I'm concerned about several substandard features of Clownfish, which wormed
> their way into the codebase through expedience, accident, or failed
> experiment, becoming part of the public API.
> We're better off excising that unhealthy tissue sooner rather than later,
> and comparing Clownfish's design against other designs will hopefully allow
> us to diagnose any problems.

I think one of the worst parts is the use of ViewCharBufs together with
Nip/Chop for iteration.

> It would be great to collaborate on creating the best possible immutable
> String class for Clownfish. I suspected you might want to take part, which
> was why I deliberately started off the cfish-string-wip1 branch with only the
> most basic skeleton. :)

Let's get this done, then ;)

> When we're done (enough) with String, if it turns out that piecemeal
> integration is too awkward, there's an alternate path:
>
> 1. Start a new branch.
> 2. Duplicate CharBuf in a new class, CharBuffer.
> 3. Rename CharBuf to String in one huge but superficial commit.
> 4. Switch over sites which actually need mutability to CharBuffer. There
>    aren't that many.
> 5. Replace CharBuf-masquerading-as-String with the actual immutable String
>    class completed earlier.
> 6. Rename CharBuffer to CharBuf.

Maybe that's the better approach. What do we gain from making CharBuf a
subclass of String temporarily? Also, a major part of the work will be to
replace ViewCharBufs with StringIterators.

What about this:

1. Start a new branch.
2. Rename CharBuf to String, ZombieCharBuf to StackString.
3. Implement new CharBuf class.
4. Switch over sites which actually need mutability to CharBuf.
5. Remove mutating methods from String, but keep Nip/Chop for ViewCBs.
6. Implement StringIterator.
7. Switch from ViewCharBufs to StringIterator.
8. Remove Nip/Chop and ViewCharBuf.
9. Review and rewrite String class now that strings are immutable.

Nick
