Hi, Nick, Thanks for thinking this through!
On Wed, Oct 7, 2015 at 4:10 AM, Nick Wellnhofer <[email protected]> wrote: > Lucifers, > > here are my condensed thoughts on cleaning up the Clownfish String API: > > String > > - Rename `new_stack_string` to `init_stack_string`. > - Make all `new_*`, `init_wrap_*` functions public except: > - `init_stack_string` (only called from macros) > - Remove `Cat_Utf8` and `Cat_Trusted_Utf8`? > Callers should probably use CharBufs instead. > Only keep `Cat`. +1 > - Remove underused `Swap_Chars`. > Maybe add a `Replace` method instead. +1 to remove Swap_Chars. If you get serious about adding Replace, let's contemplate the interface. > - Make stack strings public? I'd say wait for now. They're truly an expert API -- it's too easy to blow the C stack by using them in a loop, etc. > - Add macro to create stack string from null-terminated UTF-8 > without passing the string size: > String *str = SSTR_WRAP_C("A string."); > - Make `Code_Point_*` return special value instead of 0 > if outside the string. +1 > - Make `Find` return a size_t. > Requires special value for "not found". Hmm, that's a toughie. If the sentinel is SIZE_MAX, that might not fit in all host numeric types. > - Make all methods public except: > - `Is_Copy_On_IncRef` > - `Hash_Sum` In addition, I'd suggest: * Remove `Str_less_than`, replacing it in PolyLexicon with `Str_Compare_To(a, b) < 0`. * Remove `Str_compare` from the public API. * Remove `new_from_char`. * Remove `BaseX_To_I64`, optionally adding its functionality to StringHelper. * Rename the argument to `Ends_With` from `postfix` to `suffix`. Same with `Ends_With_Utf8`. * The return values for `Trim`, `Trim_Top`, and `Trim_Tail` should be `incremented`. > StringIterator > > - Rename StrIter_substring to Str_new_from_iter? Hmm. I see where you're going with that, but new_from_iter doesn't communicate that there must be a single backing String for the two StringIterator arguments. Maybe `StrIter_crop`? > - Make `Starts_With`, `Ends_With` skip over strings? > I think this better matches current usage patterns. Somehow I don't understand what you mean. Also, as with String, the argument to Ends_With should be `suffix` rather than `postfix`. > - Find better names for `Skip_{Next|Prev}_Whitespace`? `Advance_Whitespace` and `Recede_Whitespace`? > - Make all methods public. +1 ... While preparing this reply, I also created the patch below with some doc tweaks. Marvin Humphrey diff --git a/runtime/core/Clownfish/String.cfh b/runtime/core/Clownfish/String.cfh index e422cbf..b49a0df 100644 --- a/runtime/core/Clownfish/String.cfh +++ b/runtime/core/Clownfish/String.cfh @@ -34,52 +34,52 @@ public final class Clownfish::String nickname Str size_t size; String *origin; - /** Return a new String which holds a copy of the passed-in string. - * Check for UTF-8 validity. + /** Return a String which holds a copy of the supplied UTF-8 character + * data after checking for validity. */ inert incremented String* new_from_utf8(const char *utf8, size_t size); - /** Return a new String which holds a copy of the passed-in string. No - * validity checking is performed. + /** Return a String which holds a copy of the supplied UTF-8 character + * data, skipping validity checks. */ inert incremented String* new_from_trusted_utf8(const char *utf8, size_t size); - /** Initialize the String using the passed-in string. Do not check - * validity of supplied UTF-8. + /** Initialize a String which holds a copy of the supplied UTF-8 character + * data, skipping validity checks. */ inert String* init_from_trusted_utf8(String *self, const char *utf8, size_t size); - /** Return a pointer to a new String which assumes ownership of the - * passed-in string. Check validity of supplied UTF-8. + /** Return a String which assumes ownership of the supplied buffer + * containing UTF-8 character data after checking for validity. */ inert incremented String* new_steal_utf8(char *utf8, size_t size); - /** Return a pointer to a new String which assumes ownership of the - * passed-in string. Do not check validity of supplied UTF-8. + /** Return a String which assumes ownership of the supplied buffer + * containing UTF-8 character data, skipping validity checks. */ inert incremented String* new_steal_trusted_utf8(char *utf8, size_t size); - /** Initialize the String using the passed-in string. Do not check - * validity of supplied UTF-8. + /** Initialize a String which assumes ownership of the supplied buffer + * containing UTF-8 character data, skipping validity checks. */ public inert String* init_steal_trusted_utf8(String *self, char *utf8, size_t size); - /** Return a pointer to a new String which wraps an external buffer - * containing UTF-8. The buffer must stay unchanged for the lifetime - * of the String. Check validity of supplied UTF-8. + /** Return a String which wraps an external buffer containing UTF-8 + * character data after checking for validity. The buffer must stay + * unchanged for the lifetime of the String. */ inert incremented String* new_wrap_utf8(const char *utf8, size_t size); - /** Return a pointer to a new String which wraps an external buffer - * containing UTF-8. The buffer must stay unchanged for the lifetime - * of the String. Do not check validity of supplied UTF-8. + /** Return a String which wraps an external buffer containing UTF-8 + * character data, skipping validity checks. The buffer must stay + * unchanged for the lifetime of the String. */ inert incremented String* new_wrap_trusted_utf8(const char *utf8, size_t size); @@ -87,8 +87,8 @@ public final class Clownfish::String nickname Str inert incremented String* new_stack_string(void *allocation, const char *utf8, size_t size); - /** Initialize the String which wraps an external buffer containing - * UTF-8. Do not check validity of supplied UTF-8. + /** Initialize a String which wraps an external buffer containing UTF-8 + * character data after checking for validity. */ public inert String* init_wrap_trusted_utf8(String *self, const char *utf8, size_t size); @@ -98,8 +98,8 @@ public final class Clownfish::String nickname Str inert incremented String* new_from_char(int32_t code_point); - /** Return a pointer to a new String which contains formatted data - * expanded according to CB_VCatF. + /** Return a String with content expanded from a pattern and arguments + * conforming to the spec defined by CharBuf. * * Note: a user-supplied `pattern` string is a security hole * and must not be allowed. @@ -128,13 +128,14 @@ public final class Clownfish::String nickname Str incremented String* Cat(String *self, String *other); - /** Return the concatenation of the String and the passed-in raw UTF-8. + /** Return the concatenation of the String and the supplied UTF-8 + * character data after checking for validity. */ incremented String* Cat_Utf8(String *self, const char *ptr, size_t size); - /** Return the concatenation of the String and the passed-in raw UTF-8. - * Don't check for UTF-8 validity. + /** Return the concatenation of the String and the supplied UTF-8 + * character data, skipping validity checks. */ incremented String* Cat_Trusted_Utf8(String *self, const char *ptr, size_t size); @@ -157,22 +158,22 @@ public final class Clownfish::String nickname Str public double To_F64(String *self); - /** Test whether the String starts with the content of another. + /** Test whether the String starts with `prefix`. */ bool Starts_With(String *self, String *prefix); - /** Test whether the String starts with the passed-in string. + /** Test whether the String starts with `prefix`. */ bool Starts_With_Utf8(String *self, const char *prefix, size_t size); - /** Test whether the String ends with the content of another. + /** Test whether the String ends with `postfix`. */ bool Ends_With(String *self, String *postfix); - /** Test whether the String ends with the passed-in string. + /** Test whether the String ends with `postfix`. */ bool Ends_With_Utf8(String *self, const char *postfix, size_t size); @@ -186,17 +187,17 @@ public final class Clownfish::String nickname Str int64_t Find_Utf8(String *self, const char *ptr, size_t size); - /** Test whether the String matches the passed-in string. + /** Test whether the String matches the supplied UTF-8 character data. */ bool Equals_Utf8(String *self, const char *ptr, size_t size); - /** Return the number of Unicode code points in the object's string. + /** Return the number of Unicode code points the String contains. */ size_t Length(String *self); - /** Get the String's `size` attribute. + /** Return the number of bytes occupied by the String's internal content. */ size_t Get_Size(String *self); @@ -251,35 +252,33 @@ public final class Clownfish::String nickname Str String* Trim_Tail(String *self); - /** Return the Unicode code point at the specified number of code points - * in. Return 0 if the string length is exceeded. (XXX It would be + /** Return the Unicode code point located `tick` code points in from the + * top. Return 0 if out of bounds. (XXX It would be * better to throw an exception, but that's not practical with UTF-8 and * no cached length.) */ int32_t Code_Point_At(String *self, size_t tick); - /** Return the Unicode code point at the specified number of code points - * counted backwards from the end of the string. Return 0 if outside the - * string. + /** Return the Unicode code point located `tick` code points counting + * backwards from the end. Return 0 if out of bounds. */ int32_t Code_Point_From(String *self, size_t tick); - /** Return a newly allocated String containing a copy of the indicated - * substring. + /** Return a new String containing a copy of the specified substring. * @param offset Offset from the top, in code points. * @param len The desired length of the substring, in code points. */ incremented String* SubString(String *self, size_t offset, size_t len); - /** Return an iterator to the start of the string. + /** Return an iterator initialized to the start of the string. */ incremented StringIterator* Top(String *self); - /** Return an iterator to the end of the string. + /** Return an iterator initialized to the end of the string. */ incremented StringIterator* Tail(String *self); @@ -363,14 +362,12 @@ public final class Clownfish::StringIterator nickname StrIter public size_t Skip_Prev_Whitespace(StringIterator *self); - /** Test whether the content after the iterator starts with - * `prefix`. + /** Test whether the content after the iterator starts with `prefix`. */ bool Starts_With(StringIterator *self, String *prefix); - /** Test whether the content after the iterator starts with the passed-in - * string. + /** Test whether the content after the iterator starts with `prefix`. */ bool Starts_With_Utf8(StringIterator *self, const char *prefix, size_t size); @@ -381,8 +378,7 @@ public final class Clownfish::StringIterator nickname StrIter bool Ends_With(StringIterator *self, String *postfix); - /** Test whether the content before the iterator ends with the passed-in - * string. + /** Test whether the content before the iterator ends with `postfix`. */ bool Ends_With_Utf8(StringIterator *self, const char *postfix, size_t size);
