Re: [lucy-dev] Clownfish String API

Marvin Humphrey Thu, 15 Oct 2015 12:22:19 -0700

Hi, Nick,

Thanks for thinking this through!


On Wed, Oct 7, 2015 at 4:10 AM, Nick Wellnhofer <[email protected]> wrote:
> Lucifers,
>
> here are my condensed thoughts on cleaning up the Clownfish String API:
>
> String
>
> - Rename `new_stack_string` to `init_stack_string`.
> - Make all `new_*`, `init_wrap_*` functions public except:
>   - `init_stack_string` (only called from macros)
> - Remove `Cat_Utf8` and `Cat_Trusted_Utf8`?
>   Callers should probably use CharBufs instead.
>   Only keep `Cat`.

+1

> - Remove underused `Swap_Chars`.
>   Maybe add a `Replace` method instead.

+1 to remove Swap_Chars.  If you get serious about adding Replace, let's
contemplate the interface.

> - Make stack strings public?

I'd say wait for now.  They're truly an expert API -- it's too easy to blow
the C stack by using them in a loop, etc.

> - Add macro to create stack string from null-terminated UTF-8
>   without passing the string size:
>       String *str = SSTR_WRAP_C("A string.");
> - Make `Code_Point_*` return special value instead of 0
>   if outside the string.

+1

> - Make `Find` return a size_t.
>   Requires special value for "not found".

Hmm, that's a toughie.  If the sentinel is SIZE_MAX, that might not fit in all
host numeric types.

> - Make all methods public except:
>   - `Is_Copy_On_IncRef`
>   - `Hash_Sum`

In addition, I'd suggest:

*   Remove `Str_less_than`, replacing it in PolyLexicon with 
    `Str_Compare_To(a, b) < 0`.
*   Remove `Str_compare` from the public API.
*   Remove `new_from_char`.
*   Remove `BaseX_To_I64`, optionally adding its functionality to
    StringHelper.
*   Rename the argument to `Ends_With` from `postfix` to `suffix`.  Same with
    `Ends_With_Utf8`.
*   The return values for `Trim`, `Trim_Top`, and `Trim_Tail` should be
    `incremented`.

> StringIterator
>
> - Rename StrIter_substring to Str_new_from_iter?

Hmm.  I see where you're going with that, but new_from_iter doesn't
communicate that there must be a single backing String for the two
StringIterator arguments.  Maybe `StrIter_crop`?

> - Make `Starts_With`, `Ends_With` skip over strings?
>   I think this better matches current usage patterns.

Somehow I don't understand what you mean.

Also, as with String, the argument to Ends_With should be `suffix` rather than
`postfix`.

> - Find better names for `Skip_{Next|Prev}_Whitespace`?

`Advance_Whitespace` and `Recede_Whitespace`?

> - Make all methods public.

+1

...

While preparing this reply, I also created the patch below with some doc
tweaks.

Marvin Humphrey

diff --git a/runtime/core/Clownfish/String.cfh 
b/runtime/core/Clownfish/String.cfh
index e422cbf..b49a0df 100644
--- a/runtime/core/Clownfish/String.cfh
+++ b/runtime/core/Clownfish/String.cfh
@@ -34,52 +34,52 @@ public final class Clownfish::String nickname Str
     size_t      size;
     String     *origin;
 
-    /** Return a new String which holds a copy of the passed-in string.
-     * Check for UTF-8 validity.
+    /** Return a String which holds a copy of the supplied UTF-8 character
+     * data after checking for validity.
      */
     inert incremented String*
     new_from_utf8(const char *utf8, size_t size);
 
-    /** Return a new String which holds a copy of the passed-in string.  No
-     * validity checking is performed.
+    /** Return a String which holds a copy of the supplied UTF-8 character
+     * data, skipping validity checks.
      */
     inert incremented String*
     new_from_trusted_utf8(const char *utf8, size_t size);
 
-    /** Initialize the String using the passed-in string.  Do not check
-     * validity of supplied UTF-8.
+    /** Initialize a String which holds a copy of the supplied UTF-8 character
+     * data, skipping validity checks.
      */
     inert String*
     init_from_trusted_utf8(String *self, const char *utf8, size_t size);
 
-    /** Return a pointer to a new String which assumes ownership of the
-     * passed-in string.  Check validity of supplied UTF-8.
+    /** Return a String which assumes ownership of the supplied buffer
+     * containing UTF-8 character data after checking for validity.
      */
     inert incremented String*
     new_steal_utf8(char *utf8, size_t size);
 
-    /** Return a pointer to a new String which assumes ownership of the
-     * passed-in string.  Do not check validity of supplied UTF-8.
+    /** Return a String which assumes ownership of the supplied buffer
+     * containing UTF-8 character data, skipping validity checks.
      */
     inert incremented String*
     new_steal_trusted_utf8(char *utf8, size_t size);
 
-    /** Initialize the String using the passed-in string.  Do not check
-     * validity of supplied UTF-8.
+    /** Initialize a String which assumes ownership of the supplied buffer
+     * containing UTF-8 character data, skipping validity checks.
      */
     public inert String*
     init_steal_trusted_utf8(String *self, char *utf8, size_t size);
 
-    /** Return a pointer to a new String which wraps an external buffer
-     * containing UTF-8.  The buffer must stay unchanged for the lifetime
-     * of the String.  Check validity of supplied UTF-8.
+    /** Return a String which wraps an external buffer containing UTF-8
+     * character data after checking for validity.  The buffer must stay
+     * unchanged for the lifetime of the String.
      */
     inert incremented String*
     new_wrap_utf8(const char *utf8, size_t size);
 
-    /** Return a pointer to a new String which wraps an external buffer
-     * containing UTF-8.  The buffer must stay unchanged for the lifetime
-     * of the String.  Do not check validity of supplied UTF-8.
+    /** Return a String which wraps an external buffer containing UTF-8
+     * character data, skipping validity checks.  The buffer must stay
+     * unchanged for the lifetime of the String.
      */
     inert incremented String*
     new_wrap_trusted_utf8(const char *utf8, size_t size);
@@ -87,8 +87,8 @@ public final class Clownfish::String nickname Str
     inert incremented String*
     new_stack_string(void *allocation, const char *utf8, size_t size);
 
-    /** Initialize the String which wraps an external buffer containing
-     * UTF-8.  Do not check validity of supplied UTF-8.
+    /** Initialize a String which wraps an external buffer containing UTF-8
+     * character data after checking for validity.
      */
     public inert String*
     init_wrap_trusted_utf8(String *self, const char *utf8, size_t size);
@@ -98,8 +98,8 @@ public final class Clownfish::String nickname Str
     inert incremented String*
     new_from_char(int32_t code_point);
 
-    /** Return a pointer to a new String which contains formatted data
-     * expanded according to CB_VCatF.
+    /** Return a String with content expanded from a pattern and arguments
+     * conforming to the spec defined by CharBuf.
      *
      * Note: a user-supplied `pattern` string is a security hole
      * and must not be allowed.
@@ -128,13 +128,14 @@ public final class Clownfish::String nickname Str
     incremented String*
     Cat(String *self, String *other);
 
-    /** Return the concatenation of the String and the passed-in raw UTF-8.
+    /** Return the concatenation of the String and the supplied UTF-8
+     * character data after checking for validity.
      */
     incremented String*
     Cat_Utf8(String *self, const char *ptr, size_t size);
 
-    /** Return the concatenation of the String and the passed-in raw UTF-8.
-     * Don't check for UTF-8 validity.
+    /** Return the concatenation of the String and the supplied UTF-8
+     * character data, skipping validity checks.
      */
     incremented String*
     Cat_Trusted_Utf8(String *self, const char *ptr, size_t size);
@@ -157,22 +158,22 @@ public final class Clownfish::String nickname Str
     public double
     To_F64(String *self);
 
-    /** Test whether the String starts with the content of another.
+    /** Test whether the String starts with `prefix`.
      */
     bool
     Starts_With(String *self, String *prefix);
 
-    /** Test whether the String starts with the passed-in string.
+    /** Test whether the String starts with `prefix`.
      */
     bool
     Starts_With_Utf8(String *self, const char *prefix, size_t size);
 
-    /** Test whether the String ends with the content of another.
+    /** Test whether the String ends with `postfix`.
      */
     bool
     Ends_With(String *self, String *postfix);
 
-    /** Test whether the String ends with the passed-in string.
+    /** Test whether the String ends with `postfix`.
      */
     bool
     Ends_With_Utf8(String *self, const char *postfix, size_t size);
@@ -186,17 +187,17 @@ public final class Clownfish::String nickname Str
     int64_t
     Find_Utf8(String *self, const char *ptr, size_t size);
 
-    /** Test whether the String matches the passed-in string.
+    /** Test whether the String matches the supplied UTF-8 character data.
      */
     bool
     Equals_Utf8(String *self, const char *ptr, size_t size);
 
-    /** Return the number of Unicode code points in the object's string.
+    /** Return the number of Unicode code points the String contains.
      */
     size_t
     Length(String *self);
 
-    /** Get the String's `size` attribute.
+    /** Return the number of bytes occupied by the String's internal content.
      */
     size_t
     Get_Size(String *self);
@@ -251,35 +252,33 @@ public final class Clownfish::String nickname Str
     String*
     Trim_Tail(String *self);
 
-    /** Return the Unicode code point at the specified number of code points
-     * in.  Return 0 if the string length is exceeded.  (XXX It would be
+    /** Return the Unicode code point located `tick` code points in from the
+     * top.  Return 0 if out of bounds.  (XXX It would be
      * better to throw an exception, but that's not practical with UTF-8 and
      * no cached length.)
      */
     int32_t
     Code_Point_At(String *self, size_t tick);
 
-    /** Return the Unicode code point at the specified number of code points
-     * counted backwards from the end of the string.  Return 0 if outside the
-     * string.
+    /** Return the Unicode code point located `tick` code points counting
+     * backwards from the end.  Return 0 if out of bounds.
      */
     int32_t
     Code_Point_From(String *self, size_t tick);
 
-    /** Return a newly allocated String containing a copy of the indicated
-     * substring.
+    /** Return a new String containing a copy of the specified substring.
      * @param offset Offset from the top, in code points.
      * @param len The desired length of the substring, in code points.
      */
     incremented String*
     SubString(String *self, size_t offset, size_t len);
 
-    /** Return an iterator to the start of the string.
+    /** Return an iterator initialized to the start of the string.
      */
     incremented StringIterator*
     Top(String *self);
 
-    /** Return an iterator to the end of the string.
+    /** Return an iterator initialized to the end of the string.
      */
     incremented StringIterator*
     Tail(String *self);
@@ -363,14 +362,12 @@ public final class Clownfish::StringIterator nickname 
StrIter
     public size_t
     Skip_Prev_Whitespace(StringIterator *self);
 
-    /** Test whether the content after the iterator starts with
-     * `prefix`.
+    /** Test whether the content after the iterator starts with `prefix`.
      */
     bool
     Starts_With(StringIterator *self, String *prefix);
 
-    /** Test whether the content after the iterator starts with the passed-in
-     * string.
+    /** Test whether the content after the iterator starts with `prefix`.
      */
     bool
     Starts_With_Utf8(StringIterator *self, const char *prefix, size_t size);
@@ -381,8 +378,7 @@ public final class Clownfish::StringIterator nickname 
StrIter
     bool
     Ends_With(StringIterator *self, String *postfix);
 
-    /** Test whether the content before the iterator ends with the passed-in
-     * string.
+    /** Test whether the content before the iterator ends with `postfix`.
      */
     bool
     Ends_With_Utf8(StringIterator *self, const char *postfix, size_t size);

Re: [lucy-dev] Clownfish String API

Reply via email to