On 22/04/2013 06:06, Marvin Humphrey wrote:
The optimal answers to these questions may change after we introduce an
immutable String class. String is the use case that needs all the convenience
constructors. CharBuf probably needs only one constructor, which takes a size
for its buffer.
    public inert incremented CharBuf*
    new(size_t size);
Maybe we should start to flesh out the design of immutable Strings. Do
you have a concrete plan already? How should CharBufs and Strings interact?
Another thing to consider is which encodings we might support via raw content
constructors. I suspect we'll never go beyond common Unicode: UTF-8,
UTF-16LE, UTF-16BE, and possibly UTF-32LE and UTF-32BE.
Isn't UTF-32 used in Python (among other encodings)?
Anything else belongs in a library a la Perl's Encode.
+1
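To make the "raw content constructor" idea concrete, here is a rough sketch of the transcoding step such a constructor would need for UTF-16 input, assuming the internal representation stays UTF-8. The names (sketch_encoding, sketch_utf16_to_utf8) are made up for illustration and are not part of any existing Clownfish API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding tags; not part of any existing Clownfish API. */
typedef enum { ENC_UTF16LE, ENC_UTF16BE } sketch_encoding;

/* Encode one code point as UTF-8 into dest; returns bytes written (1-4). */
static size_t
sketch_encode_utf8(uint32_t cp, char *dest) {
    if (cp < 0x80) {
        dest[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        dest[0] = (char)(0xC0 | (cp >> 6));
        dest[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        dest[0] = (char)(0xE0 | (cp >> 12));
        dest[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        dest[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    dest[0] = (char)(0xF0 | (cp >> 18));
    dest[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    dest[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    dest[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Read one 16-bit code unit in the given byte order. */
static uint32_t
sketch_read_u16(const unsigned char *p, sketch_encoding enc) {
    return enc == ENC_UTF16LE ? (uint32_t)(p[0] | (p[1] << 8))
                              : (uint32_t)((p[0] << 8) | p[1]);
}

/* Transcode raw UTF-16 bytes to UTF-8.  dest must hold at least
 * (len / 2) * 3 bytes.  Returns the number of UTF-8 bytes written, or
 * (size_t)-1 on malformed input (odd length or unpaired surrogate). */
size_t
sketch_utf16_to_utf8(const unsigned char *src, size_t len,
                     sketch_encoding enc, char *dest) {
    size_t out = 0;
    assert(src != NULL && dest != NULL);
    if (len % 2 != 0) { return (size_t)-1; }
    for (size_t i = 0; i < len; i += 2) {
        uint32_t cp = sketch_read_u16(src + i, enc);
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: must be followed by a low surrogate. */
            if (i + 4 > len) { return (size_t)-1; }
            uint32_t lo = sketch_read_u16(src + i + 2, enc);
            if (lo < 0xDC00 || lo > 0xDFFF) { return (size_t)-1; }
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
            i += 2;
        }
        else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t)-1; /* Lone low surrogate. */
        }
        out += sketch_encode_utf8(cp, dest + out);
    }
    return out;
}
```

UTF-32LE and UTF-32BE would follow the same pattern with a 4-byte read and no surrogate handling.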
It would be nice if we could eliminate the "steal" variants -- they tend to
constrain how we implement String internally.
I found only three users of the "steal" constructors:
* S_unescape_text in Lucy::Util::Json could be changed to use
a CharBuf and Cat_Char
* SkipStepper_to_string could simply use CB_newf, no?
* DefDocReader_fetch_doc in the C bindings could create an
extra copy or we could add something like InStream#ReadString.
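As an illustration of the first option, here is a minimal, self-contained sketch of unescaping into a growable buffer one character at a time instead of handing a malloc'd buffer to a "steal" constructor. SketchBuf and the sketch_* functions are hypothetical stand-ins for CharBuf and Cat_Char, not the real Clownfish code, and the unescaper only covers a small subset of JSON escapes with no validation of the hex digits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a mutable CharBuf; not the Clownfish type. */
typedef struct {
    char  *ptr;   /* UTF-8 bytes, not NUL-terminated */
    size_t size;  /* bytes in use */
    size_t cap;   /* bytes allocated */
} SketchBuf;

/* The single constructor: takes only an initial capacity. */
SketchBuf*
sketch_buf_new(size_t cap) {
    SketchBuf *self = malloc(sizeof(SketchBuf));
    if (self == NULL) { abort(); }
    self->cap  = cap > 0 ? cap : 1;
    self->size = 0;
    self->ptr  = malloc(self->cap);
    if (self->ptr == NULL) { abort(); }
    return self;
}

void
sketch_buf_destroy(SketchBuf *self) {
    free(self->ptr);
    free(self);
}

/* Append raw bytes, doubling the allocation as needed. */
void
sketch_buf_cat_bytes(SketchBuf *self, const char *bytes, size_t len) {
    size_t needed = self->size + len;
    if (needed > self->cap) {
        size_t cap = self->cap;
        while (cap < needed) { cap *= 2; }
        char *ptr = realloc(self->ptr, cap);
        if (ptr == NULL) { abort(); }
        self->ptr = ptr;
        self->cap = cap;
    }
    memcpy(self->ptr + self->size, bytes, len);
    self->size += len;
}

/* Append one code point, encoded as UTF-8. */
void
sketch_buf_cat_char(SketchBuf *self, uint32_t cp) {
    char   tmp[4];
    size_t n;
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        tmp[0] = (char)cp;
        n = 1;
    }
    else if (cp < 0x800) {
        tmp[0] = (char)(0xC0 | (cp >> 6));
        tmp[1] = (char)(0x80 | (cp & 0x3F));
        n = 2;
    }
    else if (cp < 0x10000) {
        tmp[0] = (char)(0xE0 | (cp >> 12));
        tmp[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        tmp[2] = (char)(0x80 | (cp & 0x3F));
        n = 3;
    }
    else {
        tmp[0] = (char)(0xF0 | (cp >> 18));
        tmp[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
        tmp[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
        tmp[3] = (char)(0x80 | (cp & 0x3F));
        n = 4;
    }
    sketch_buf_cat_bytes(self, tmp, n);
}

/* Unescape \n, \t and \uXXXX (BMP only, no surrogate pairs).  Any other
 * escaped byte, and all unescaped bytes, are copied through verbatim. */
SketchBuf*
sketch_unescape(const char *src, size_t len) {
    SketchBuf *buf = sketch_buf_new(len);
    size_t i = 0;
    while (i < len) {
        if (src[i] != '\\' || i + 1 >= len) {
            sketch_buf_cat_bytes(buf, src + i, 1);
            i += 1;
            continue;
        }
        char esc = src[i + 1];
        if (esc == 'u' && i + 6 <= len) {
            char hex[5];
            memcpy(hex, src + i + 2, 4);
            hex[4] = '\0';
            sketch_buf_cat_char(buf, (uint32_t)strtoul(hex, NULL, 16));
            i += 6;
        }
        else {
            char c = esc == 'n' ? '\n' : esc == 't' ? '\t' : esc;
            sketch_buf_cat_bytes(buf, &c, 1);
            i += 2;
        }
    }
    return buf;
}
```

The unescaped text is never longer than the input, so sizing the buffer from the input length avoids reallocation in this case.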
Since Mimic_Str() changes content, it's only relevant for CharBuf, not String.
Lucy currently uses Mimic_Str() in two places: FSDirHandle and PostingPool.
If FSDirHandle continues to use CharBuf instead of switching to String it
could use CB_setf() instead. (FSDirHandle's usage is bogus anyway because
it's assuming UTF-8 path names -- but it will at least throw an error rather
than segfault.) CB_setf() won't work for PostingPool, but there are still
plenty of alternatives to Mimic_Str(). With a little work, I think we can
eliminate Mimic_Str().
+1
* Cat_Str
* Cat_Trusted_Str
* Starts_With_Str
* Ends_With_Str
* Find_Str
* Equals_Str
These should probably expect the char* to always be in UTF-8.
Yes. And I think we should rename them *_UTF8 instead of *_Str to reflect
that fact. That will both clear up what encoding they expect and eliminate
potential confusion with String.
+1
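For reference, a rough sketch of what the renamed *_UTF8 functions could look like if they take a char* plus an explicit byte length and assume both sides are valid UTF-8. SketchStr and the sketch_* names are invented for this example; the real signatures may well differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a string object holding UTF-8 content. */
typedef struct {
    const char *ptr;   /* UTF-8 bytes, not necessarily NUL-terminated */
    size_t      size;  /* length in bytes */
} SketchStr;

/* For valid UTF-8 on both sides, a byte-wise prefix match is also a
 * code point prefix match, so memcmp is sufficient. */
bool
sketch_starts_with_utf8(const SketchStr *self, const char *prefix,
                        size_t len) {
    assert(self != NULL && prefix != NULL);
    return len <= self->size && memcmp(self->ptr, prefix, len) == 0;
}

/* Same reasoning applies to suffixes: a valid UTF-8 suffix cannot start
 * in the middle of a character. */
bool
sketch_ends_with_utf8(const SketchStr *self, const char *suffix,
                      size_t len) {
    assert(self != NULL && suffix != NULL);
    return len <= self->size
           && memcmp(self->ptr + self->size - len, suffix, len) == 0;
}

bool
sketch_equals_utf8(const SketchStr *self, const char *other, size_t len) {
    assert(self != NULL && other != NULL);
    return len == self->size && memcmp(self->ptr, other, len) == 0;
}
```

None of these validate the char* argument; whether the real functions should do so is presumably the distinction Cat_Str and Cat_Trusted_Str already draw.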
Nick