------------------------------- UPCOMING MOZILLA STRING CHANGES -------------------------------
The Mozilla string code will be undergoing extensive revision following the release of Mozilla 1.7 alpha. The changes will be mostly transparent, having very little effect on the string API. This change will be made first thing during the 1.7 beta cycle.
The work is being tracked here: http://bugzilla.mozilla.org/show_bug.cgi?id=231995
The major API-level changes include:
(1) nsAC?String will no longer be able to represent multi-fragment
strings. This allows all implementations of nsAC?String to be
unified, resulting in a significant reduction of code.

(2) nsReadingIterator and nsWritingIterator will be limited to
iterating over a contiguous buffer. Previously, operator++ was
forced to "normalize" the iterator forward to the next fragment.
This added additional code to every consumer of iterators that
was almost never needed since multi-fragment strings are very
uncommon.

(3) nsAC?String methods are now all non-virtual. This is possible
since there is now only one implementation of nsAC?String. This
helps reduce code at the call sites and improves performance.
ABI compatibility with the existing vtable is maintained (more
on this later). It is important to note that any external
components that use multi-fragment strings will be broken, but
passing multi-fragment strings in external components was
forbidden anyway (although the prohibition was poorly
documented) and none of our implementations of multi-fragment
strings were ever frozen for component developers to use outside
of the Mozilla codebase.

(4) A simplified string API is introduced for embedders and external
component developers. nsAC?String's methods are now meant to be
used only within the Mozilla code base. The nsEmbedC?String
class is now implemented in terms of the simplified string API.

(5) The following string classes have been eliminated:

nsSharableC?String
nsC?String will now allocate a sharable buffer by default. It
implements thread-safe reference counting, enabling copy-on-
write semantics for most strings. Since very little code
referenced nsSharableC?String, this class name has been
eliminated.

nsDependentSingleFragmentC?Substring
This is now equivalent to nsDependentC?Substring. Since very
little code referenced nsDependentSingleFragmentC?Substring,
this class name has been eliminated.

nsDependentC?Concatenation
Since nsAC?String can no longer represent a multi-fragment
string, nsDependentC?Concatenation could no longer inherit
from nsAC?String. Therefore, this class no longer exists.
However, efficient string concatenation is still implemented
using a very similar mechanism. More on this later.

(6) nsStringFwd.h now forward declares all string classes.

(7) nsC?Substring has been added to the string hierarchy. It will
be the core string class from which all other strings inherit.
It behaves much like the old nsASingleFragmentC?String, except
that it does not reference the nsAC?String vtable to satisfy any
of its methods. Many of the "getter"-functions are inlined for
performance.

The revised string hierarchy is depicted below:

            nsAC?String
                 |
                 |
                 |
           nsC?Substring
                 |
                 |-------- nsDependentC?Substring
                 |
            nsC?String
                 |
                 |------------.----------.----------.
                 |            |          |          |
        nsDependentC?String   |          |          |
                              |          |          |
                       nsC?AutoString    |          |
                                         |          |
                                  nsXPIDLC?String   |
                                                    |
                                          nsPromiseFlatC?String

Class overview:

nsAC?String
This class is designed to be subclassed. It is never directly
instantiated. This class exists only to provide backwards
compatibility with the former string class API. It is essentially
equivalent to nsC?Substring. However, unlike nsC?Substring,
nsAC?String might be implemented by an external XPCOM component or
embedding application that has not yet migrated to the new
(simpler) embedding string API provided by XPCOM.

nsC?Substring
This class is designed to be subclassed. It is never directly
instantiated. It represents a string fragment that may or may not
be null-terminated. It has methods to access and manipulate the
string buffer. It has all of the code to manage the various
different buffer allocation schemes used by the string classes.
In many ways, the subclasses of nsC?Substring simply provide
specialized constructors that select the corresponding memory
allocation scheme. If nsC?Substring needs to re-allocate the
buffer, it will allocate a null-terminated, sharable buffer.

nsC?String
This class is designed to be instantiated directly. It is the
main string class. It provides a heap allocated string buffer.
It also provides compatibility methods with the "obsolete" string
API that used to live in xpcom/string/obsolete (i.e., the "Rick
G." string API). It always allocates a sharable buffer.

nsDependentC?String
This class is designed to be instantiated directly. It provides a
mechanism to construct a nsC?String that simply stores a raw
pointer to an externally allocated buffer. This class depends on
the user of the class to ensure that the buffer remains valid for
the lifetime of the nsDependentC?String. This class can only wrap
a null-terminated buffer.

nsC?AutoString
This class is designed to be instantiated directly. It provides a
mechanism to construct a nsC?String that optionally uses a fixed-
size, stack-based buffer. This class is designed to be allocated
on the stack. Allocating this class on the heap is usually a bad
idea ;-)

nsXPIDLC?String
This class is designed to be instantiated directly. It provides
support for the getter_Copies mechanism. It also provides support
for a null buffer. Unlike the nsC?String classes,
nsXPIDLC?String::get() may return null if the nsXPIDLC?String is
uninitialized or was told to adopt a null-valued string buffer.
This class can also be cast automatically to |const char_type*|
for backwards compatibility. Use this class when working with
XPCOM getter methods that return |string| or |wstring|.

nsPromiseFlatC?String
This class is designed to be instantiated via the
PromiseFlatC?String family of functions. PromiseFlatC?String
takes a nsAC?String and returns a nsPromiseFlatC?String, which
"promises" to be null-terminated. PromiseFlatC?String will
allocate a copy of the given string if necessary in order to fulfill
its promise of a null-terminated string. The "flat" adjective
comes from the old string API that supported multi-fragment strings.
With these current string changes, PromiseFlatC?String is still very
useful for ensuring null-terminated storage. This is usually only
important when you need to pass a nsC?Substring to an API that takes
a raw character pointer.

nsDependentC?Substring
This class is designed to be instantiated via the Substring family
of functions. It represents an array of characters that is not
necessarily null-terminated. Much like nsDependentC?String, this class
depends on an externally allocated string buffer. Use this class
to create a nsC?Substring that wraps a pair of raw character
pointers, a pair of nsReadingIterator<char_type>'s, or a section
of an existing nsC?Substring.

Concatenations in the new world:
For the most part, string concatenations will continue to work just as they always have. They continue to be the preferred way to compose a new string from several other strings. The only difference in the new world is that the string concatenation class no longer inherits from nsAC?String, so it cannot be passed to functions expecting a nsAC?String. However, for compatibility with existing code, a concatenation of strings will automatically flatten itself into a nsC?String when necessary.
For example:
    void foo( const nsAString& e )
    {
      nsAutoString buf;
      buf = NS_LITERAL_STRING("prefix") + e;
      ...
    }

In this case, the two strings "prefix" and |e| are written directly to the buffer owned by |buf|.
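To make the "written directly to the buffer" point concrete, here is a minimal sketch in plain standard C++. It assumes nothing about the real implementation: ToyString, Concat, and LengthOf are hypothetical names invented for illustration, and std::string stands in for the Mozilla buffer code. It shows a two-fragment concatenation object that is not itself a string, an assignment that copies both fragments into the destination's own storage, and the implicit flattening that happens when a string reference is required.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical toy types, not Mozilla code. std::string stands in for
// the real buffer management.
class ToyString;

// The result of |a + b|. It only remembers its two fragments and is
// deliberately not a string class itself.
struct Concat {
  const ToyString& left;
  const ToyString& right;
};

class ToyString {
 public:
  ToyString() = default;
  explicit ToyString(const char* s) : mData(s) {}

  // Implicit on purpose: a Concat flattens itself into a ToyString
  // wherever a string is required.
  ToyString(const Concat& c) { *this = c; }

  // Size the destination once, then copy each fragment straight in.
  ToyString& operator=(const Concat& c) {
    const std::size_t total = c.left.mData.size() + c.right.mData.size();
    if (&c.left == this || &c.right == this) {
      // A fragment aliases the destination, so build the result in
      // separate storage before replacing our own.
      std::string tmp;
      tmp.reserve(total);
      tmp.append(c.left.mData).append(c.right.mData);
      mData.swap(tmp);
    } else {
      mData.clear();
      mData.reserve(total);
      mData.append(c.left.mData).append(c.right.mData);
    }
    assert(mData.size() == total);
    return *this;
  }

  const std::string& Data() const { return mData; }

 private:
  std::string mData;
};

Concat operator+(const ToyString& a, const ToyString& b) {
  return Concat{a, b};
}

// Takes a string reference, so a Concat argument is flattened into a
// temporary ToyString at the call site.
std::size_t LengthOf(const ToyString& s) { return s.Data().size(); }
```

With these definitions, |buf = ToyString("prefix") + e| fills |buf| in place, while passing |buf + e| to LengthOf first materializes a temporary ToyString. The aliasing branch is a choice made for this sketch only; it says nothing about how the real classes handle a destination that is also a fragment.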
Here's another example:
    void bar( const nsAString& s );

    {
      nsString a, b;
      ...
      bar( a + b );
    }

In this case, a temporary nsString is created to hold the result of the concatenation of |a| and |b| prior to calling |bar|. This temporary nsString would not have been generated with the previous string implementation that supported multi-fragment nsAStrings. However, there was a serious bug in the older implementation that made doing this kind of thing crash-prone (especially if the definition of |bar| looked something like the definition of |foo| in the previous example). See bug 231995 for more details.
The main point here is that string concatenations will continue to work as they have in the past, with a few minor exceptions.
For example, code such as the following will no longer compile:
    {
      nsString a, b;
      ...
      const nsAString& s = a + b;
      ...
    }

Such code is uncommon. It should be rewritten like this:
    {
      nsString a, b;
      ...
      nsString r( a + b );
      ...
    }

|r| could also be declared a nsAutoString to avoid heap-allocating the result of the concatenation. However, since nsString allocates a sharable buffer, the programmer should consider nsString if it is expected that |r| might need to be copied elsewhere.
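The trade-off is easier to see with a small model of a sharable buffer. The following is a minimal sketch in standard C++, assuming a simple layout of one atomic reference count plus the characters. SharedBuf, CowString, and SharesBufferWith are hypothetical names, and the real buffer code is certainly more involved. It shows that copying a string only bumps the count, and that the first write to a shared buffer takes a private copy.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <utility>

// Hypothetical toy, not Mozilla code: a heap buffer with an atomic
// reference count that several string objects can share.
struct SharedBuf {
  std::atomic<int> refs{1};
  std::string chars;
  explicit SharedBuf(std::string s) : chars(std::move(s)) {}
};

class CowString {
 public:
  explicit CowString(const char* s) : mBuf(new SharedBuf(s)) {}

  // Copying shares the buffer; only the count changes.
  CowString(const CowString& o) : mBuf(o.mBuf) {
    mBuf->refs.fetch_add(1, std::memory_order_relaxed);
  }

  CowString& operator=(const CowString& o) {
    if (mBuf != o.mBuf) {
      o.mBuf->refs.fetch_add(1, std::memory_order_relaxed);
      Release();
      mBuf = o.mBuf;
    }
    return *this;
  }

  ~CowString() { Release(); }

  const char* get() const { return mBuf->chars.c_str(); }
  bool SharesBufferWith(const CowString& o) const { return mBuf == o.mBuf; }

  // Writers take a private copy first if anyone else holds the buffer.
  void Append(const char* s) {
    if (mBuf->refs.load(std::memory_order_acquire) > 1) {
      SharedBuf* fresh = new SharedBuf(mBuf->chars);
      Release();
      mBuf = fresh;
    }
    mBuf->chars += s;
  }

 private:
  // Drop our reference and free the buffer if it was the last one.
  void Release() {
    assert(mBuf->refs.load(std::memory_order_relaxed) > 0);
    if (mBuf->refs.fetch_sub(1, std::memory_order_acq_rel) == 1) {
      delete mBuf;
    }
  }

  SharedBuf* mBuf;
};
```

The atomic count only makes it safe for different CowString objects on different threads to share one buffer. Mutating a single CowString object from two threads at once is still unsafe in this sketch.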
Maintaining string ABI compatibility:
nsAC?String exists for backwards compatibility with the frozen nsAC?String vtable. ABI compatibility is maintained even though nsAC?String's methods are all non-virtual. While this sounds like a contradiction, compatibility exists by having nsAC?String (in the new world) store a pointer to an implementation of the old vtable. The vtable methods all cast |this| to nsC?Substring and invoke the corresponding methods on nsC?Substring. (Yes, we are utilizing knowledge of how the compiler implements virtual functions, but that's not unfamiliar territory -- xpconnect!) This allows a new nsAC?String to have the same binary signature as an old nsAC?String. Likewise, every method on the new nsAC?String must first check the value of its vtable pointer to determine if |this| is really a nsC?Substring derived class or actually some other nsAC?String implementation (such as the old nsEmbedC?String).
An advantage of this approach is that it eliminates virtual function calls in most cases (especially for internal Gecko code). Common nsAC?String methods like BeginReading and Length are made much faster by avoiding virtual function calls. Code at the callsite is also reduced since there is no need to dereference the |this| pointer and the vtable pointer in order to gain access to the address of the virtual function. Now, the callsites make DSO/DLL calls which are significantly less costly in terms of codesize and runtime.
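The vtable-pointer check described in this section can be modeled in a few lines. This is a minimal sketch under loud assumptions: a hand-written table of function pointers stands in for the compiler-generated vtable, only a length method is modeled, and AbstractString, Substring, ForeignString, and the table names are all hypothetical. It shows a non-virtual method that takes a direct path when the table pointer is the canonical one and falls back to an indirect call otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical sketch, not Mozilla code. StringVTable imitates the
// frozen vtable layout with a single slot.
struct AbstractString;

struct StringVTable {
  std::size_t (*length)(const AbstractString*);
};

struct AbstractString {
  const StringVTable* mVTable;  // sits where the old vptr used to live

  // Non-virtual entry point used by every caller.
  std::size_t Length() const;
};

// The single in-tree implementation: plain members, no virtuals.
struct Substring : AbstractString {
  const char* mData;
  std::size_t mLength;
  explicit Substring(const char* s);
};

// Slot used by old binaries; it casts |this| to Substring.
std::size_t CanonicalLength(const AbstractString* self) {
  return static_cast<const Substring*>(self)->mLength;
}
static const StringVTable kCanonicalVTable = {CanonicalLength};

Substring::Substring(const char* s)
    : AbstractString{&kCanonicalVTable}, mData(s), mLength(std::strlen(s)) {}

// Some other implementation, e.g. one built against the old API.
struct ForeignString : AbstractString {
  std::size_t mCount;
  explicit ForeignString(std::size_t n);
};
std::size_t ForeignLength(const AbstractString* self) {
  return static_cast<const ForeignString*>(self)->mCount;
}
static const StringVTable kForeignVTable = {ForeignLength};

ForeignString::ForeignString(std::size_t n)
    : AbstractString{&kForeignVTable}, mCount(n) {}

std::size_t AbstractString::Length() const {
  assert(mVTable != nullptr);
  if (mVTable == &kCanonicalVTable) {
    // Fast path: |this| is really a Substring, so read the member.
    return static_cast<const Substring*>(this)->mLength;
  }
  // Slow path: unknown implementation, go through its table.
  return mVTable->length(this);
}
```

Calling Length() through an AbstractString reference returns the right answer for both kinds of object, but only the ForeignString case pays for the indirect call.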
New string API for XPCOM component developers:
Going forward, external components and embedding applications should not call methods directly on nsAC?String. These classes should be viewed as opaque references to string objects. This is important because it will allow Gecko more flexibility to improve its string implementation in the future.
The new external string API consists of a small set of functions exported from the XPCOM library as well as a number of inline helper functions. Include nsStringAPI.h to use these functions.
nsEmbedString has been re-implemented in terms of this new external string API. For Gecko embedders and XPCOM component authors, the XPCOM glue provides stub implementations of the new external string API. All one needs to do to use these functions in external code is link to the XPCOM glue standalone library (xpcomglue_s).
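For readers unfamiliar with the opaque-handle style, the general shape is sketched below in standard C++. This is emphatically not the contents of nsStringAPI.h: ToyAString and every ToyString* function are hypothetical names made up for illustration. The sketch only shows the pattern of a type whose layout callers never see, a few exported functions that operate on it, and an inline helper built purely on those functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical names throughout; this only illustrates the pattern.
struct ToyAString;  // opaque: callers never see the layout

// --- "exported" functions ------------------------------------------
ToyAString* ToyStringCreate();
void ToyStringDestroy(ToyAString* s);
void ToyStringSetData(ToyAString& s, const char* data, std::size_t len);
// Returns the length and points |*data| at the characters.
std::size_t ToyStringGetData(const ToyAString& s, const char** data);

// --- inline helper built only on the exported functions ------------
inline std::string ToyStringToStd(const ToyAString& s) {
  const char* data = nullptr;
  const std::size_t len = ToyStringGetData(s, &data);
  assert(data != nullptr);
  return std::string(data, len);
}

// --- library side: free to change without breaking callers ---------
struct ToyAString {
  std::string storage;
};

ToyAString* ToyStringCreate() { return new ToyAString(); }

void ToyStringDestroy(ToyAString* s) { delete s; }

void ToyStringSetData(ToyAString& s, const char* data, std::size_t len) {
  s.storage.assign(data, len);
}

std::size_t ToyStringGetData(const ToyAString& s, const char** data) {
  *data = s.storage.data();
  return s.storage.size();
}
```

Because the helper never touches the layout of ToyAString, the library side could switch to a different buffer scheme without recompiling callers, which is the flexibility the opaque-reference rule is meant to preserve.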
If a component is developed against this new API, then it will only work in versions of Mozilla that support this new API (obviously). This means that component authors interested in compatibility with Mozilla 1.4 (for example) will need to develop their components against Mozilla 1.4 instead of the later versions of Mozilla. New versions of Mozilla will continue to be binary compatible with FROZEN interfaces defined by older versions of Mozilla (see "Maintaining string ABI compatibility" above).
So, are we supposed to stop using nsAString?
The answer is that it depends. AString in XPIDL will continue to map to nsAString. Gecko interfaces make extensive use of AString, ACString, and AUTF8String, and this isn't going to change. So, nsAC?String will continue to be very important to code that interacts with XPCOM interfaces. However, when it makes sense, nsC?Substring should be used to pass around string references inside the Mozilla codebase. Unlike nsAC?String, nsC?Substring can be more efficient since it does not need to inspect and possibly jump through the vtable on each method call.
Moving code from nsAC?String to nsC?Substring is consistent with the overall strategy of deCOMification that is ongoing within Gecko. If it ever happens that we are able to break binary compatibility with Mozilla 1.0, then we would want to equate nsAC?String to nsC?Substring. Of course, I'm not counting on this happening anytime soon.
Embedders and external component developers should treat nsAC?String as an opaque handle to a string object. They should use the new external string API and nsEmbedC?String to work with Mozilla strings. <= I'm repeating myself here ;-)
I've tried to minimize the impact of these changes. I don't expect Mozilla hackers to have to re-learn a new string API. If you are writing an external XPCOM component, I hope you will find the new API easier to work with. I should add that my goal is to freeze the new external string API for Mozilla 1.7 final.
Please let me know if you have any questions or concerns about these changes.
Darin Fisher ([EMAIL PROTECTED]) 2004-02-17
