Michael Meeks wrote:
- rtl_uString_internConvert should signal an error if conversion fails
(RTL_TEXTTOUNICODE_FLAGS_xxx_ERROR).
I was following the pattern here from rtl_string2UString are you
certain you want (inconsistent?) error handling for these methods ? easy
enough to add if so.
The rtl_string2UString direction (OString -> OUString) is not that
problematic, as usually there should not be translation problems.
However, OUString -> OString can very well run into problems of missing
characters in the target encoding. That's why I had to introduce
error-signalling rtl_convertUStringToString.
sal/rtl/source/strimp.h:
- SAL_INTERLOCK_BITS (sizeof(oslInterlockedCount)*8) instead of
(sizeof(sal_Int32)*8), the type of refCount, got me confused.
We can have:
#define SAL_INTERLOCK_BITS (sizeof(((rtl_uString *)(NULL))->refCount) * 8)
If that is clearer for you. Incidentally - why is the refCount not of
type oslInterlockedCount - when we assume it is all over the place ?
[ would you accept a correction to the [u]string structures to use the
oslInterlockedCount type instead ?
That jumbo is probably more confusing than anything else. ;) I do not
know why we have the inconsistency in the first place (most probably
because somebody was not coding carefully), and I only wanted to express
that it confused me, but either oslInterlockedCount or sal_Int32 in
SAL_INTERLOCK_BITS is probably just fine in the end. [And yes, a
correction would not hurt, either.]
sal/rtl/source/ustring.c:1.24.38.3:
- sal_Bool might be better than int CAN[NOT]_RETURN.
as you like - since it deals with lifecycle, I find this clearer to
understand where the ownership is being transferred - but there may be
better names for the defines ?
Or an enum?
- Why SAL_STRING_INTERN_FLAG? Why not change
rtl_uString_intern_internal so that it works correctly on already
interned strings.
Apart from the freeing performance - it also makes 'interning' a very
cheap operation if it is already interned; since I want to do this for
eg. for every string in a calc cell, it's nice if we can avoid a hash
lookup wherever possible.
That argument alone would sound a bit like premature optimization. ;)
But, as you already pointed out, it is necessary anyway.
Also, what just appeared to me, INTERN_FLAG helps avoid problems: While
rtl::O[U]String are effectively non-mutable, tools' {Byte|Uni}String
have mutating functions like SetChar which do COW. Fortunately, that
COW is done based on refCount == 1, so that it cannot happen that an
interned (but only once referenced, refCount == INTERN_FLAG|1) string is
mutated and would thus end up in a potentially wrong hash bucket.
(What is tricky is that refCount from rtl_[u]String is called mnRefCount
in {Byte|Uni}StringData. Grep for "mnRefCount == 1" in tools
strimp.cxx/strascii.cxx. Why did nobody clean up and remove
{Byte|Uni}StringData long ago? Grr...)
I'm just resynching the CWS to m202, and I'm have a few touches around
the source tree that drastically reduce string duplication :-)
[ those changes are of course rather small, mostly a load of '.intern()'
calls ]; hopefully non-controversial (some accidentally reverse diffs
FWIW ;-) http://go-oo.org/~michael/size-intern.diff ( are you ok with
the 'Tools' intern call there too ? ).
(yes, sure)
Of course, I'd like to avoid adding vcl, psprint, framework, configmgr,
sc to the CWS until the very last minute (to avoid resynching pain), so
I'll wait until you're happy :-)
I'm happy, go ahead. :)
-Stephan
Thanks,
Michael.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]