Hi David,
David Bertoni <[EMAIL PROTECTED]> writes:
> I should have been more clear about what I was proposing. Changing the
> code to an unsigned type is risky, because the APIs previously used a
> negative value to indicate "not found" or "error." However, changing to
> a 64-bit _signed_ type on 64-bit platforms, while not a permanent
> solution, adds another 32 bits for returning index values, without all
> of the problem with changing to an unsigned type.
I prefer to fix the problem properly rather than implementing temporary
solutions. Once you change int to XMLSSize_t that piece of code will
be indistinguishable from other legitimate uses of XMLSSize_t.
The code that returns length/size via a signed int should be redesign
to use XMLSize_t and use some other mechanism to signal an error (e.g.,
the function returns bool and has a by-reference argument of type
XMLSize_t).
> As it stands now, on 64-bit platforms, the "indexing" address space in
> the XMLString APIS is (usually) limited to INT_MAX. If we change it to
> a signed 64-bit type, it will be (usually) LONG_MAX. If we leave things
> as they are now, the incoming size parameters will be unsigned 64-bit
> values, while the return types will be signed 32-bit values. That seems
> inconsistent, and will make it very difficult to write 32-bit/64-bit
> compatible code.
There are three places where int is used in XMLString:
(1) regionMatches/regionIMatches
These are some bizarre, special-purpose functions that can compare
string regions using negative offsets. They are only used in regex
and should be moved there instead of being in XMLString. In this
case changing int to XMLSSize_t might be the only reasonable
solution unless we can be certain that they are never called
with negative offsets by the regex code.
(2) hash functions
These currently return unsigned int and should be changed to
XMLSize_t.
(3) indexOf functions
Currently these return position in the string and -1 if not found.
The way to address this is to return XMLSize_t and use ~((XMLSize_t)0)
to indicate the "not found" condition. This is what std::string does.
We should also define the npos constant in XMLString.
> > Sounds good but please don't just mindlessly change signed int to
> > XMLSSize_t.
>
> I try not to mindlessly change anything. Trust me, I'm very careful
> about changing code without careful risk/benefit analysis.
Thank you. This comment was addressed to everybody who might want to
try and port some portion of Xerces-C++ to use XMLSize_t. I think we
should humble ourselves and acknowledge that the Xerces-C++ code quality
is quite poor. Some of the things you see are really troubling or just
plain stupid. For example, today I fixed the following fragment that
is quite close to what ends up on The Daily WTF:
int included = 0;
// Code that never touches included...
return (included) ? true : false;
So while we all try to be careful and mindful stuff like this somehow
ends up in the code.
> In the case of the collections, the int values are just used as keys, so
> they have no semantic meaning to the collections themselves. Also,
> since these collections are more esoteric, they are not used in many places.
>
> For example, RefHash3KeysIdPool is used by the scanners and schema
> grammar to hash element declarations using the element name, namespace
> URI and schema scope. The keys are a void* (the name), and unsigned int
> for the URI ID, and an unsigned enum for the scope. So why should the
> keys be ints when they're always storing unsigned ints?
I have nothing against changing this. Also note that changing the
container classes is just the small part of the job. Once you do
this you will need to change every place where these collections
are used (otherwise you will get tons of warning on, for example,
VC++ in 64-bit mode).
For example, id type in StringPool should be changed to XMLSize_t.
But the unsigned int is used throughout the code for string ids and
these places will need to be changed as well. One more or less
manageable way to do this might be to change StringPool and then
try to compile Xerces-C++ with VC++ in 64-bit mode. This will
result in the compiler pin-pointing with warnings all the places
that are affected by the change.
Boris
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]