Boris Kolpackov wrote:
Hi David,
David Bertoni <[EMAIL PROTECTED]> writes:
There are tons of places where int/long is used instead of XMLSize_t.
While most of them are straightforward to move to XMLSize_t, there are
places with really rotten code, for example, signed int is used to
return length and -1 is returned to indicate an error (so this code
is not even 32-bit safe). Such places will require more effort to
port.
Exactly. But changing these int return types to XMLSSize_t (signed size
type) should be safe, correct?
No, that's exactly the change I am afraid people would make (once it
is made it will be very hard to find and fix).
If you return something that can potentially require 64 bit (e.g.,
any kind of memory-related length or size), we should always use
XMLSize_t. If you use XMLSSize_t then you effectively halve the
amount of length/size that can be represented.
Yes, I agree.
Some people like to argue that this is not a problem since it is not
realistic to have, say, more than 2^32 attributes. To this I can only
say that less than 10 years ago people thought it was not realistic

to have more than 4GB of memory. I don't want to go through the same
process again in the next few years.
I agree as well.
I should have been more clear about what I was proposing. Changing the
code to an unsigned type is risky, because the APIs previously used a
negative value to indicate "not found" or "error." However, changing to
a 64-bit _signed_ type on 64-bit platforms, while not a permanent
solution, adds another 32 bits for returning index values, without all
of the problems of changing to an unsigned type.
As it stands now, on 64-bit platforms, the "indexing" address space in
the XMLString APIs is (usually) limited to INT_MAX. If we change it to
a signed 64-bit type, it will be (usually) LONG_MAX. If we leave things
as they are now, the incoming size parameters will be unsigned 64-bit
values, while the return types will be signed 32-bit values. That seems
inconsistent, and will make it very difficult to write 32-bit/64-bit
compatible code.
I will start on a patch to move the template collection classes from
unsigned int to XMLSize_t, which should be safe. There are also some
places where ints are used as keys for the hash tables. I believe those
should be safe to move to XMLSSize_t or XMLSize_t, as they are used in a
handful of places.
Sounds good, but please don't just mindlessly change signed int to XMLSSize_t.
In most cases when signed int is used it is either that the original
author forgot to put unsigned (yes, there are places where both int and
unsigned int types are used for the same thing in the same interface)
or that there is some extra information represented by the negative
value (e.g., error). In the latter case the thing needs to be redesigned.
I try not to mindlessly change anything. Trust me, I'm very careful
about changing code without careful risk/benefit analysis.
In the case of the collections, the int values are just used as keys, so
they have no semantic meaning to the collections themselves. Also,
since these collections are more esoteric, they are not used in many places.
For example, RefHash3KeysIdPool is used by the scanners and schema
grammar to hash element declarations using the element name, namespace
URI and schema scope. The keys are a void* (the name), an unsigned int
for the URI ID, and an unsigned enum for the scope. So why should the
keys be ints when they're always storing unsigned ints?
I may also get to some of those containers (those that are in util/ and
framework/) before you since it is easier to just fix everything than
to figure out what's public and what's not. So I suggest that you give
me a couple more days to finish my interface porting effort.
Will do.
Dave