Re: Need for XMLSize_t?

Axel Weiß Sun, 12 Jun 2005 05:01:47 -0700

[EMAIL PROTECTED] wrote:
> > I would vote for not to limit XMLSize_t (or any other 'measuring'
> > type) to a fixed bit value. Instead, it should exactly shadow
> > size_t.
>
> Well, then we'd better be prepared to change code all over the parser.
> XMLSize_t is used only in the new DOM.  Almost everywhere else,
> unsigned int is used.  Look at all of the string functions in
> XMLString:


Hi Dave,

I would not mind at all changing widespread code in order to correct 
design errors. We are moving towards a new release, so now is the time 
to take the chance and do it!

> static unsigned int stringLen(const XMLCh* const src);

Clearly, this (and its siblings) is poorly designed. 'stringLen' really 
should return size_t, or XMLSize_t, respectively (see below).
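
To make this concrete, here is a minimal sketch of such a signature. It 
is not the actual Xerces code: the two typedefs are my assumptions for 
illustration only (XMLCh as a 16-bit code unit, XMLSize_t shadowing 
size_t as proposed).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins, not the real Xerces-C++ declarations.
typedef char16_t    XMLCh;      // assumed 16-bit code unit
typedef std::size_t XMLSize_t;  // shadows size_t, as proposed

// Counts the code units before the terminating zero.
inline XMLSize_t stringLen(const XMLCh* const src)
{
    assert(src != nullptr);
    const XMLCh* end = src;
    while (*end != 0)
        ++end;
    // 'end' never precedes 'src', so the difference is non-negative
    // and the conversion to an unsigned type loses nothing.
    return static_cast<XMLSize_t>(end - src);
}
```

With this return type the length can be as large as any object the 
platform can address, whatever the width of 'unsigned int' happens to be.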

> static int indexOf(const char* const toSearch, const char ch);
>
> In fact, you can see some of these functions return int, which even
> further restricts the length of a string.
>
> the SAX interfaces:
>
> virtual void
> ContentHandler::characters
>     (
>         const   XMLCh* const    chars
>         , const unsigned int    length
>     ) = 0;
>
>
> Given that, I don't think it's very useful, and could even be
> dangerous, to have XMLSize_t be a synonym for size_t.

What is more dangerous: making a mistake apparent, or concealing it?

I see it my way[tm]: during the past decade, there was no strong reason 
to distinguish between unsigned int and size_t (both were 32-bit, except 
on some smaller embedded or DSP platforms, which are of no interest 
here). Programmers (including myself!) tended to become insensitive to 
the different semantics of these unsigned types and mixed them up here 
and there.

Now some 64-bit architectures are arriving and shed new light on the 
scene. Suddenly, there's big confusion because 'unsigned int' or even 
'unsigned long' is no longer sufficient to hold a pointer difference. 
The only way I can see to become compatible with these systems again is 
to revise every piece of code and use the correct types, according to 
their semantics.
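
A small sketch of what goes wrong, under the assumption that a 64-bit 
size ends up in a 32-bit field. I use the fixed-width types from 
<cstdint> so the effect is the same on every platform; the function 
names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Models storing a 64-bit length in a 32-bit 'unsigned int' field.
// Conversion to an unsigned type is well defined: the value is
// reduced modulo 2^32, so the high bits are lost without any error.
inline std::uint32_t storeIn32Bits(std::uint64_t length)
{
    return static_cast<std::uint32_t>(length);
}

// True if the length survives the round trip unchanged.
inline bool fitsIn32Bits(std::uint64_t length)
{
    return storeIn32Bits(length) == length;
}

// Small self-check of the wrap-around described above:
// 2^32 + 5 is stored as 5.
inline void checkWrapAround()
{
    assert(storeIn32Bits(0x100000005ULL) == 5u);
    assert(!fitsIn32Bits(0x100000005ULL));
}
```

The compiler is not required to diagnose such a conversion, which is why 
the mix-up went unnoticed as long as both types had the same width.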

> > And we do not know today how many nodes some xml document will carry
> > in 2015. But: xerces should conform to the xml standards, and
> > AFAIK w3c has no 32-bit limit for the number of nodes in a document.
>
> Any implementation will have limitations.  Are you saying a DOM
> implementation on a 32-bit platform is non-conforming because it
> limits the number of nodes in a document?  In fact, the official DOM
> IDL bindings _do_ have explicit sizes for all types, so the
> limitations are explicit as well.

I feel misunderstood at this point, Dave. Allow me to argue in a bit 
more detail.

I'd like to distinguish three items here: 
- limitation by hardware (1)
- limitation by software (2)
- limitation by standards (3)
(1) is a limitation that may vary with time
(2) is the only limitation that is visible through the api
(3) is fixed, stable and 'provable'

Then, 'conforming' software should likely behave as follows:
- it should detect and respect the hardware limitations (1) in order not 
to fail on a particular platform
- it should identify absolute limitations defined by the standards (3) 
and adjust its own limitations (2) to reflect them in the api
- for all others, it should scale its own limitations (2) to reflect (1)

In terms of the above, a DOM implementation would be non-conforming if 
it set limits of its own (2) that are not explicit standard limits (3), 
regardless of the underlying hardware limits (1).

It would be conforming if its own limits (2) either reflect standard 
limits (3) or always scale with the underlying hardware limits (1).

What I want to find out is:
- If we find in the xml specs an explicit, absolute limit (3) for the 
number of elements a DOMDocument may contain, we could set an absolute 
limit (2) to the datatype that counts DOMElements (e.g. 32-bit).
- If we can't find such a limit (I think we definitely won't), we should 
carefully scale to the hardware limits (1) with a datatype that counts 
entities (maybe size_t here).

And, regarding XMLSize_t:
- If we find in the xml specs an explicit, absolute limit (3) for the 
length an XMLString may have, we could set an absolute limit (2) to the 
datatype that measures stringLen (e.g. 32-bit).
- If we can't find such a limit, we should carefully scale to the 
hardware limits (1), and that's size_t.

In my reading, the xml standards don't have a 32-bit limit (3) for string 
length. So using 'unsigned int' to represent string lengths is clearly 
a design error and should be corrected, e.g. by replacing it with 
size_t/XMLSize_t.
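
While interfaces that still take 'unsigned int' exist (like the SAX 
characters() callback quoted above), the mistake can be made apparent at 
the boundary instead of concealed. A rough sketch, assuming XMLSize_t is 
a synonym for size_t; the helper 'toLegacyLength' is hypothetical and 
not part of Xerces.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <stdexcept>

typedef std::size_t XMLSize_t;  // assumption: shadows size_t

// Hypothetical helper: narrows a length for an interface that still
// takes 'unsigned int', and throws instead of silently truncating.
inline unsigned int toLegacyLength(XMLSize_t length)
{
    if (length > std::numeric_limits<unsigned int>::max())
        throw std::length_error("length exceeds unsigned int range");
    const unsigned int result = static_cast<unsigned int>(length);
    assert(result == length);  // nothing was lost in the conversion
    return result;
}
```

Where size_t and unsigned int have the same width, the check can never 
fail; where size_t is wider, it reports the lengths that the old 
interface cannot represent.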

> > The problem of blowing the DOMNode representation on 64-bit systems
> > is not as grave as it seems on a first sight. Systems grow as they
> > grow - everybody knows that 64-bit software take more memory than
> > their 32-bit equivalents - and uncomplainingly slots in double RAM.
>
> I dispute the notion that everyone's customers "knows that 64-bit
> software take more memory than their 32-bit equivalents" and that they
> "uncomplainingly slots in double RAM."  Perhaps your customers do this
> uncomplainingly, but many of mine do not.

(I only made a side note here. Maybe your customers will learn this 
soon ;)

Cheers,
                        Axel

-- 
Humboldt-Universität zu Berlin
Institut für Informatik
Signalverarbeitung und Mustererkennung
Dipl.-Inf. Axel Weiß
Rudower Chaussee 25
12489 Berlin-Adlershof
+49-30-2093-3050
** www.freesp.de **
