Hi Archie,

[EMAIL PROTECTED] wrote on 11/20/2006 01:55:01 PM:

> > >> Archie Cobbs wrote:
> > >>> Simple question for the Batik DOM implementors...
> > >>> Does Batik intern() all DOM Strings?

    No, Batik does not intern all DOM Strings.

>>>>> If not, I have a hunch that this could save a lot of memory.
>>>> Aren't the Strings defined as constants (static final) in Batik? As far
>>>> as I know, "intern"ing constants does not bring any advantage.

    It is possible that even some of these end up not being singleton
Strings, as they are constructed out of the SAX parser data.

> > > Is he talking about the strings that DOM has as data? Those aren't going
> > > to be static final.
> > Yes, of course. If so, sorry - I was thinking of the names of elements
> > and attributes.
> 
> Right..
> 
> E.g., if a document has 1000 nodes with an attribute like
> 
>   requiredExtensions="http://example.org/SVGExtensionXYZ/1.0";

   Eh, well this would be unusual.  It is important to remember
that there is a price to be paid for interning strings.
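
   To make the memory side of this concrete, here is a minimal sketch
(plain Java, not Batik code; the class and method names are made up for
illustration). It assumes each attribute value is built from a parser
character buffer, and counts how many distinct String objects exist with
and without intern(). It does not measure the cost of the intern() calls
themselves.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical demo class, not part of Batik.
final class AttrValueCopies {
    // Counts distinct String objects (by identity, not equals()) among
    // `nodes` attribute values built from a parser-style char buffer.
    static int distinctObjects(int nodes, boolean intern) {
        char[] buf = "http://example.org/SVGExtensionXYZ/1.0".toCharArray();
        Set<String> identities =
            Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        for (int i = 0; i < nodes; i++) {
            // new String(char[], int, int) always allocates a fresh object.
            String value = new String(buf, 0, buf.length);
            identities.add(intern ? value.intern() : value);
        }
        return identities.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctObjects(1000, false)); // prints 1000
        System.out.println(distinctObjects(1000, true));  // prints 1
    }
}
```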

> In addition, this might help speed things up in some cases. If all
> string values are interned (or, at least all string values that range
> over some fixed set of possible values), then we could internally use
> == instead of equals() for comparison.

   This is basically a false savings, as the intern process requires
doing a hashtable lookup (on a typically large hashtable).  That hashtable
lookup requires at least one String.equals check.  Since very often
the string going into the checks comes from the user (setAttributeNS, etc.),
it has to be re-interned every time...
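
   A small sketch of what I mean (again plain Java with hypothetical
names, not anything in Batik). It assumes the incoming name is supplied by
a caller, so it is not known to be interned; the == comparison is only
safe after an intern() call, which is where the lookup cost comes back in.

```java
// Hypothetical demo class, not part of Batik.
final class InternedCompare {
    // String literals are interned by the JVM, so this is the canonical copy.
    static final String FILL = "fill";

    // Identity comparison: only correct if the argument is interned first,
    // and intern() itself does a table lookup plus an equals() check.
    static boolean isFillInterned(String name) {
        return name.intern() == FILL;
    }

    // Plain comparison: no interning needed.
    static boolean isFillEquals(String name) {
        return FILL.equals(name);
    }

    public static void main(String[] args) {
        // Simulates a value arriving from user code (e.g. setAttributeNS).
        String user = new String("fill");
        System.out.println(user == FILL);         // prints false
        System.out.println(isFillInterned(user)); // prints true
        System.out.println(isFillEquals(user));   // prints true
    }
}
```

   So using == on user-supplied strings without the intern() call would
simply give wrong answers, and with the call you pay for the lookup on
every entry point.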

> It may also make sense to intern element names, so that there is only
> one Object per element type, instead of one object per element (e.g.,
> one "g" string instead of 1000).

   Yes, this is a definite win.
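
   One way this could be done without going through String.intern() is a
small private pool of canonical names. The following is only a sketch
under that assumption; NamePool and canonical() are hypothetical names,
not existing Batik classes, and the pool is not thread-safe as written.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-document pool of canonical element names.
final class NamePool {
    private final Map<String, String> pool = new HashMap<>();

    // Returns the first String seen that equals `name`, so that all
    // elements with the same name share one String object.
    String canonical(String name) {
        String existing = pool.putIfAbsent(name, name);
        return existing != null ? existing : name;
    }

    int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        NamePool names = new NamePool();
        String first = names.canonical(new String("g"));
        boolean allShared = true;
        for (int i = 0; i < 999; i++) {
            // Each incoming name is a fresh object, as from a parser.
            allShared &= names.canonical(new String("g")) == first;
        }
        System.out.println(allShared);    // prints true
        System.out.println(names.size()); // prints 1
    }
}
```

   A pool like this can be dropped along with the document, whereas
strings passed to String.intern() live in a JVM-wide table.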

> Hopefully such a change could be implemented privately, in a localized
> part of the code. Then it would be easy (and interesting) to do some
> time and space performance comparisons, using various documents as
> input. I haven't looked at this part of the code myself though.

   This would be interesting to do, however I think it's unlikely to
be easy to localize really well.  There are many potential sources
and destinations for Strings in DOM.

   It might not be too hard to implement it for some of the
core parts (element names, attribute names, values, namespaces etc).
But it would be hard to cover everything.

