Ok, that's a different concern and it's valid. I don't buy the performance argument, though, especially since the intern() method will also cost something. Does anyone have any numbers?
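For anyone who wants to produce numbers, here is a rough timing sketch, not a proper benchmark. The class and method names are hypothetical and not part of FOP; the constant is the XSL-FO namespace URI. A naive System.nanoTime() loop like this is easily skewed by JIT compilation and garbage collection, so treat any result as indicative at best.

```java
// Hypothetical timing sketch; names are illustrative and not taken from FOP.
final class InternVsEquals {
    // A compile-time constant, so this String is already interned.
    static final String FO_URI = "http://www.w3.org/1999/XSL/Format";

    /** Counts matches by calling equals() on each incoming URI. */
    static int countWithEquals(String[] uris) {
        int hits = 0;
        for (String uri : uris) {
            if (FO_URI.equals(uri)) {
                hits++;
            }
        }
        return hits;
    }

    /** Counts matches by interning each incoming URI, then comparing references. */
    static int countWithIntern(String[] uris) {
        int hits = 0;
        for (String uri : uris) {
            if (uri.intern() == FO_URI) {
                hits++;
            }
        }
        return hits;
    }

    /** Builds n distinct String objects with equal contents, as a non-interning caller might. */
    static String[] makeUris(int n) {
        String[] uris = new String[n];
        for (int i = 0; i < n; i++) {
            uris[i] = new String(FO_URI.toCharArray());
        }
        return uris;
    }

    public static void main(String[] args) {
        String[] uris = makeUris(1000000);

        // Warm-up passes so the JIT has a chance to compile both loops.
        for (int i = 0; i < 5; i++) {
            countWithEquals(uris);
            countWithIntern(uris);
        }

        long t0 = System.nanoTime();
        int viaEquals = countWithEquals(uris);
        long t1 = System.nanoTime();
        int viaIntern = countWithIntern(uris);
        long t2 = System.nanoTime();

        System.out.println("equals(): " + viaEquals + " hits in " + (t1 - t0) / 1000 + " us");
        System.out.println("intern(): " + viaIntern + " hits in " + (t2 - t1) / 1000 + " us");
    }
}
```

This only measures the worst case for equals(), where every comparison has to walk the full string because the objects are distinct but equal. A harness such as JMH would give more trustworthy figures.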
On 26.06.2005 15:33:08 Andreas L. Delmelle wrote:
> > -----Original Message-----
> > From: Glen Mazza [mailto:[EMAIL PROTECTED]
> >
>
> Hi Glen,
>
> <snip />
> >
> > Another option: validateChildNode() is called from only one place,
> > FOTreeBuilder.startElement(). At that point, we can also feed vcN() the
> > parameter "namespaceURI.intern()" instead of just "namespaceURI". This
> > could be slightly faster for some vcN()'s that compare against multiple
> > URI's--but I would think .intern() is much slower than .equals() for the
> > reason given above.
>
> The other idea to consider is the impact on memory. The string values of
> interned strings are only stored once. There is indeed the overhead of a
> call to .intern(), but the workings of that method will be nearly as
> optimized as the .equals() method. Look up the string value in a hashtable:
> if it doesn't exist, create a new one and return an internalized reference
> to it. If it does, just return the reference to the value that's already
> stored in that hashtable.
> The source string value is immediately discarded in any case, only the
> reference is kept.
>
> The benefit of interning can be most appreciated in cases where the strings'
> lengths are long enough --exceed the size of a reference-- AND the number of
> them being created is large enough. Both are the case for a lot of namespace
> URIs, and node or attribute names.
> This is precisely the reason why the SAX parser feature for string
> interning --http://xml.org/sax/features/string-interning -- defaults to
> 'true' in Xerces-J (and can't be set to 'false').
>
> To put it quite bluntly: my concern is that we would essentially be adapting
> our code to make it possible for people to use it to waste resources. Feed
> it interned strings and it will work. Why would one really want to create a
> separate string object for all occurrences of a given namespace URI in a
> random document, and at the very same time expect us to take into account
> that they didn't intern those strings themselves...
>
> I still think Nils would gain more by manipulating his setup so that these
> types of strings are already interned, more than we would gain by changing
> our code to allow for them to be 'just' strings.
>
>
> Cheers,
>
> Andreas

Jeremias Maerki
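
To make the quoted point about interned strings concrete, here is a minimal sketch of why a reference comparison works when it is fed interned strings and fails for equal but separately created ones. The class and the isFoNamespaceByReference() helper are hypothetical stand-ins for the kind of check discussed in this thread, not actual FOP code.

```java
// Hypothetical illustration; this is not FOP's validateChildNode().
final class InternDemo {
    // String literals are interned by the JVM.
    static final String FO_URI = "http://www.w3.org/1999/XSL/Format";

    /** Reference comparison: only true if nsURI is the very same object as FO_URI. */
    static boolean isFoNamespaceByReference(String nsURI) {
        return nsURI == FO_URI;
    }

    /** Content comparison: true for any String with the same characters. */
    static boolean isFoNamespaceByEquals(String nsURI) {
        return FO_URI.equals(nsURI);
    }

    public static void main(String[] args) {
        // Simulates a caller that hands over a separately created, non-interned String.
        String copy = new String(FO_URI.toCharArray());

        System.out.println(isFoNamespaceByReference(copy));          // false
        System.out.println(isFoNamespaceByEquals(copy));             // true
        System.out.println(isFoNamespaceByReference(copy.intern())); // true
    }
}
```

The third call succeeds because intern() returns the single pooled instance for that character sequence, which is the same object the literal refers to.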