Thanks Michael! And, welcome to core-libs-dev! :-)

On 6/26/2014 4:02 AM, Michael Kay wrote:
Here are some quick thoughts about the state of XML support in the JDK:

1. XML Parser. The version of Xerces in the JDK has long been buggy, and no-one 
has been fixing the bugs. It needs to be replaced with a more recent version of 
Apache Xerces if that hasn't already been done.

Yes, as Alan pointed out, we do have a project going on at the moment. The goal is to upgrade to the current version of Xerces, 2.11.0. Also, we made updates during JDK 7 development, bringing in all of the blockers, critical fixes and half of the major fixes along the way.


2. DOM. From a usability perspective DOM is awful. There are much better alternatives 
available, for example JDOM2 or XOM. The only reason anyone uses DOM rather than these 
alternatives is either (a) because they aren't aware of the alternatives, or (b) because 
of some kind of perception that DOM is "more standard". If we want to address 
the usability of XML processing in the JDK then an alternative/replacement for DOM would 
seem to be a high priority. If someone wants a summary of the badness of DOM then I'll 
address that in a separate thread.

I agree that DOM is not particularly user- or developer-friendly. I don't have data to estimate how popular DOM is, but since it has been a "standard" and we value compatibility so much, the first goal in the proposal is to let users get to DOM objects quickly and keep using their existing code to process them.

When we get to the lower level, what I would propose is that we take a step back from the existing technologies/standards such as DOM/SAX/StAX and think like a developer would. For example, as a developer, maybe all I want is to search an XML file and find a piece of information. I don't necessarily need to know whether it's DOM or SAX underneath, just as I don't need to know what technology is behind Google.
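
To make the "find a piece of information" case concrete, here is a rough sketch of what it takes with the existing JAXP API. The class name FindInXmlDemo, the helper find(), and the sample document are made up for illustration; only the javax.xml and org.w3c.dom calls are the real API.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class FindInXmlDemo {

    // Hypothetical helper: evaluates an XPath expression against an XML string
    // and returns the string value of the result.
    static String find(String xml, String expression) {
        try {
            // Step 1: locate a parser factory and build a DOM tree.
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = dbf.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            // Step 2: locate an XPath factory and evaluate the expression.
            XPathFactory xpf = XPathFactory.newInstance();
            XPath xpath = xpf.newXPath();
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            // The JAXP calls above throw several checked exceptions;
            // this sketch wraps them all into one unchecked exception.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<catalog><book id=\"b1\"><title>XSLT</title></book></catalog>";
        System.out.println(find(xml, "/catalog/book[@id='b1']/title"));
    }
}
```

Running main prints XSLT. The caller only wanted one value, yet had to touch two factories, a builder, an InputSource and a Document to get it.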


3. JAXP factory mechanism. While the goal of allowing multiple implementations 
of core interfaces such as DOM, XPath, and XSLT is laudable, the current 
mechanism has many failings. It's hideously expensive because of the classpath 
search; it's fragile because the APIs are so weakly defined that the 
implementations aren't 100% interoperable (for example you have no way of 
knowing whether the XPath factory will give you an XPath 1.0 or 2.0 processor, 
and no way of finding out which you have got); so in practice we see lots of 
problems where applications get a processor that doesn't work as the 
application expects, just because it happens to be lying around on the 
classpath.

Agreed. It's an important mechanism that gives users the freedom to choose the implementations they prefer, but it has room for improvement. In the case of XPath, should we start a separate thread to discuss how we can improve it?
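
As a small illustration of the problem, the sketch below reports which provider classes the factory lookup actually picked. The class and method names (JaxpProviderReport, xpathProvider, and so on) are hypothetical. The provider class name is about all an application can learn this way; as far as I know, the XPathFactory API has no method that reports which XPath version the processor implements.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.xpath.XPathFactory;

final class JaxpProviderReport {

    // Each method returns the concrete class that the JAXP lookup selected.
    // The result depends on system properties and on what is on the classpath.
    static String domProvider() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    static String xsltProvider() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    static String xpathProvider() {
        return XPathFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("DOM parser factory:  " + domProvider());
        System.out.println("Transformer factory: " + xsltProvider());
        System.out.println("XPath factory:       " + xpathProvider());
    }
}
```

Adding a third-party jar to the classpath can change any of these three lines without a single change to the application code.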


4. XQJ. The XQJ spec never found its way into the JDK, so there is no 
out-of-the-box XQuery support. The licensing terms for XQJ are also 
unsatisfactory (the license doesn't allow modification, which purists say means 
it's not a true open source license).

True, it's in the DB line of products. I'm not familiar with the licensing terms for the spec.


5. General lack of integration across the whole XML scene, e.g., separate and 
potentially incompatible factories for different services;

We can explore more on this. The example in the proposal is a possible case for parser & xpath integration.

a lack of clarity as to whether the XPath API is supposed to handle object 
models other than DOM, etc;

The spec requires implementations to support the default object model and leaves them free to introduce others.
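
For reference, here is a minimal sketch of how an application can ask for a factory by object model URI. The class name ObjectModelCheck, the helper supports(), and the "urn:example" URI are invented for illustration; XPathFactory.newInstance(String) and DEFAULT_OBJECT_MODEL_URI are the existing API.

```java
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathFactoryConfigurationException;

final class ObjectModelCheck {

    // Hypothetical helper: true if some XPathFactory can be located
    // for the given object model URI.
    static boolean supports(String objectModelUri) {
        try {
            XPathFactory.newInstance(objectModelUri);
            return true;
        } catch (XPathFactoryConfigurationException e) {
            // Thrown when no factory is available for that object model.
            return false;
        }
    }

    public static void main(String[] args) {
        // The default (DOM) object model that every implementation must support.
        System.out.println(supports(XPathFactory.DEFAULT_OBJECT_MODEL_URI));
        // A made-up URI; a stock JDK is not expected to have a factory for it.
        System.out.println(supports("urn:example:hypothetical-object-model"));
    }
}
```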

weak typing of interfaces such as setParameter() in order to accommodate 
variation, at the cost of interoperability.

StAX did better in this regard, with a list of specified properties. In the case of setParameter(), it seems to me that the author wanted to give implementations room to define their own parameters.
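
A short sketch of the StAX side of that comparison: the property names are constants defined on XMLInputFactory, and a factory can be asked whether it supports a given one. The class and method names here (StaxPropertiesDemo, configuredFactory, and so on) are made up for the example.

```java
import javax.xml.stream.XMLInputFactory;

final class StaxPropertiesDemo {

    // Configures a factory using only property names defined by the StAX API.
    static XMLInputFactory configuredFactory() {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, Boolean.TRUE);
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        return factory;
    }

    // Asks the factory whether it recognises a property name at all.
    static boolean isSupported(String propertyName) {
        return XMLInputFactory.newInstance().isPropertySupported(propertyName);
    }

    // Reads back the coalescing setting from the configured factory.
    static Object coalescing() {
        return configuredFactory().getProperty(XMLInputFactory.IS_COALESCING);
    }

    public static void main(String[] args) {
        System.out.println(isSupported(XMLInputFactory.IS_COALESCING));
        System.out.println(coalescing());
    }
}
```

By contrast, Transformer.setParameter(String, Object) takes any name and any Object, so nothing in the signature tells the caller which names or value types a particular processor will accept.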


6. Failure to keep up to date with the W3C specs; if you want support for 
recent versions of XSLT or XPath then you need to go to third-party products. 
Even at the DOM level, namespaces are a bolt-on optional extra rather than an 
intrinsic capability.

We can discuss this in a separate thread as well.
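
On the namespace point specifically, a small sketch of the behaviour in question: DocumentBuilderFactory is not namespace-aware unless the caller opts in, and the same document then yields different answers from the DOM. The class name NamespaceAwarenessDemo, the helper rootLocalName(), and the sample namespace are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class NamespaceAwarenessDemo {

    // Hypothetical helper: parses the XML and returns the local name
    // of the root element as reported by the DOM.
    static String rootLocalName(String xml, boolean namespaceAware) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            // newInstance() gives a factory that is NOT namespace-aware;
            // the caller has to switch namespace processing on explicitly.
            dbf.setNamespaceAware(namespaceAware);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getLocalName();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<p:root xmlns:p=\"urn:example:ns\"/>";
        // With namespace processing on, the local name is "root".
        System.out.println(rootLocalName(xml, true));
        // With it off, elements are built without namespace information,
        // and the DOM spec says getLocalName() is null for such nodes.
        System.out.println(rootLocalName(xml, false));
    }
}
```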


7. Inconsistent policy on concrete classes versus interfaces.

Could you provide a few examples?


Is this project attempting to address the fundamental problems, or only to 
paper over the cracks?

The goal of the project is to improve usability and make the API more approachable: easy tasks should be easy to do. It is not meant to completely redesign all the features, but to focus on common use cases and provide better APIs to handle them.

-Joe



We're planning on a jaxp project to address usability issues in the JAXP
API. One of the complaints about the JAXP API is the number of lines of
code that are needed to implement a simple task. Tasks that should take
one or two lines often take ten or twelve lines instead.

Michael Kay
Saxonica
m...@saxonica.com
+44 (0118) 946 5893



