Hi James,

Thanks for your valuable feedback on this.

We have also had problems with different implementations behaving
differently. IIRC, Rich once made a workaround by actually looking at
which parser is being used underneath. Do you think doing something like
that would help solve the problem? If yes, I'm happy to implement it or
to take a contribution from you guys. (I will look into this, but I would
appreciate it if you could create a JIRA for it, giving the details about
DTD handling found in this mail.)

In the meantime, if you come across problems in using Axiom, do not
hesitate to create JIRAs or post them here. I believe we should have
better coordination/cooperation between the two communities, i.e. Axiom
and Abdera, as we are both from the ASF.

Thanks
-- Chinthaka

James M Snell wrote:
> While investigating a number of security concerns for the Abdera
> project, I noticed that there were a number of problems with DTD
> handling in the various stax parser implementations. For instance, if
> you parse the following xml document with Axiom using the Woodstox
> parser, then reserialize it the xml will be invalid.
>
> Input:
>
> <?xml version="1.0" encoding="utf-8"?>
> <!DOCTYPE feed [
> <!ENTITY foo "bar">
> <!ENTITY bar "foo">
> ]>
> <feed xmlns="http://www.w3.org/2005/Atom" >
> </feed>
>
> Output using Woodstox:
>
> <?xml version="1.0" encoding="utf-8"?>
>
> <!ENTITY foo "bar">
> <!ENTITY bar "foo">
>
> <feed xmlns="http://www.w3.org/2005/Atom" >
> </feed>
>
> Output using Stax Reference Impl
>
> <?xml version="1.0" encoding="utf-8"?>
> <!DOCTYPE feed [
> <!ENTITY foo "bar">
> <!ENTITY bar "foo">
> ]>
> <feed xmlns="http://www.w3.org/2005/Atom" >
> </feed>
>
> Comparing these two, it would appear as if there is a bug in Woodstox.
> Unfortunately, Woodstox is apparently acting exactly as the Stax spec
> says it should and it's actually the Stax reference impl that's doing it
> wrong... apparently. So I had to dig a little deeper.
>
> In StAXOMBuilder, the createDTD method calls parser.getText() to get the
> DTD contents. According to the Stax javadocs and spec, getText returns
> the internal subset of the DTD, not the complete doctype declaration.
> So while the stax reference implementation is doing what we want, it's
> apparently not doing what the stax spec says it should be doing.
>
> According to the woodstox developers, there is currently no way of
> getting to the complete DTD doctype declaration using the standardized
> XMLStreamReader interface. The XMLEventReader interface, however, works
> just fine.
>
> So where does this leave us? Using Axiom and Woodstox to parse
> documents containing doctype decls produces invalid XML; using Axiom and
> the Stax ref impl requires relying on what is apparently either a bug or
> a deliberate incompatibility with the spec.
>
> Now, by this point you should note that I am using the word "apparently"
> a lot. That's because I'm basing this information off what one woodstox
> developer told me and I've been unable to verify.
>
> Another problem that I've noticed with the stax DTD handling is that
> even when you tell it not to replace entity references, it will still
> replace entity references found in attribute values.... which is more
> than just slightly annoying.
>
> In any case, I wanted to report these issues. In the very near future I
> will also post some feedback on various experiences we've had developing
> with Axiom and suggestions on how to make things better.
>
> - James
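
For reference, the difference between the two reader interfaces described in the quoted message can be probed with a small sketch like the one below. It is only an illustration, not Axiom code: the class name DtdProbe and its helper methods are hypothetical, and it uses nothing beyond the standard javax.xml.stream API. What getText() returns for the DTD event depends on whichever StAX implementation is on the classpath, so the sketch prints the reader class name next to each result rather than assuming a particular outcome.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.events.DTD;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper for comparing how a StAX implementation exposes the DTD.
final class DtdProbe {

    // The sample document from the quoted message.
    public static final String SAMPLE =
          "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
        + "<!DOCTYPE feed [\n"
        + "<!ENTITY foo \"bar\">\n"
        + "<!ENTITY bar \"foo\">\n"
        + "]>\n"
        + "<feed xmlns=\"http://www.w3.org/2005/Atom\" >\n"
        + "</feed>\n";

    // Cursor API: returns getText() at the DTD event, or null if none is reported.
    public static String dtdViaStreamReader(XMLInputFactory factory, String xml) {
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.DTD) {
                        return reader.getText();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("stream reader failed", e);
        }
    }

    // Event API: returns DTD.getDocumentTypeDeclaration(), or null if no DTD event.
    public static String dtdViaEventReader(XMLInputFactory factory, String xml) {
        try {
            XMLEventReader reader = factory.createXMLEventReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    XMLEvent event = reader.nextEvent();
                    if (event instanceof DTD) {
                        return ((DTD) event).getDocumentTypeDeclaration();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("event reader failed", e);
        }
    }

    // Reports which XMLStreamReader implementation the factory actually hands out.
    public static String readerImplementation(XMLInputFactory factory, String xml) {
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                return reader.getClass().getName();
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not create reader", e);
        }
    }

    public static void main(String[] args) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        System.out.println("Reader class: " + readerImplementation(factory, SAMPLE));
        System.out.println("XMLStreamReader.getText():");
        System.out.println(dtdViaStreamReader(factory, SAMPLE));
        System.out.println("DTD.getDocumentTypeDeclaration():");
        System.out.println(dtdViaEventReader(factory, SAMPLE));
    }
}
```

Running this once with each parser on the classpath should show whether the two calls return the same text for that implementation. The reader class name printed first is also the kind of information a parser-specific workaround would have to key on.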
