Quoting Jason van Zyl <[EMAIL PROTECTED]>:

StAX can't preserve whitespace between attributes, between "<" and the element name, between the last attribute and the ">", or between "</" and the end element name. The same goes for all pull parsers.

Why not fix StAX?

Because StAX is not meant to do this. I need to keep the original XML source somewhere to be able to recreate anything you might have done. That includes entities (and how you entered them originally) and all kinds of weird stuff that every XML parser out there throws away.
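To make the loss concrete, here is a minimal sketch using only the StAX cursor API that ships with the JDK (javax.xml.stream). The class name StaxWhitespaceDemo and the sample input are made up for illustration. It rebuilds markup from nothing but what the parser reports, so whitespace inside tags, the original quote style and the way a character reference was written cannot come back:

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class, not part of Maven or DecentXML.
class StaxWhitespaceDemo {

    // Rebuilds markup using only the information the StAX events expose.
    static String rebuild(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        StringBuilder out = new StringBuilder();
        while (r.hasNext()) {
            switch (r.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    out.append('<').append(r.getLocalName());
                    for (int i = 0; i < r.getAttributeCount(); i++) {
                        // The API gives name and value only: no spacing, no quote style.
                        out.append(' ').append(r.getAttributeLocalName(i))
                           .append("=\"").append(r.getAttributeValue(i)).append('"');
                    }
                    out.append('>');
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    out.append("</").append(r.getLocalName()).append('>');
                    break;
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.SPACE:
                    // Character references arrive already resolved.
                    out.append(r.getText());
                    break;
                default:
                    break;
            }
        }
        r.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        String source = "<a   x='1'  >&#65;</a  >";
        System.out.println("source : " + source);
        System.out.println("rebuilt: " + rebuild(source));
    }
}
```

The rebuilt string is still valid XML with the same content, but it is no longer the text the user typed, which is the whole point here.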

In my code, I tokenize the XML source and then keep references to these tokens. Can StAX do that? Do I have full access to the Unicode input stream? Can I patch the tokenizer?
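A rough sketch of what "tokenize and keep references" means. This is not the actual DecentXML tokenizer; the names LosslessTokenizer, Token and Kind are invented for the example, and it deliberately ignores comments, CDATA sections and ">" inside attribute values. Every token is just a pair of offsets into the original source, so writing the tokens back out reproduces the input character for character:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not DecentXML's real implementation.
class LosslessTokenizer {

    enum Kind { MARKUP, TEXT }

    // A token is only a reference into the source: nothing is normalized.
    static final class Token {
        final Kind kind;
        final int start; // inclusive offset into the source
        final int end;   // exclusive offset into the source

        Token(Kind kind, int start, int end) {
            this.kind = kind;
            this.start = start;
            this.end = end;
        }

        String text(String source) {
            return source.substring(start, end);
        }
    }

    // Simplification: markup runs from "<" to the next ">", text runs up to the next "<".
    static List<Token> tokenize(String source) {
        List<Token> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < source.length()) {
            Kind kind;
            int end;
            if (source.charAt(pos) == '<') {
                int close = source.indexOf('>', pos);
                end = (close < 0) ? source.length() : close + 1;
                kind = Kind.MARKUP;
            } else {
                int open = source.indexOf('<', pos);
                end = (open < 0) ? source.length() : open;
                kind = Kind.TEXT;
            }
            tokens.add(new Token(kind, pos, end));
            pos = end;
        }
        return tokens;
    }

    // Writing is concatenating the referenced slices in order.
    static String write(String source, List<Token> tokens) {
        StringBuilder out = new StringBuilder();
        for (Token t : tokens) {
            out.append(t.text(source));
        }
        return out.toString();
    }
}
```

A model built on top of such tokens can change one token and leave all others untouched, which is why the untouched parts of the file come out byte-identical.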

Later, in your POM reader, you turn the XML events into a Java object model. At this stage, all the information I've gathered is thrown away. So even if I could extend StAX to keep the necessary bits, you would still have to rewrite your POM readers to save the XML tokens somewhere and then, later, when we want to recreate the POM, you would have to collect that information from the various bits and pieces.

And even if that would all work ... how would you preserve the original order of XML elements from the Java version of the POM? I mean, it's nice and all that I can iterate over the dependencies but is the original order preserved?

Sorry, Jason, your arguments only tell me that you haven't thought this through.

As I said: My parser is probably not that useful as a general-purpose replacement for POM *reading*. It ought to be used in the Maven artifact plugin and any other code which *writes* POM files.

If we've read in the model using the tools that we currently use which
knows about everything about the whitespace, and then manipulate the
model in memory how exactly would we integrate your writer?

Same issue as above. My suggestion is to keep the model reader as it is. If you write a plugin which wants to manipulate any kind of XML, you add a dependency to DecentXML, read the XML, manipulate it and write it out.

There is no way to read the XML with tool A and then write it out with tool B.

You can fix StAX, we know the authors. Even if you added an extension
property that turned on better whitespace handling that would be fine.

StAX is just another XML parser. It might be better for round-tripping than SAX and all the other crap but so far, you've failed to convince me that you even understand what the issue is, so I can't trust your trust in StAX :)

That said, how do you manipulate the result of what StAX gives you? I mean, StAX is a streaming API, which means I would have to build a model from the XML events returned by StAX. Only then could I manipulate that XML document.
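For illustration, this is roughly the extra step I mean: a minimal sketch that turns the StAX event stream into a tree you can actually edit. StaxTreeBuilder and Node are hypothetical names, and attributes, namespaces and comments are left out to keep it short:

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch of a model built on top of StAX events.
class StaxTreeBuilder {

    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>(); // document order
        final StringBuilder text = new StringBuilder();

        Node(String name) {
            this.name = name;
        }
    }

    static Node build(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        Deque<Node> open = new ArrayDeque<>(); // elements not yet closed
        Node root = null;
        while (r.hasNext()) {
            switch (r.next()) {
                case XMLStreamConstants.START_ELEMENT: {
                    Node node = new Node(r.getLocalName());
                    if (open.isEmpty()) {
                        root = node;
                    } else {
                        open.peek().children.add(node);
                    }
                    open.push(node);
                    break;
                }
                case XMLStreamConstants.END_ELEMENT:
                    open.pop();
                    break;
                case XMLStreamConstants.CHARACTERS:
                    if (!open.isEmpty()) {
                        open.peek().text.append(r.getText());
                    }
                    break;
                default:
                    break;
            }
        }
        r.close();
        return root;
    }
}
```

Note that this tree only holds what the events carried. The formatting details the parser dropped are already gone by the time the model exists.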

I'm not keen on pulling in another XML parser to be honest.

I know that. I don't have a better solution because there probably isn't one. I don't start forks just for the fun of it. This is essentially an unsolved problem in the XML space; it's been unsolved since XML was invented and it won't ever be solved because it's a corner case. I just happen to be in that corner very often, so I finally gave in and started on a solution.

My solution returns a complete XML document to begin with, so the setup is just a single line of code and then you can start working on the document.

Regards,

--
Aaron "Optimizer" Digulla a.k.a. Philmann Dark
"It's not the universe that's limited, it's our imagination.
Follow me and I'll show you something beyond the limits."
http://www.pdark.de/
