Re: POM rewriting with DecentXML

Aaron Digulla Tue, 05 Aug 2008 09:08:58 -0700

Quoting Jason van Zyl <[EMAIL PROTECTED]>:

StAX can't preserve whitespace between attributes, between "<" and the element name, whitespace after the last attribute and the ">", between "</" and the end element name. Same goes for all pull parsers.
Why not fix StAX?

Because StAX is not meant to do this. I need to keep the original XML source somewhere to be able to recreate anything you might have done. That includes entities (and how you entered them originally) and all kinds of weird stuff that every XML parser out there throws away.
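To make the whitespace point concrete, here is a minimal sketch that pipes a small document through the standard javax.xml.stream event reader and writer. The class name StaxRoundTrip and the sample input are made up for illustration; only the JDK's StAX API is used. The exact output depends on the StAX implementation, but the spacing inside the tags is not part of the event stream, so it cannot come back out.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;

// Hypothetical demo class, not part of Maven or DecentXML.
class StaxRoundTrip {

    /** Reads the XML with StAX and writes the events straight back out. */
    static String roundTrip(String xml) {
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            StringWriter out = new StringWriter();
            XMLEventWriter writer = XMLOutputFactory.newInstance()
                    .createXMLEventWriter(out);
            writer.add(reader); // copies every event from the reader to the writer
            writer.close();
            reader.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        // Odd but legal spacing inside the tags and around "=".
        String original = "<project  artifactId = 'demo'   ><name >x</name ></project >";
        System.out.println("in : " + original);
        System.out.println("out: " + roundTrip(original));
    }
}
```

Running main prints both strings side by side; the spacing around the attribute and before the closing ">" in the input is not reproduced in the output.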

In my code, I tokenize the XML source and then keep references to these tokens. Can StAX do that? Do I have full access to the unicode input stream? Can I patch the tokenizer?
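The idea of keeping the tokens can be sketched roughly as follows. This is not DecentXML's actual code or API; TokenSketch, tokenize and join are hypothetical names, and the splitting is deliberately naive (it ignores comments, CDATA sections and ">" inside attribute values). What it shows is the principle: every character of the source ends up in exactly one token, so joining the tokens reproduces the input, and changing one token leaves the rest untouched.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of token-preserving parsing; not DecentXML itself.
class TokenSketch {

    /** Splits the source into markup and text tokens without dropping a character. */
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < source.length()) {
            int end;
            if (source.charAt(pos) == '<') {
                // Markup token: everything up to and including the next '>'.
                int close = source.indexOf('>', pos);
                end = (close < 0) ? source.length() : close + 1;
            } else {
                // Text token: everything up to the next '<'.
                int open = source.indexOf('<', pos);
                end = (open < 0) ? source.length() : open;
            }
            tokens.add(source.substring(pos, end));
            pos = end;
        }
        return tokens;
    }

    /** Recreates the document by concatenating the tokens in order. */
    static String join(List<String> tokens) {
        return String.join("", tokens);
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("<version >1.0</version >");
        tokens.set(1, "1.1"); // change only the text token
        System.out.println(join(tokens)); // prints "<version >1.1</version >"
    }
}
```

A real implementation would attach these tokens to the nodes of a document tree, so that unmodified nodes are written back exactly as they were read.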

Later, in your POM reader, you turn the XML events into a Java object model. At this stage, all the information I've gathered is thrown away. So even if I could extend StAX to keep the necessary bits, you would still have to rewrite your POM readers to save the XML tokens somewhere and then, later, when we want to recreate the POM, you would have to collect that information from the various bits and pieces.

And even if that would all work ... how would you preserve the original order of XML elements from the Java version of the POM? I mean, it's nice and all that I can iterate over the dependencies but is the original order preserved?

Sorry, Jason, your arguments only tell me that you haven't thought this through.

As I said: My parser is probably not so useful as a general-purpose replacement for POM *reading*. It ought to be used in the Maven artifact plugin and any other code which *writes* POM files.
If we've read in the model using the tools that we currently use which
knows about everything about the whitespace, and then manipulate the
model in memory how exactly would we integrate your writer?

Same issue as above. My suggestion is to keep the model reader as it is. If you write a plugin which wants to manipulate any kind of XML, you add a dependency to DecentXML, read the XML, manipulate it and write it out.


There is no way to read the XML with tool A and then write it out with tool B.

You can fix StAX, we know the authors. Even if you added an extension
property that turned on better whitespace handling that would be fine.

StAX is just another XML parser. It might be better for round-tripping than SAX and all the other crap but so far, you've failed to convince me that you even understand what the issue is, so I can't trust your trust in StAX :)

That said, how do you manipulate the result of what StAX gives you? I mean, StAX is a streaming API. Which means I would have to build a model from the XML events returned by StAX. Only then could I manipulate that XML document.

I'm not keen on pulling in another XML parser to be honest.

I know that. I don't have a better solution because there probably isn't one. I don't start forks just for the fun of it. This is essentially an unsolved problem in the XML space, it's been unsolved since XML was invented and it won't ever be solved because it's a corner case. I just happen to be in that corner very often, so I finally gave in and started on a solution.

My solution returns a complete XML document to begin with, so the setup is just a single line of code and then you can start working on the document.


Regards,

--
Aaron "Optimizer" Digulla a.k.a. Philmann Dark
"It's not the universe that's limited, it's our imagination.
Follow me and I'll show you something beyond the limits."
http://www.pdark.de/
