Quoting Jason van Zyl <[EMAIL PROTECTED]>:
StAX can't preserve whitespace between attributes, between "<" and
the element name, between the last attribute and the ">", or
between "</" and the end element name. The same goes for all pull
parsers.
Why not fix StAX?
Because StAX is not meant to do this. I need to keep the original XML
source somewhere to be able to recreate anything you might have done.
That includes entities (and how you entered them originally) and all
kinds of weird stuff that every XML parser out there throws away.
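
A minimal sketch of that loss, assuming the StAX implementation bundled
with the JDK. The class name StaxRoundTrip and the sample input are made
up for illustration. It copies the parser events straight to a writer
and checks whether the original text comes back:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;

// Hypothetical demo class, not part of StAX or Maven.
class StaxRoundTrip {

    /** Parses the input with StAX and writes the events straight back out. */
    static String roundTrip(String xml) {
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            StringWriter out = new StringWriter();
            XMLEventWriter writer = XMLOutputFactory.newInstance()
                    .createXMLEventWriter(out);
            writer.add(reader);   // copies every event from the reader
            writer.flush();
            writer.close();
            reader.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Odd but well-formed: extra spaces inside the tags, mixed quote
        // styles, and a character reference in the text.
        String input = "<a  x = \"1\"   y='2' >&#65;</a >";
        String output = roundTrip(input);
        System.out.println("in : " + input);
        System.out.println("out: " + output);
        System.out.println("identical: " + input.equals(output));
    }
}
```

The events only carry the attribute names and values and the resolved
text, so the writer has nothing to rebuild the original spacing, quote
style or character reference from.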
In my code, I tokenize the XML source and then keep references to
these tokens. Can StAX do that? Do I have full access to the Unicode
input stream? Can I patch the tokenizer?
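
To illustrate the idea of keeping references to tokens, here is a rough
sketch. It is not the actual DecentXML code; the names LosslessTokenizer,
Token and Kind are hypothetical, and the scanner is deliberately naive
(it does not handle comments, CDATA sections or ">" inside attribute
values). The point is only that every character of the source belongs to
exactly one token, so joining the tokens gives back the input unchanged:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the DecentXML tokenizer.
class LosslessTokenizer {

    enum Kind { MARKUP, TEXT }

    /** A token stores only its kind and its offsets into the source. */
    static final class Token {
        final Kind kind;
        final int start;
        final int end;

        Token(Kind kind, int start, int end) {
            this.kind = kind;
            this.start = start;
            this.end = end;
        }

        String text(String source) {
            return source.substring(start, end);
        }
    }

    /** Splits the source into markup and text tokens without dropping anything. */
    static List<Token> tokenize(String source) {
        List<Token> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < source.length()) {
            int start = pos;
            if (source.charAt(pos) == '<') {
                // Naive: markup runs up to the next '>'.
                int close = source.indexOf('>', pos);
                pos = (close < 0) ? source.length() : close + 1;
                tokens.add(new Token(Kind.MARKUP, start, pos));
            } else {
                // Text runs up to the next '<' or the end of the input.
                int next = source.indexOf('<', pos);
                pos = (next < 0) ? source.length() : next;
                tokens.add(new Token(Kind.TEXT, start, pos));
            }
        }
        return tokens;
    }

    /** Recreates the document by concatenating the tokens in order. */
    static String join(String source, List<Token> tokens) {
        StringBuilder sb = new StringBuilder();
        for (Token t : tokens) {
            sb.append(t.text(source));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String source = "<a  x = '1' >\n  &amp; text</a >";
        List<Token> tokens = tokenize(source);
        for (Token t : tokens) {
            System.out.println(t.kind + " [" + t.text(source) + "]");
        }
        System.out.println("lossless: " + join(source, tokens).equals(source));
    }
}
```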
Later, in your POM reader, you turn the XML events into a Java object
model. At this stage, all the information I've gathered is thrown
away. So even if I could extend StAX to keep the necessary bits, you
would still have to rewrite your POM readers to save the XML tokens
somewhere and then, later, when we want to recreate the POM, you would
have to collect that information from the various bits and pieces.
And even if all of that worked ... how would you preserve the
original order of XML elements from the Java version of the POM? I
mean, it's nice and all that I can iterate over the dependencies but
is the original order preserved?
Sorry, Jason, your arguments only tell me that you haven't thought
this through.
As I said: My parser is probably not that useful as a general-purpose
replacement for POM *reading*. It ought to be used in the Maven
artifact plugin and any other code which *writes* POM files.
If we've read in the model using the tools that we currently use, which
know all about the whitespace, and then manipulate the model in
memory, how exactly would we integrate your writer?
Same issue as above. My suggestion is to keep the model reader as it
is. If you write a plugin which wants to manipulate any kind of XML,
you add a dependency to DecentXML, read the XML, manipulate it and
write it out.
There is no way to read the XML with tool A and then write it out with tool B.
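
The read, manipulate, write cycle can be sketched without any library.
This is not the DecentXML API, only a naive stand-in built on offsets
into the original text; the class InPlaceEdit and the method
replaceElementText are hypothetical, and the regular expression handles
only a simple element without attributes or child elements. It shows
what it means to change one value while every other character of the
file stays where it was:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: edit one text value, leave everything else alone.
class InPlaceEdit {

    /** Escapes the two characters that must not appear raw in element text. */
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;");
    }

    /**
     * Replaces the text of the first simple element with the given name.
     * Everything outside the matched text is copied from the original
     * source by offset, so comments and whitespace survive untouched.
     */
    static String replaceElementText(String xml, String element, String newText) {
        String name = Pattern.quote(element);
        Pattern p = Pattern.compile("<" + name + "\\s*>([^<]*)</" + name + "\\s*>");
        Matcher m = p.matcher(xml);
        if (!m.find()) {
            return xml;   // nothing to change
        }
        return xml.substring(0, m.start(1)) + escape(newText) + xml.substring(m.end(1));
    }

    public static void main(String[] args) {
        String pom = "<project>\n  <!-- keep me -->\n  <version >1.0</version>\n</project >";
        System.out.println(replaceElementText(pom, "version", "1.1"));
    }
}
```

A real tool would work on a proper token or node model instead of a
regular expression, but the contract is the same: the output differs
from the input only where something was changed on purpose.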
You can fix StAX, we know the authors. Even if you added an extension
property that turned on better whitespace handling that would be fine.
StAX is just another XML parser. It might be better for round-tripping
than SAX and all the other crap but so far, you've failed to convince
me that you even understand what the issue is, so I can't trust your
trust in StAX :)
That said, how do you manipulate the result of what StAX gives you? I
mean, StAX is a streaming API. Which means I would have to build a
model from the XML events returned by StAX. Only then could I
manipulate that XML document.
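
A rough sketch of that extra step, assuming the JDK's StAX
implementation. The Node class and StaxTreeBuilder are hypothetical
helpers written for this example, and the model is reduced to element
names, child order and text:

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch: turn a StAX event stream into a tiny tree model.
class StaxTreeBuilder {

    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        final StringBuilder text = new StringBuilder();

        Node(String name) {
            this.name = name;
        }
    }

    static Node parse(String xml) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            Deque<Node> stack = new ArrayDeque<>();   // currently open elements
            Node root = null;
            while (r.hasNext()) {
                switch (r.next()) {
                    case XMLStreamConstants.START_ELEMENT: {
                        Node n = new Node(r.getLocalName());
                        if (stack.isEmpty()) {
                            root = n;
                        } else {
                            stack.peek().children.add(n);
                        }
                        stack.push(n);
                        break;
                    }
                    case XMLStreamConstants.CHARACTERS:
                        if (!stack.isEmpty()) {
                            stack.peek().text.append(r.getText());
                        }
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        stack.pop();
                        break;
                    default:
                        break;   // comments, PIs etc. are ignored here
                }
            }
            r.close();
            return root;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Node root = parse("<deps><dep>a</dep><dep>b</dep></deps>");
        for (Node child : root.children) {
            System.out.println(root.name + "/" + child.name + " = " + child.text);
        }
    }
}
```

Everything the events do not carry (spacing inside tags, quote style,
how entities were written) is already gone by the time this tree exists.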
I'm not keen on pulling in another XML parser to be honest.
I know that. I don't have a better solution because there probably
isn't one. I don't start forks just for the fun of it. This is
essentially an unsolved problem in the XML space. It's been unsolved
since XML was invented and it won't ever be solved because it's a
corner case. I just happen to be in that corner very often, so I
finally gave in and started on a solution.
My solution returns a complete XML document to begin with, so the
setup is just a single line of code and then you can start working on
the document.
Regards,
--
Aaron "Optimizer" Digulla a.k.a. Philmann Dark
"It's not the universe that's limited, it's our imagination.
Follow me and I'll show you something beyond the limits."
http://www.pdark.de/