Re: POM rewriting with DecentXML

Stuart McCulloch Tue, 05 Aug 2008 10:41:30 -0700

2008/8/5 Aaron Digulla <[EMAIL PROTECTED]>

> Quoting Jason van Zyl <[EMAIL PROTECTED]>:
>
>> Why not fix StAX?
>>
>
> Because StAX is not meant to do this. I need to keep the original XML
> source somewhere to be able to recreate anything you might have done. That
> includes entities (and how you entered them originally) and all kinds of
> weird stuff that every XML parser out there throws away.
>


this isn't necessarily true - you could take the XML source and map
it into a model that's used by the application - later on you'd take the
modified model and compare (diff) it against the original model and
use this to rewrite sections of the XML source while keeping the rest
undisturbed... (using indexing to improve performance/tracking)

you could even track edits internally while the model is manipulated,
then you don't need the whole document kept in memory all the time
- only when writing out the actual changes.
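Roughly how such an edit log might look, as a sketch only: components record (offset, length, replacement) triples against the original text while they change the model, and only the write step streams the original file through, so the full document never has to sit in memory. EditLog is a hypothetical class; it assumes char offsets into the original text and edits that never overlap:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical edit log: edits are recorded during model changes, applied only on write.
final class EditLog {

    private record Edit(long offset, int length, String replacement) {}

    private final List<Edit> edits = new ArrayList<>();

    /** Records that `length` original chars starting at `offset` become `replacement`. */
    void replace(long offset, int length, String replacement) {
        edits.add(new Edit(offset, length, replacement));
    }

    /** Streams the original to `out`, applying the edits in offset order. */
    void applyTo(Reader original, Writer out) throws IOException {
        List<Edit> sorted = new ArrayList<>(edits);
        sorted.sort(Comparator.comparingLong(Edit::offset));
        long pos = 0; // original chars consumed so far
        int c;
        for (Edit e : sorted) {
            // Copy untouched text up to the start of the edit.
            while (pos < e.offset() && (c = original.read()) != -1) {
                out.write(c);
                pos++;
            }
            out.write(e.replacement());
            // Drop the original chars that the edit replaces.
            for (int i = 0; i < e.length() && original.read() != -1; i++) {
                pos++;
            }
        }
        // Everything after the last edit is copied unchanged.
        while ((c = original.read()) != -1) {
            out.write(c);
        }
    }
}
```

Reading one char at a time is only reasonable behind a BufferedReader; the point is that memory use depends on the number of edits, not on the size of the file.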

> In my code, I tokenize the XML source and then keep references to these
> tokens. Can StAX do that? Do I have full access to the unicode input stream?
> Can I patch the tokenizer?
>
> Later, in your POM reader, you turn the XML events into a Java object
> model. At this stage, all the information I've gathered is thrown away. So
> even if I could extend StAX to keep the necessary bits, you would still have
> to rewrite your POM readers to save the XML tokens somewhere and then,
> later, when we want to recreate the POM, you would have to collect that
> information from the various bits and pieces.
>
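For comparison, a lossless tokenizer in the spirit Aaron describes can be quite small. This is an illustrative sketch, not DecentXML's code or API: every input character lands in exactly one token and each token remembers its start offset, so joining the token texts reproduces the source exactly, entity spellings and quote styles included. It does no well-formedness checking and does not handle a DOCTYPE with an internal subset:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative lossless tokenizer (hypothetical names; not DecentXML's implementation).
final class LosslessTokenizer {

    enum Kind { TEXT, COMMENT, CDATA, PI, DECLARATION, TAG }

    /** A slice of the source: its kind, start offset and exact original text. */
    record Token(Kind kind, int start, String text) {}

    static List<Token> tokenize(String src) {
        List<Token> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < src.length()) {
            Kind kind;
            int end;
            if (src.charAt(pos) != '<') {
                kind = Kind.TEXT; // raw text, entity references left exactly as written
                int lt = src.indexOf('<', pos);
                end = lt < 0 ? src.length() : lt;
            } else if (src.startsWith("<!--", pos)) {
                kind = Kind.COMMENT;
                end = endAfter(src, "-->", pos + 4);
            } else if (src.startsWith("<![CDATA[", pos)) {
                kind = Kind.CDATA;
                end = endAfter(src, "]]>", pos + 9);
            } else if (src.startsWith("<?", pos)) {
                kind = Kind.PI;
                end = endAfter(src, "?>", pos + 2);
            } else {
                // Tags and <!DOCTYPE ...>: the first '>' outside a quoted attribute value ends it.
                kind = src.startsWith("<!", pos) ? Kind.DECLARATION : Kind.TAG;
                end = tagEnd(src, pos + 1);
            }
            tokens.add(new Token(kind, pos, src.substring(pos, end)));
            pos = end;
        }
        return tokens;
    }

    // Index just past the terminator, or end of input if it is missing.
    private static int endAfter(String src, String terminator, int from) {
        int i = src.indexOf(terminator, from);
        return i < 0 ? src.length() : i + terminator.length();
    }

    private static int tagEnd(String src, int from) {
        char quote = 0;
        for (int i = from; i < src.length(); i++) {
            char c = src.charAt(i);
            if (quote != 0) {
                if (c == quote) quote = 0;
            } else if (c == '"' || c == '\'') {
                quote = c;
            } else if (c == '>') {
                return i + 1;
            }
        }
        return src.length();
    }
}
```

Keeping references to these tokens in the document model is what lets a writer emit untouched regions verbatim.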

all that information is still in the original XML source, so you
just need to be able to translate model changes into minimal
edits to the XML source (not trivial, but not impossible - I've
done this for other file formats in the past)


> And even if that would all work ... how would you preserve the original
> order of XML elements from the Java version of the POM? I mean, it's nice
> and all that I can iterate over the dependencies but is the original order
> preserved?
>

using the "diff" approach any unchanged elements automatically
keep their original order (actually that should also be the case at
the moment, if the Java model uses the right collection classes)
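A tiny illustration of the collection-class point, using made-up coordinates: a LinkedHashMap (or a List) iterates in insertion order, which is document order if the reader inserts elements as it meets them, and re-putting an existing key keeps its position. A HashMap or HashSet gives no such guarantee:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: order-preserving collections keep POM document order.
final class OrderDemo {

    static List<String> keysAfterVersionBump() {
        Map<String, String> deps = new LinkedHashMap<>();
        // Inserted in the order the <dependency> elements appear (hypothetical coordinates).
        deps.put("junit:junit", "3.8.1");
        deps.put("org.example:alpha", "1.0");
        deps.put("org.example:beta", "2.0");
        // Updating an existing key does not move it: LinkedHashMap keeps the original insertion order.
        deps.put("junit:junit", "4.4");
        return List.copyOf(deps.keySet());
    }
}
```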

the tricky part is usually deciding where to slot in new elements...
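One possible heuristic, sketched in Java with a hypothetical helper: put the new element right after the last existing sibling with the same tag and copy that sibling's line indentation and line-ending style. It assumes the tag name is not used anywhere else in the document, that siblings have explicit closing tags, and that the new element fits on one line; a real tool would locate the sibling through the parsed structure rather than by string search:

```java
// Hypothetical helper; assumes `tagName` appears only as the siblings we want to extend.
final class SlotIn {

    static String insertAfterLastSibling(String src, String tagName, String newElement) {
        String closing = "</" + tagName + ">";
        int close = src.lastIndexOf(closing);
        if (close < 0) {
            throw new IllegalArgumentException("no existing <" + tagName + "> to anchor on");
        }
        // Reuse the leading whitespace of the line holding the last sibling's closing tag.
        int lineStart = src.lastIndexOf('\n', close) + 1;
        int i = lineStart;
        while (i < close && (src.charAt(i) == ' ' || src.charAt(i) == '\t')) {
            i++;
        }
        String indent = src.substring(lineStart, i);
        // Keep the file's line-ending style.
        String eol = src.contains("\r\n") ? "\r\n" : "\n";
        int insertAt = close + closing.length();
        return src.substring(0, insertAt) + eol + indent + newElement + src.substring(insertAt);
    }
}
```

Appending after the last sibling is only one policy; keeping dependencies sorted, or honouring a canonical POM element order, would need different anchoring rules.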


>
>>> As I said: My parser is probably not so useful as a general purpose
>>> replacement for POM *reading* in general. It ought to be used in the Maven
>>> artifact plugin and any other code which *writes* POM files.
>>>
>>
>> If we've read in the model using the tools that we currently use, which
>> know everything about the whitespace, and then manipulate the
>> model in memory, how exactly would we integrate your writer?
>>
>
> Same issue as above. My suggestion is to keep the model reader as it is. If
> you write a plugin which wants to manipulate any kind of XML, you add a
> dependency on DecentXML, read the XML, manipulate it and write it out.
>

which kind of sucks if you want to pass the model around collecting
changes from different components - then everyone would have to
use the DecentXML document, otherwise you'd lose the formatting.

> There is no way to read the XML with tool A and then write it out with tool
> B.
>

I think there is, you just have to be able to map model changes into
minimal XML changes - of course the more context you can stash in
the actual model the easier this is (DecentXML stashes everything)


>> I'm not keen on pulling in another XML parser to be honest.
>>
>
> I know that. I don't have a better solution because there probably isn't one. I
> don't start forks just for the fun of it. This is essentially an
> unsolved problem in the XML space; it's been unsolved since XML was invented
> and it won't ever be solved because it's a corner case. I just happen to be
> in that corner very often, so I finally gave in and started on a solution.
>

I wouldn't say it's unsolved or unsolvable - there are many ways to
achieve different levels of round-tripping, and just because a parser
doesn't achieve 100% doesn't mean that it's useless - there's always
some trade-off (space, performance, etc.)


> My solution returns a complete XML document to begin with, so the setup is
> just a single line of code and then you can start working on the document.
>

your solution is interesting, but I think you'd get more support if you
stopped dissing everything else - there's been a lot of innovation in
this area already and I expect there's still more to come.

at least have a serious look at StAX and see if it could be improved
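As a starting point for that look, a short probe against the JDK's built-in StAX API (javax.xml.stream). It shows the limitation Aaron mentions: the reader hands back decoded text, so "A &amp; B" and "A &#38; B" both arrive as "A & B", and a writer fed only by these events cannot know which spelling was in the file. StaxProbe is a hypothetical name:

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical probe: what does a StAX consumer actually see of the original text?
final class StaxProbe {

    /** Returns the decoded text of the first element named `localName`, or null if absent. */
    static String firstElementText(String xml, String localName) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && reader.getLocalName().equals(localName)) {
                        // getElementText() concatenates the element's text with references already resolved.
                        return reader.getElementText();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }
}
```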

> Regards,
>
> --
> Aaron "Optimizer" Digulla a.k.a. Philmann Dark
> "It's not the universe that's limited, it's our imagination.
> Follow me and I'll show you something beyond the limits."
> http://www.pdark.de/

-- 
Cheers, Stuart