Here's a brief description of this.

The XML parser methods already take a parameter, an instance of ParsingOptions. ParsingOptions now has one additional boolean, preserveComments, which defaults to false.

If preserveComments is not set, the parser works as before. No lexical handler is installed, so it should run as fast as before. There *is* one extra slot in the Java objects that represent some of the elements in the XML (not all elements have their own Java object class); in this case the slot is set to null.
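As a rough illustration of the "no lexical handler unless asked" idea, here is a minimal sketch using only the JDK's SAX classes. It is not the actual UIMA code: the Options class is a hypothetical stand-in for ParsingOptions that carries only the one flag, and the class and method names are made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

public class LexicalHandlerSketch {

    /** Hypothetical stand-in for ParsingOptions; only the flag discussed here. */
    static class Options {
        boolean preserveComments = false; // same default as described above
    }

    /** Parses the XML and returns the comments seen (empty unless the flag is set). */
    static List<String> parse(String xml, Options options) throws Exception {
        final List<String> comments = new ArrayList<>();

        // DefaultHandler2 implements both ContentHandler and LexicalHandler.
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void comment(char[] ch, int start, int length) {
                comments.add(new String(ch, start, length));
            }
        };

        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);

        // Comment events are only delivered through a lexical handler, so when
        // the flag is off nothing extra is installed and comments are skipped.
        if (options.preserveComments) {
            reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        }

        reader.parse(new InputSource(new StringReader(xml)));
        return comments;
    }
}
```

With the flag left at its default, parse returns an empty list for any input; with it set, the list holds the text of each comment in document order.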

When preserveComments is true, the slot is set to a reference to the DOM Element node corresponding to that Java object. As a result, the "DOM", which previously was a *temporary* object, is retained for as long as the Java objects that refer to it are retained. This will increase the "footprint" of a parsed UIMA Descriptor, of course.

The *toXML* method was modified to check this slot. If it is not null, the DOM in the vicinity of the element is scanned for comment and whitespace nodes, and the appropriate ones are written out. An attempt is made to stay heuristically close to the original, even in the presence of some editing (adding / deleting nodes). See the bottom of the MetaDataObject_impl class for some details of this.
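To give a feel for what "scanning the vicinity" can mean, here is a simplified sketch using the JDK's DOM API. It only collects the comments that immediately precede an element, skipping whitespace-only text; the real heuristics in MetaDataObject_impl are more involved, and the class and method names here are invented for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class CommentScanSketch {

    /**
     * Returns the comments directly before the element, in document order.
     * A null element (the "slot not set" case) yields an empty list.
     */
    static List<String> commentsBefore(Element element) {
        List<String> found = new ArrayList<>();
        if (element == null) {
            return found;
        }
        // Walk backwards over siblings until something other than a comment
        // or whitespace-only text is reached.
        for (Node n = element.getPreviousSibling(); n != null; n = n.getPreviousSibling()) {
            if (n.getNodeType() == Node.COMMENT_NODE) {
                found.add(n.getNodeValue());
            } else if (n.getNodeType() == Node.TEXT_NODE
                    && n.getNodeValue().trim().isEmpty()) {
                continue; // whitespace between comments and the element
            } else {
                break;    // another element or real text: stop scanning
            }
        }
        Collections.reverse(found); // collected back-to-front
        return found;
    }

    /** Helper: parse a string into a DOM that keeps comments (the JDK default). */
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }
}
```

Because a comment placed before the root element is a sibling of the root in the DOM, the same scan also picks up something like a license header at the top of a descriptor.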

The Component Descriptor Editor has been modified to preserve comments (only for those XML pieces which it is editing and might be writing out).

So, the good news is, if you edit a descriptor with the CDE and it has an Apache license header at the top, it will no longer be deleted... :-)

All the test cases pass, and I did some amount of manual editing / testing; more testing welcome.

-Marshall
