One of the most widespread uses of XML is as a neutral storage and exchange format for documents. In these cases, avoiding XML or SGML would just mean going back to Word or FrameMaker (and we don't want that), or to LaTeX or Texinfo, which pose similar problems with respect to merging. Or to HTML, which is itself an application of SGML. And anyway, a lot of documentation for open source projects is being written in or converted to DocBook, and will be maintained with the same revision control tools as the rest of the project, i.e., cvs. So we are going to see questions about XML or SGML pop up more frequently.
Many of the issues with respect to cvs are essentially not much different from those of maintaining documentation written in LaTeX or Texinfo format. Except you can't assume that authors will be using vi or emacs. A lot of different tools will be used -- that was one of the main points of using SGML or XML.

It is common sense to break up a document into mini- or micro-documents, each with its own lifecycle -- just as you do for programming source code. The concept of storage management is built into SGML and XML at a very low level. The customary way to do this is by declaring entities, symbolic names for storage objects, which can then be included in other documents at appropriate places. XInclude and XLink (and for SGML, HyTime) also offer ways to include or locate parts of documents in terms of parse trees.

But how about the physical storage format of each file? Authors will often be using different XML or SGML editors that will 'beautify' the XML or SGML source in different ways, introducing spurious differences and conflicts. Other sources of spurious conflicts are character encoding, namespace declarations, and order of attributes; most documents can be stored in a number of different ways with no loss of information for the intended use. But a simple diff will show a lot of differences that are essentially not there. Until proper XML repositories become as ubiquitous as cvs, we might as well find a way to live with it.

The character encoding is easy to control -- SGML and XML are very explicit about it, and editors in general handle encoding gracefully. Namespace declarations and attribute order are tricky. Things can be normalized, see Canonical XML, http://www.w3.org/TR/xml-c14n, but full canonicalization of a document will be too much.

The 'beautify' problem is even worse, i.e., how to introduce and remove whitespace in a way that makes cvs behave meaningfully. I have not yet found a simple recipe for beautifying SGML and XML.
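To illustrate the attribute-order point, here is a minimal sketch in Python. It assumes Python 3.8 or later, whose standard library offers xml.etree.ElementTree.canonicalize (an implementation of C14N 2.0, not necessarily the exact revision at the URL above). The two input strings and the normalize helper are hypothetical examples, not anything from a real project.

```python
from xml.etree.ElementTree import canonicalize  # available from Python 3.8

# Two hypothetical serializations of the same element, as two different
# editors might save it: different attribute order, quotes and spacing.
editor_a = '<para role="note" id="p1">Papageno</para>'
editor_b = "<para   id='p1'\n      role='note'>Papageno</para>"


def normalize(xml_text: str) -> str:
    """Return the canonical form: attributes sorted, quoting made uniform."""
    return canonicalize(xml_text)


# Both serializations should normalize to the same text, so a line-based
# diff between them would come out empty.
print(normalize(editor_a))
print(normalize(editor_a) == normalize(editor_b))
```

Note that this leaves whitespace in character content alone by default, so it addresses attribute order and quoting but not the 'beautify' problem.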
Here are some of the options:

The most generic, simple and safe way to break XML or SGML into lines is unfortunately not too pretty. Keep any line breaks already present in the source and, in addition, break just _before_ the markup delimiter close character '>' on the start tag, e.g.:

$ osx xml.dcl beautify.xml
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE section PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN" "docbookx.dtd">
<section
><title
>Beautifying XML</title><para
>Papageno</para><para
>Break inside markup like this: <emphasis role="bold"
>some text</emphasis>.</para><para
>Papagena</para></section>

Some tools can beautify in a way more suitable for human consumption:

$ xmllint --format beautify.xml
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE section PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN" "docbookx.dtd">
<section>
  <title>Beautifying XML</title>
  <para>Papageno</para>
  <para>Break inside markup like this: <emphasis role="bold">some text</emphasis>.</para>
  <para>Papagena</para>
</section>

Keeping white space in character context while beautifying is a simple way to avoid problems with NOTATION linespecific AKA xml:space='preserve' AKA <pre>. But the reason we needed a beautifier in the first place is that editors put in different amounts of whitespace in different places.

If someone out there has a nice and robust XSLT stylesheet for normalizing/beautifying XML, please publish!

There are, BTW, XML diff tools. See e.g.:

http://www.alphaworks.ibm.com/tech/xmldiffmerge
http://www.deltaxml.com
http://www.vmguys.com/vmtools
http://www.logilab.org/xmldiff

The first one can be used as a merge tool. The other ones can produce an XML diff file that -- given a proper XML patch utility -- can update one XML file to become the other one.

There are, to the best of my knowledge, no freely available stand-alone SGML diff tools. Some editors, e.g. ArborText Epic, can do a very nice compare.
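The 'break just before the > of the start tag' convention described above can be approximated in a few lines of Python. This is only a rough sketch: it uses a regular expression instead of a real parser, so it assumes the input has no comments, CDATA sections or processing instructions that contain tag-like text. The names START_TAG and break_before_tagc are made up for this example.

```python
import re

# Matches a start tag or empty-element tag: name, quoted attributes,
# optional whitespace, optional '/', then the closing '>'.
START_TAG = re.compile(
    r"""<([A-Za-z_:][^\s/>]*)                           # element name
        ((?:\s+[^\s=/>]+\s*=\s*(?:"[^"]*"|'[^']*'))*)   # quoted attributes
        \s*(/?)>                                        # optional '/', then '>'
    """,
    re.VERBOSE,
)


def break_before_tagc(xml_text: str) -> str:
    """Put a line break just before the '>' that closes each start tag.

    Line breaks already in the source are kept; character data is not
    touched, so no whitespace is added to or removed from the content.
    """
    return START_TAG.sub(
        lambda m: f"<{m.group(1)}{m.group(2)}\n{m.group(3)}>", xml_text
    )


sample = (
    '<section><title>Beautifying XML</title><para>Break inside markup: '
    '<emphasis role="bold">some text</emphasis>.</para></section>'
)
print(break_before_tagc(sample))
```

Because the added line breaks sit inside the tags, running the function a second time should give the same text again, which is what you want from something run before every commit.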
kind regards,
Peter Ring

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Greg A. Woods
Sent: 26 April 2002 23:45
To: CVS-II Discussion Mailing List
Subject: RE: merge mode for XML

<snip>
A better approach is to avoid XML entirely in the first place -- it's a really really horrid syntax with all kinds of goo that's usually way over-kill for the application, being SGML based and all that....
</snip>
