Re: [basex-talk] Weird: mixed content trimmed unexpectedly

Liam R. E. Quin Mon, 09 Dec 2019 11:48:55 -0800

On Mon, 2019-12-09 at 20:27 +0100, Arjan Loeffen wrote:
> 
> In general: when the wiki states here: "Many XML documents include
> whitespaces that have been added to improve readability.", this should
> not apply to mixed content fragments as described. Only to start and
> end of "text content of elements", not on text nodes.
> I therefore also think that the second approach is not exactly in line
> with the *intention* of the XML standard.


It isn't, but some of the earliest XML parsers had an option to drop
white-space-only text nodes (e.g. MSXML, I think) because XML was also
being used in data contexts. The intent was that a DTD could be used to
determine which spaces to ignore, but then DTDs became optional.

A parser without a DTD does not know which elements _could_ contain
text, and hence doesn't know what to drop. In addition, markup like,

  <person>
    <name>
       Nigel
    </name>
    <obedience>
       0.4
    </obedience>
  </person>

is common, unfortunately. In SGML this worked, but the whitespace rules
were complex enough that they were a constant source of trouble.
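
To make the trade-off concrete, here is a rough sketch in Python using
the standard xml.dom.minidom module. The helper name
drop_whitespace_only_text is made up for illustration, and this is only
the naive heuristic, not how BaseX or MSXML actually implement their
options. It shows that a DTD-less parse keeps the indentation as text
nodes, that dropping white-space-only nodes tidies the data-style
markup without trimming the padded value, and that the same rule
silently removes a meaningful space from mixed content.

```python
from xml.dom import minidom


def drop_whitespace_only_text(node):
    """Hypothetical helper: remove every text node that is only whitespace."""
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and child.data.strip() == "":
            node.removeChild(child)
        else:
            drop_whitespace_only_text(child)


# Data-style markup, similar to the <person> example above.
data_doc = minidom.parseString(
    "<person>\n  <name>\n    Nigel\n  </name>\n</person>"
)
person = data_doc.documentElement

# With no DTD, the parser has to keep the indentation as text nodes.
kinds_before = [c.nodeName for c in person.childNodes]

drop_whitespace_only_text(person)
kinds_after = [c.nodeName for c in person.childNodes]

# The padding around "Nigel" is not a white-space-only node, so it stays.
name_text = person.getElementsByTagName("name")[0].firstChild.data

# Mixed content: the single space between <b> and <i> is significant.
mixed_doc = minidom.parseString("<p><b>mixed</b> <i>content</i></p>")
p = mixed_doc.documentElement
drop_whitespace_only_text(p)
mixed_after = p.toxml()

print(kinds_before, kinds_after)
print(repr(name_text))
print(mixed_after)
```

With the heuristic applied, the mixed-content paragraph reads as
"mixedcontent", which is the kind of surprise reported in this thread.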

Liam

-- 
Liam Quin, https://www.delightfulcomputing.com/
Available for XML/Document/Information Architecture/XSLT/
XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
Barefoot Web-slave, antique illustrations:  http://www.fromoldbooks.org
