On 2013-04-05, Michael Seiferle <[email protected]> wrote:

> As chopping does not change any semantics (at least with regards to
> what XML thinks of semantically important) but only aesthetics this is
> enabled by default.

I'm sorry to disagree, but chopping certainly *does* change the
semantics--that's precisely why I've argued before that it shouldn't be
on by default.

The problem becomes obvious with mixed content. For example, with
chopping enabled,

<doc>
  <p>Lorem ipsum <em>dolor</em> <x>sit</x> amet ...</p>
</doc>

becomes

<doc>
  <p>Lorem ipsum<em>dolor</em><x>sit</x>amet ...</p>
</doc>

which is *not* the same, and AFAICT this is not conforming behavior (and
BaseX doesn't honor xml:space either).
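
To make the difference concrete, here is a minimal sketch that compares the
text content of the <p> element in both versions. It uses only Python's
standard-library parser, not BaseX, and the names text_of_p, original and
chopped are made up for illustration; the two strings are simply the documents
shown above.

```python
import xml.etree.ElementTree as ET

# The document as written, and the same document after whitespace chopping
# (both copied from the example above).
original = "<doc>\n  <p>Lorem ipsum <em>dolor</em> <x>sit</x> amet ...</p>\n</doc>"
chopped = "<doc>\n  <p>Lorem ipsum<em>dolor</em><x>sit</x>amet ...</p>\n</doc>"


def text_of_p(xml: str) -> str:
    """Return the concatenated character data of the first <p> element."""
    p = ET.fromstring(xml).find("p")
    return "".join(p.itertext())


original_text = text_of_p(original)
chopped_text = text_of_p(chopped)

print(repr(original_text))  # 'Lorem ipsum dolor sit amet ...'
print(repr(chopped_text))   # 'Lorem ipsumdolorsitamet ...'
```

The word boundaries are gone in the second string, so any application that
reads the text of <p> sees different data.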

I do understand that whitespace chopping as currently implemented is
useful for some data-oriented applications, even if it is not
conforming, but by default, the behavior should conform to the XML
standard.

Best regards

-- 
Dr.-Ing. Michael Piotrowski, M.A. <[email protected]>
Institute of Computational Linguistics, University of Zurich
Phone +41 44 63-54313 | OpenPGP public key ID 0x1614A044
* OUT NOW: Natural Language Processing for Historical Texts
* <http://morganclaypool.com/doi/abs/10.2200/S00436ED1V01Y201207HLT017>
_______________________________________________
BaseX-Talk mailing list
[email protected]
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
