Thanks for the addition, Liam; I should have mentioned that.

If your input has mixed content, and if the relevant sections have
xml:space='preserve' attributes…

<p xml:space='preserve'>The <em>very</em> <id>tc34q</id>.</p>

…whitespace stripping will be safe.
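To make the mechanism concrete, here is a minimal Python sketch of non-schema-aware whitespace stripping that honours xml:space. It is not BaseX's actual implementation. It drops whitespace-only text nodes unless xml:space='preserve' is in scope on the element or one of its ancestors. The helper names strip_ws and show are hypothetical, and ElementTree is used only because it ships with Python.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:space under the fixed XML namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def is_ws(text):
    """True for text made only of XML whitespace (space, tab, CR, LF)."""
    return text is not None and not text.strip(" \t\r\n")


def strip_ws(elem, preserve=False):
    """Remove whitespace-only text nodes below elem, in place (hypothetical helper).

    Not schema-aware: the only hint it honours is xml:space, which is
    inherited by descendants until another xml:space overrides it.
    """
    mode = elem.get(XML_SPACE)
    if mode == "preserve":
        preserve = True
    elif mode == "default":
        preserve = False

    # elem.text is the text node before the first child element.
    if not preserve and is_ws(elem.text):
        elem.text = None
    for child in elem:
        strip_ws(child, preserve)
        # child.tail is the text node after child; it belongs to elem,
        # so elem's xml:space setting decides whether it is kept.
        if not preserve and is_ws(child.tail):
            child.tail = None
    return elem


def show(xml):
    """Parse, strip and re-serialize a document string."""
    return ET.tostring(strip_ws(ET.fromstring(xml)), encoding="unicode")


print(show("<p>The <em>very</em> <id>tc34q</id>.</p>"))
# <p>The <em>very</em><id>tc34q</id>.</p>
print(show("<p xml:space='preserve'>The <em>very</em> <id>tc34q</id>.</p>"))
# the space between </em> and <id> survives
```

The first call shows the damage described further down in this thread. The second keeps the space because the attribute is in scope.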

Similarly, it may be helpful to know that the whitespace is lost if XML
strings…

<p>The <em>very</em> <id>tc34q</id>.</p>

…are evaluated as XQuery. To prevent that, you can add a statement to the
prolog of the query:

declare boundary-space preserve;
<p>The <em>very</em> <id>tc34q</id>.</p>

Whitespace handling is generally a tricky issue in XML.

Best,
Christian


On Wed, Feb 14, 2024 at 10:38 AM Liam R. E. Quin <l...@fromoldbooks.org>
wrote:

> On Tue, 2024-02-13 at 20:29 +0100, Christian Grün wrote:
>
>
> If your XML input has been properly indented to improve readability, you
> can reduce the size of your database by dropping superfluous whitespace
> during the import:
>
> SET STRIPWS ON; CREATE DB ...
> db:create('db', '/path/to/documents', (), map { 'stripws': true() })
>
>
> Beware that this is not schema-based, and can remove whitespace nodes in
> mixed content -
>     <p>The <em>very</em> <id>tc34q</id>.</p>
> may become (as i understand it)
>     <p>The <em>very</em><id>tc34q</id>.</p>
> (i have seen this, with different software, cause potentially catastrophic
> problems in aircraft manuals!)
>
> liam
>
> --
>
> Liam Quin, https://www.delightfulcomputing.com/
> Available for XML/Document/Information Architecture/XSLT/
> XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
> Barefoot Web-slave, antique illustrations:  http://www.fromoldbooks.org
>
