Thanks for the addition, Liam; I should have mentioned that.

If your input has mixed content, and if the relevant sections have xml:space='preserve' attributes…

<p xml:space='preserve'>The <em>very</em> <id>tc34q</id>.</p>

…whitespace stripping will be safe. Similarly, it may be helpful to know that the whitespace gets lost if XML strings…

<p>The <em>very</em> <id>tc34q</id>.</p>

…are evaluated as XQuery. To prevent that, you can add a statement to the prolog of the query:

declare boundary-space preserve;
<p>The <em>very</em> <id>tc34q</id>.</p>

Whitespace handling is generally a tricky issue in XML.

Best,
Christian

On Wed, Feb 14, 2024 at 10:38 AM Liam R. E. Quin <l...@fromoldbooks.org> wrote:

> On Tue, 2024-02-13 at 20:29 +0100, Christian Grün wrote:
>
> > If your XML input has been properly indented to improve readability, you
> > can reduce the size of your database by dropping superfluous whitespace
> > during the import:
> >
> > SET STRIPWS ON; CREATE DB ...
> > db:create('db', '/path/to/documents', (), map { 'stripws': true() })
>
> Beware that this is not schema-based, and can remove whitespace nodes in
> mixed content -
> <p>The <em>very</em> <id>tc34q</id>.</p>
> may become (as i understand it)
> <p>The <em>very</em><id>tc34q</id>.</p>
> (i have seen this, with different software, cause potentially catastrophic
> problems in aircraft manuals!)
>
> liam
>
> --
> Liam Quin, https://www.delightfulcomputing.com/
> Available for XML/Document/Information Architecture/XSLT/
> XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
> Barefoot Web-slave, antique illustrations: http://www.fromoldbooks.org