Whitespace is probably only a minor factor here. It can’t explain the loading 
times that grow non-linearly with document count.

Dietmar, have you looked at the memory consumption? My experience is that if 
memory gets scarce, garbage collection will kick in frequently, slowing down 
the import process. Increasing -Xmx in the startup script might improve the 
import speed. If your computer has 16 GB of RAM, try setting -Xmx12g, for 
example, and see whether there is an improvement. You can see the memory 
consumption in the GUI, so try to create the DB from the GUI.

Gerrit

On 14.02.2024 10:48, Christian Grün wrote:
Thanks for the addition, Liam; I should have mentioned that.

If your input has mixed content, and if the relevant sections have 
xml:space='preserve' attributes…

<p xml:space='preserve'>The <em>very</em> <id>tc34q</id>.</p>

…whitespace stripping will be safe.

Similarly, it may be helpful to know that the whitespace gets lost if XML 
strings…

<p>The <em>very</em> <id>tc34q</id>.</p>

…are evaluated as XQuery. To prevent that, you can add a statement to the 
prolog of the query:

declare boundary-space preserve;
<p>The <em>very</em> <id>tc34q</id>.</p>

Whitespace handling is generally a tricky issue in XML.

Best,
Christian


On Wed, Feb 14, 2024 at 10:38 AM Liam R. E. Quin <l...@fromoldbooks.org> wrote:

    On Tue, 2024-02-13 at 20:29 +0100, Christian Grün wrote:

    If your XML input has been properly indented to improve readability, you
    can reduce the size of your database by dropping superfluous whitespace
    during the import:

    SET STRIPWS ON; CREATE DB ...
    db:create('db', '/path/to/documents', (), map { 'stripws': true() })

    Beware that this is not schema-based, and can remove whitespace nodes in
    mixed content:
    <p>The <em>very</em> <id>tc34q</id>.</p>
    may become (as I understand it)
    <p>The <em>very</em><id>tc34q</id>.</p>
    (I have seen this, with different software, cause potentially catastrophic
    problems in aircraft manuals!)
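
The effect described above can be reproduced outside BaseX. The following is a minimal Python sketch, not BaseX's implementation: it assumes that stripping means "remove every whitespace-only text node", and that subtrees marked with xml:space='preserve' are left alone, as mentioned elsewhere in this thread. The helper names strip_ws and stripped are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:space under the predefined XML namespace.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def strip_ws(elem, preserve=False):
    """Remove whitespace-only text nodes below elem (hypothetical helper).

    Assumption: text inside an element with xml:space='preserve' is
    kept untouched, including the text of its descendants.
    """
    space = elem.get(XML_SPACE)
    if space == "preserve":
        preserve = True
    elif space == "default":
        preserve = False

    # elem.text is the text node before the first child element.
    if not preserve and elem.text is not None and not elem.text.strip():
        elem.text = None

    for child in elem:
        strip_ws(child, preserve)
        # child.tail is the text node that follows the child inside elem,
        # so the setting of elem (not of child) decides its fate.
        if not preserve and child.tail is not None and not child.tail.strip():
            child.tail = None


def stripped(xml):
    """Parse an XML string, strip it, and serialize it again."""
    root = ET.fromstring(xml)
    strip_ws(root)
    return ET.tostring(root, encoding="unicode")


plain = "<p>The <em>very</em> <id>tc34q</id>.</p>"
kept = "<p xml:space='preserve'>The <em>very</em> <id>tc34q</id>.</p>"

# The space between </em> and <id> is a whitespace-only text node: it is lost.
print(stripped(plain))  # <p>The <em>very</em><id>tc34q</id>.</p>

# With xml:space='preserve' on <p>, the same space survives.
print(stripped(kept))
```

Note that "The " survives in both cases because it is not whitespace-only; only the single space between the two inline elements is at risk, which is exactly what makes the problem easy to overlook.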

    liam
