Thanks for the context.

Still, it does not explain the difference in behavior between doc() and
parse-xml().

As far as I understand the XDM specification, whitespace may be ignored by the
parser if there is a DTD or XML Schema that says that an element is not PCDATA
(DTD) or mixed (XML Schema). In the absence of (support for) schemas, all
whitespace should be left in. Wendell Piez has written about this in detail.
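
A minimal sketch of that expectation, in Python rather than XQuery and assuming
the standard library's non-validating xml.dom.minidom parser (text_nodes is a
hypothetical helper, not part of any library). With no DTD or schema to consult,
the parser has no grounds to drop whitespace, so every text node comes back
exactly as written:

```python
from xml.dom import minidom

# Same document as test.xml in the original report; no DTD, no schema.
SOURCE = "<a> a b <a> c </a> d e </a>"


def text_nodes(xml):
    """Return the values of all text nodes in document order, as parsed."""
    doc = minidom.parseString(xml)
    found = []

    def walk(node):
        for child in node.childNodes:
            if child.nodeType == child.TEXT_NODE:
                found.append(child.data)
            elif child.nodeType == child.ELEMENT_NODE:
                walk(child)

    walk(doc.documentElement)
    return found


print(text_nodes(SOURCE))  # [' a b ', ' c ', ' d e ']
```

Stripping each of those strings gives 'a b', 'c' and 'd e', which matches the
doc() output reported earlier in the thread. That is only an observation about
the output, not a description of how BaseX implements CHOP.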

Whitespace in XML is tricky. E.g. indenting XML cannot be done well without
knowing which elements are PCDATA/mixed.

Now that I know about the CHOP option, I can use BaseX predictably. And the 
legacy reasons for keeping it set are understandable.

Best regards,
Jos

On Tuesday, 16 February 2021 23:10:05 CET Christian Grün wrote:
> There is an old (and still open) issue on GitHub [1] that might give you
> some more insight into the history of whitespace chopping in BaseX.
> 
> Hope this helps
> Christian
> 
> [1] https://github.com/BaseXdb/basex/issues/913
> 
> 
> 
> 
> Jos van den Oever <j...@vandenoever.info> wrote on Tue, 16 Feb 2021, 22:41:
> > Hi Christian,
> > 
> > Yes, writing 'CHOP=OFF' in .basex stops the vanishing of whitespace.
> > 
> > But where in the XQuery or XDM spec does it say that whitespace handling
> > when parsing is implementation dependent?
> > 
> > Cheers,
> > Jos
> > 
> > On Tuesday, 16 February 2021 22:10:30 CET Christian Grün wrote:
> > > Hi Jos,
> > > 
> > > Whitespaces will be preserved if the CHOP option is disabled. You can make
> > > this a default by adding CHOP=false in your .basex configuration file [1,2].
> > 
> > > Hope this helps,
> > > Christian
> > > 
> > > [1] https://docs.basex.org/wiki/Full-Text#Mixed_Content
> > > [2] https://docs.basex.org/wiki/Configuration
> > > 
> > > 
> > > 
> > > 
> > > Jos van den Oever <j...@vandenoever.info> wrote on Tue, 16 Feb 2021, 22:00:
> > > > Dear all,
> > > > 
> > > > First off: BaseX is great to work with. I use it for a few statically
> > > > generated websites.
> > > > 
> > > > But I recently found what might be a bug.
> > > > 
> > > > Some whitespace vanishes when loading xml files. E.g. this xml file:
> > > > 
> > > > ```test.xml
> > > > <a> a b <a> c </a> d e </a>
> > > > ```
> > > > 
> > > > run like this:
> > > > 
> > > > doc('test.xml')
> > > > 
> > > > gives:
> > > > 
> > > > <a>a b<a>c</a>d e</a>
> > > > 
> > > > But running this:
> > > > 
> > > > ```
> > > > parse-xml('<a> a b <a> c </a> d e </a>')
> > > > ```
> > > > 
> > > > retains the whitespace.
> > > > 
> > > > I've tested this with BaseX 7.0, 8.0, 9.0 and 9.4.6.
> > > > 
> > > > Running this in saxon-he-10.3.jar retains the whitespace.
> > > > 
> > > > I can work around this issue by placing xml:space="preserve" in the
> > > > document element.
> > > > 
> > > > I cannot come up with a scenario in which discarding whitespace during
> > > > parsing is ok when no DTD or XML Schema is provided.
> > > > 
> > > > Best regards,
> > > > Jos
