Then to pass the XQuery test suite you probably use CHOP=OFF. Are there other settings needed to be compliant?
On Wednesday, 17 February 2021 00:04:38 CET, Christian Grün wrote:
> Yes, you are certainly right. I think it was around 2007 when we chopped
> whitespaces by default, although we knew it didn't comply with the
> specification. One reason was that we rarely worked with mixed-content data
> at that time, and the whitespace indentations increased the size of
> databases and led to worse rendering results in the built-in visualizations
> (our first users were confused about that).
>
> Maybe we’ll switch the default in a future version of BaseX.
>
> Jos van den Oever <j...@vandenoever.info> wrote on Tue, 16 Feb 2021, 23:36:
> > Thanks for the context.
> >
> > Still, it does not explain the difference in behavior between doc() and
> > parse-xml().
> >
> > As far as I understand the XDM specification, whitespace may be ignored
> > by the parser if there is a DTD or XML Schema that says that an element
> > is not PCDATA (DTD) or mixed (XML Schema). In the absence of (support
> > for) schemas, all whitespace should be left in. Wendell Piez writes
> > about it in great detail.
> >
> > Whitespace in XML is tricky. E.g. indenting XML cannot be done well
> > without knowing which elements are PCDATA/mixed.
> >
> > Now that I know about the CHOP option, I can use BaseX predictably. And
> > the legacy reasons for keeping it set are understandable.
> >
> > Best regards,
> > Jos
> >
> > On Tuesday, 16 February 2021 23:10:05 CET, Christian Grün wrote:
> > > There is an old (and still open) issue on GitHub [1] that might give
> > > you some more insight into the history of whitespace chopping in BaseX.
> > >
> > > Hope this helps
> > > Christian
> > >
> > > [1] https://github.com/BaseXdb/basex/issues/913
> > >
> > > Jos van den Oever <j...@vandenoever.info> wrote on Tue, 16 Feb 2021, 22:41:
> > > > Hi Christian,
> > > >
> > > > Yes, writing 'CHOP=OFF' in .basex stops the vanishing of whitespace.
> > > >
> > > > But where in the XQuery or XDM spec does it say that whitespace
> > > > handling when parsing is implementation dependent?
> > > >
> > > > Cheers,
> > > > Jos
> > > >
> > > > On Tuesday, 16 February 2021 22:10:30 CET, Christian Grün wrote:
> > > > > Hi Jos,
> > > > >
> > > > > Whitespaces will be preserved if the CHOP option is disabled. You
> > > > > can make this a default by adding CHOP=false in your .basex
> > > > > configuration file [1,2].
> > > > >
> > > > > Hope this helps,
> > > > > Christian
> > > > >
> > > > > [1] https://docs.basex.org/wiki/Full-Text#Mixed_Content
> > > > > [2] https://docs.basex.org/wiki/Configuration
> > > > >
> > > > > Jos van den Oever <j...@vandenoever.info> wrote on Tue, 16 Feb 2021, 22:00:
> > > > > > Dear all,
> > > > > >
> > > > > > First off: BaseX is great to work with. I use it for a few
> > > > > > statically generated websites.
> > > > > >
> > > > > > But I recently found what might be a bug.
> > > > > >
> > > > > > Some whitespace vanishes when loading xml files. E.g. this xml file:
> > > > > >
> > > > > > ```test.xml
> > > > > > <a> a b <a> c </a> d e </a>
> > > > > > ```
> > > > > >
> > > > > > run like this:
> > > > > >
> > > > > > doc('test.xml')
> > > > > >
> > > > > > gives:
> > > > > >
> > > > > > <a>a b<a>c</a>d e</a>
> > > > > >
> > > > > > But running this:
> > > > > >
> > > > > > ```
> > > > > > parse-xml('<a> a b <a> c </a> d e </a>')
> > > > > > ```
> > > > > >
> > > > > > retains the whitespace.
> > > > > >
> > > > > > I've tested this with BaseX 7.0, 8.0, 9.0 and 9.4.6.
> > > > > >
> > > > > > Running this in saxon-he-10.3.jar retains the whitespace.
> > > > > >
> > > > > > I can work around this issue by placing xml:space="preserve" in
> > > > > > the document element.
> > > > > >
> > > > > > I cannot come up with a scenario in which discarding whitespace
> > > > > > during parsing is ok when no DTD or XML Schema is provided.
> > > > > >
> > > > > > Best regards,
> > > > > > Jos
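
For readers who want to reproduce the difference discussed in this thread without a BaseX installation, here is a rough Python sketch. It is not how BaseX implements CHOP. It only assumes, based on the output quoted in the original report, that chopping amounts to trimming leading and trailing whitespace from every text node. The `chop` helper and the `SAMPLE` name are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The document from the original report.
SAMPLE = "<a> a b <a> c </a> d e </a>"


def chop(elem):
    """Trim leading/trailing whitespace of every text node, in place.

    Hypothetical helper: it mimics the output reported for doc() with
    CHOP enabled, and is an assumption about the effect, not BaseX code.
    """
    for node in elem.iter():
        # ElementTree keeps text before the first child in .text and
        # text that follows an element's end tag in .tail.
        if node.text is not None:
            node.text = node.text.strip() or None
        if node.tail is not None:
            node.tail = node.tail.strip() or None
    return elem


# A conforming parse keeps every whitespace character.
preserved = ET.tostring(ET.fromstring(SAMPLE), encoding="unicode")

# The chopped variant loses the whitespace around each text node.
chopped = ET.tostring(chop(ET.fromstring(SAMPLE)), encoding="unicode")

print(preserved)
print(chopped)
```

The first line printed corresponds to what parse-xml() and Saxon return in the report, the second to what doc() returned with the default CHOP setting.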