Michael (other than me :-)), you are obviously right.

—
Kind regards
Michael Seiferle

On Fri, Apr 5, 2013 at 12:29 PM, Michael Piotrowski <[email protected]> wrote:

> Dirk,
> On 2013-04-05, Dirk Kirsten <[email protected]> wrote:
>> You are certainly right that with mixed content and the example you have
>> given here chopping does make a semantic difference.
>> However, you can disable this behaviour so BaseX does what you want. So the
>> only reason I see why one should change the default behaviour would be
>> because the default is not conformant to some XML standard. However, I
>> cannot find any specifics in the spec about which is the expected behaviour,
>> so in my opinion BaseX is doing nothing wrong here.
> Well, if you agree that chopping may alter the semantics of a document,
> wouldn't you agree that applying such a transformation *by default* is a
> bad idea?
> With respect to the XML specification, section 2.10 "White Space
> Handling" says:
>   An XML processor MUST always pass all characters in a document that
>   are not markup through to the application.
> Yes, the spec is vague wrt. whitespace handling, and the existence of
> the xml:space attribute shows that different behaviors--including
> potentially corrupting ones--are possible.  I would therefore interpret
> the spec to mean that by default all characters should be preserved, but
> that other behaviors are possible.
>> I see that this behaviour might be surprising for some users, but this
>> might as well be the case if it were the other way round.
> No, because their documents wouldn't be corrupted.  You can easily
> remove all whitespace afterwards if you decide you don't need it, but
> once it's gone, it's gone and cannot be restored.  That's the problem.
>> Additionally, if we changed this now, it would break application
>> code and unless there is a good reason (i.e. BaseX is actually doing
>> something wrong or non-compliant) I don't see why one should change
>> the default.
> Well, I'm not on a crusade or anything, so if you believe that it's a
> good idea to corrupt, by default, all documents containing mixed content
> on import, or if this behavior must be kept for compatibility, so be it.
> I just wanted to point out that whitespace chopping may, in fact, alter
> the semantics of documents--it's not as harmless as it may seem.
> Best regards
> -- 
> Dr.-Ing. Michael Piotrowski, M.A. <[email protected]>
> Institute of Computational Linguistics, University of Zurich
> Phone +41 44 63-54313 | OpenPGP public key ID 0x1614A044
> * OUT NOW: Natural Language Processing for Historical Texts
> * <http://morganclaypool.com/doi/abs/10.2200/S00436ED1V01Y201207HLT017>
> _______________________________________________
> BaseX-Talk mailing list
> [email protected]
> https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
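
To illustrate the point made in the quoted message about mixed content, here is a minimal Python sketch. It is not BaseX code and does not reproduce BaseX's actual chopping logic. The `chop` helper and the sample document are hypothetical, and the helper only drops whitespace-only text nodes, which is a simplified stand-in for whitespace chopping.

```python
import xml.etree.ElementTree as ET


def chop(elem):
    """Remove whitespace-only text nodes in place.

    Hypothetical helper: a simplified stand-in for whitespace chopping,
    not the algorithm BaseX uses.
    """
    if elem.text is not None and not elem.text.strip():
        elem.text = None
    if elem.tail is not None and not elem.tail.strip():
        elem.tail = None
    for child in elem:
        chop(child)


# Mixed content: the single space between </b> and <i> separates two words.
doc = "<p><b>Hello</b> <i>world</i></p>"

original = ET.fromstring(doc)
chopped = ET.fromstring(doc)
chop(chopped)

text_before = "".join(original.itertext())
text_after = "".join(chopped.itertext())

print(repr(text_before))  # 'Hello world'
print(repr(text_after))   # 'Helloworld'
```

The space between the two inline elements is a whitespace-only text node, so removing it joins two words. Once it has been dropped at import time, nothing in the stored document indicates that it was ever there, which is why it cannot be restored afterwards.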