Interesting, Mike; didn't know that. Makes a certain amount of sense, 
since it's based on the definition of the containing element rather than 
what it actually contains.

(I've rarely counted on it; I get too many documents thrown at me without 
DTDs, or am processing in a context where I want to preserve the 
whitespace, so I've tended to code this into the application semantics 
instead. Which is probably why I didn't remember that simply specifying 
the DTD was sufficient.)
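
For readers who also get documents without DTDs, here is a rough sketch of what "coding this into the application semantics" might look like with plain SAX. The class name WhitespaceDroppingHandler and the "start:/text:/end:" event strings are made up for illustration; nothing here is Xerces-specific. It buffers character data and, at each tag boundary, drops the buffer if it holds only XML whitespace. Note that this is an application-level choice: it also discards whitespace-only text inside a leaf element, which a DTD-based approach would not.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records events, skipping whitespace-only text runs.
public class WhitespaceDroppingHandler extends DefaultHandler {
    public final List<String> events = new ArrayList<>();
    private final StringBuilder buffer = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        // A parser may split one text run across several calls, so buffer it.
        buffer.append(ch, start, length);
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        flushText();
        events.add("start:" + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flushText();
        events.add("end:" + qName);
    }

    // Report buffered text only if it contains something besides whitespace.
    private void flushText() {
        if (!isXmlWhitespace(buffer)) {
            events.add("text:" + buffer);
        }
        buffer.setLength(0);
    }

    // XML whitespace is space, tab, carriage return, and line feed.
    static boolean isXmlWhitespace(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != ' ' && c != '\t' && c != '\r' && c != '\n') {
                return false;
            }
        }
        return true;
    }

    public static List<String> parse(String xml) throws Exception {
        WhitespaceDroppingHandler handler = new WhitespaceDroppingHandler();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parse("<a>\n  <b>x</b>\n</a>"));
    }
}
```
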


______________________________________
"You build world of steel and stone
I build worlds of words alone
Skilled tradespeople, long years taught:
You shape matter; I shape thought."
(http://www.songworm.com/lyrics/songworm-parody/ShapesofShadow.html)



From:
Michael Glavassevich <mrgla...@ca.ibm.com>
To:
j-users@xerces.apache.org
Date:
07/11/2011 11:22 PM
Subject:
Re: dismissing characters such as carriage returns and spaces after an 
ending and before an starting tag ...



The document would need to have a DTD, but you don't need to be 
validating. Among other things, "ignorable whitespace" is always assessed 
when the document has a DTD which has been read, regardless of whether 
you've enabled validation or not.
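
A small sketch of the behavior described above, assuming the JAXP parser that ships with the JDK (which is derived from Xerces). The class names IgnorableWhitespaceDemo and Collector and the tiny internal DTD are invented for illustration; the document in the original question used an XML Schema rather than a DTD. Validation is left off, yet the whitespace between child elements of the element-only <siteinfo> should arrive through ignorableWhitespace() rather than characters().

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class IgnorableWhitespaceDemo {
    // Illustrative document: the internal DTD subset declares <siteinfo>
    // as element-only content, so whitespace inside it is "ignorable".
    public static final String XML =
          "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE siteinfo [\n"
        + "  <!ELEMENT siteinfo (sitename, base)>\n"
        + "  <!ELEMENT sitename (#PCDATA)>\n"
        + "  <!ELEMENT base (#PCDATA)>\n"
        + "]>\n"
        + "<siteinfo>\n"
        + "  <sitename>Wikipedia</sitename>\n"
        + "  <base>http://en.wikipedia.org/wiki/Main_Page</base>\n"
        + "</siteinfo>\n";

    // Hypothetical handler that keeps the two callbacks apart.
    public static class Collector extends DefaultHandler {
        public final StringBuilder text = new StringBuilder();
        public int ignorableChars = 0;

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length) {
            ignorableChars += length;
        }
    }

    public static Collector parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false); // the DTD is read but not validated against
        Collector collector = new Collector();
        factory.newSAXParser()
               .parse(new InputSource(new StringReader(xml)), collector);
        return collector;
    }

    public static void main(String[] args) throws Exception {
        Collector c = parse(XML);
        System.out.println("characters():          [" + c.text + "]");
        System.out.println("ignorable chars seen:  " + c.ignorableChars);
    }
}
```

If the DOCTYPE is removed from the sample, the same whitespace should show up in characters() instead, since the parser then has no content model to consult.
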

Thanks.

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrgla...@ca.ibm.com
E-mail: mrgla...@apache.org

kesh...@us.ibm.com wrote on 07/11/2011 10:52:32 PM:

> If you are validating against a DTD, and IF the enclosing element 
> does not have mixed content, look at the SAX/DOM definitions of 
> "ignorable whitespace" and how to handle it. (The term is 
> unfortunate; it's better described as "whitespace in element-only 
> content")
> 
> If you are not validating the document, the parser can not make this
> distinction and you must do so in your application code. 
> 
> 
> 
> From: 
> 
> Albretch Mueller <lbrt...@gmail.com> 
> 
> To: 
> 
> j-users@xerces.apache.org 
> 
> Date: 
> 
> 07/11/2011 06:13 PM 
> 
> Subject: 
> 
> dismissing characters such as carriage returns and spaces after an 
> ending and before an starting tag ...
> 
> 
> 
> 
> 
> ~
> I am XMLRead[er|ing] an XML file (which I am validating using the
> specified schema) that looks like this:
> ~
> <mediawiki xmlns="http://www.mediawiki.org/xml/export-0.5/"
> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.5/
> http://www.mediawiki.org/xml/export-0.5.xsd" version="0.5"
> xml:lang="en">
>  <siteinfo>
>    <sitename>Wikipedia</sitename>
>    <base>http://en.wikipedia.org/wiki/Main_Page</base>
>    <generator>MediaWiki 1.17wmf1</generator>
>    <case>first-letter</case>
>    <namespaces>
>      <namespace key="-2" case="first-letter">Media</namespace>
>      <namespace key="109" case="first-letter">Book talk</namespace>
>    </namespaces>
>  </siteinfo>
> </mediawiki>
> ~
> What do you do in order for the ContentHandler not to report as
> "characters" the whitespace sequences that come after an ending tag
> and before a starting tag?
> ~
> Thank you
> lbrtchx
> 
