[ http://jira.codehaus.org/browse/DOXIA-226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vincent Siveton updated DOXIA-226:
----------------------------------

    Fix Version/s: (was: 1.0-beta-1)
                   1.0-beta-2

> Make XML based parsers better handle whitespace
> -----------------------------------------------
>
>          Key: DOXIA-226
>          URL: http://jira.codehaus.org/browse/DOXIA-226
>      Project: Maven Doxia
>   Issue Type: Improvement
>     Reporter: Benjamin Bentmann
>      Fix For: 1.0-beta-2
>
>
> Regarding whitespace in XML documents, one needs to consider the following aspects:
> - ignorable whitespace, i.e. view "{{<tr> <td/> </tr>}}" and "{{<tr><td/></tr>}}" as equivalent
> - collapsible whitespace, i.e. view "{{Text    Text}}" and "{{Text Text}}" as equivalent
> - trimmable whitespace, i.e. view "{{<p> Text </p>}}" and "{{<p>Text</p>}}" as equivalent
>
> Those distinctions require a DTD/XSD in combination with a validating parser
> and/or application-specific knowledge. For robustness, Doxia parsers for
> XML-based formats should not depend on the existence of a schema definition,
> so that they reliably deliver events into the sinks. Hence I suggest
> hard-coding the required logic for proper whitespace handling into each parser.
>
> Currently, whitespace handling is rather static, e.g. {{XhtmlBaseParser}}
> pushes all input whitespace into the sink. This might cause trouble with
> sinks that do not expect to receive ignorable whitespace. To address this
> issue, it seems helpful if {{AbstractXmlParser}} provided a default
> implementation of {{handleText()}} that subclasses can simply control via
> state flags instead of implementing {{handleText()}} from scratch in each
> parser. Copy&Paste - which caused DOXIA-225 - needs to be avoided.
>
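To make the three categories above concrete, here is a minimal Java sketch of each normalization applied to a plain string. The class name `WhitespaceSketch` and its methods are hypothetical and do not exist in Doxia; the sketch also assumes that "whitespace" means the four XML whitespace characters (space, tab, CR, LF).

```java
// Hypothetical helper, not part of Doxia: one method per whitespace category.
final class WhitespaceSketch {

    private WhitespaceSketch() {
    }

    /** Ignorable: the text consists only of whitespace, as between <tr> and <td/>. */
    static boolean isIgnorable(String text) {
        // String.trim() strips every char <= U+0020, which covers XML whitespace.
        return text.trim().isEmpty();
    }

    /** Collapsible: every run of XML whitespace becomes a single space. */
    static String collapse(String text) {
        return text.replaceAll("[ \\t\\r\\n]+", " ");
    }

    /** Trimmable: leading and trailing whitespace is dropped. */
    static String trim(String text) {
        return text.trim();
    }
}
```

Which of these may be applied to a given text node depends on the enclosing element, which is why the issue asks for per-element control rather than one global setting.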
> More precisely, I imagine the following changes:
> - Have {{AbstractXmlParser}} maintain a stack of tuples (ignorable,
> collapsible, trimmable) where each tuple describes the whitespace handling
> for the currently parsed element
> - Have {{AbstractXmlParser}} push a tuple onto this stack before calling
> {{handleStartTag()}} and pop it after calling {{handleEndTag()}}
> - Have {{AbstractXmlParser}} provide setters to allow subclasses to control
> the desired whitespace handling in their {{handleStartTag()}} implementation
> - Have {{AbstractXmlParser}} implement {{handleText()}} where it evaluates the
> top-most tuple from the stack
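The proposed changes could be sketched roughly as follows. This is an illustration under stated assumptions, not the Doxia implementation: the class names, the String-based event methods, and the `emitText()` hook are all hypothetical (the real parser works with a pull parser and a sink), and the sketch assumes each element starts with all three flags off rather than inheriting them from its parent.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for the proposed base-class logic; not Doxia code.
abstract class WhitespaceAwareParserSketch {

    /** One (ignorable, collapsible, trimmable) tuple per open element. */
    private static final class Mode {
        boolean ignorable;
        boolean collapsible;
        boolean trimmable;
    }

    private final Deque<Mode> modes = new ArrayDeque<>();

    // Setters meant to be called from a subclass's handleStartTag().
    protected void setIgnorable(boolean value) { modes.peek().ignorable = value; }
    protected void setCollapsible(boolean value) { modes.peek().collapsible = value; }
    protected void setTrimmable(boolean value) { modes.peek().trimmable = value; }

    /** Pushes a fresh tuple, then lets the subclass configure it. */
    public final void startTag(String name) {
        modes.push(new Mode()); // assumption: defaults are not inherited
        handleStartTag(name);
    }

    /** Notifies the subclass, then pops the element's tuple. */
    public final void endTag(String name) {
        handleEndTag(name);
        modes.pop();
    }

    /** Default text handling driven by the top-most tuple. */
    public final void handleText(String raw) {
        Mode mode = modes.peek();
        String text = raw;
        if (mode != null) {
            if (mode.ignorable && text.trim().isEmpty()) {
                return; // whitespace-only text between child elements
            }
            if (mode.collapsible) {
                text = text.replaceAll("[ \\t\\r\\n]+", " ");
            }
            if (mode.trimmable) {
                text = text.trim();
            }
        }
        if (!text.isEmpty()) {
            emitText(text);
        }
    }

    protected abstract void handleStartTag(String name);
    protected abstract void handleEndTag(String name);
    protected abstract void emitText(String text);
}

// Hypothetical subclass: it only sets flags and never reimplements handleText().
class DemoParser extends WhitespaceAwareParserSketch {
    final List<String> sink = new ArrayList<>();

    @Override
    protected void handleStartTag(String name) {
        if (name.equals("tr")) {
            setIgnorable(true);
        } else if (name.equals("p")) {
            setCollapsible(true);
            setTrimmable(true);
        }
    }

    @Override
    protected void handleEndTag(String name) {
        // nothing to do in this sketch
    }

    @Override
    protected void emitText(String text) {
        sink.add(text);
    }
}
```

With this arrangement a subclass decides the whitespace policy per element in one place, and the text-handling logic lives only in the base class, which is the duplication the issue wants to avoid.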