Hello.
After successfully installing Cocoon 1.8.2, I
encounter parsing errors when trying to transform XML documents into HTML using
the Cocoon XSLT processor. The documents are encoded in UTF-8, and the
problem is caused by element names containing the Danish letters æ,
ø and å (ae, oe, aa) in the UTF-8 encoding. The same letters
in regular text are parsed correctly; the problem only occurs
in element names. The documents in question have been parsed without
problems in XML Spy and James Clark's SP.
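One thing that can be ruled out first is that the file on the server is not actually UTF-8 (for example, if it was re-saved as ISO-8859-1 on the way to the Linux box). The following is only a minimal sketch of such a check, assuming a modern JDK (not the JDK 1.1.8 listed below); the class name `Utf8Check` is made up for illustration. It decodes the bytes strictly and reports whether they form valid UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of Cocoon or Xerces.
class Utf8Check {
    // Returns true only if the bytes decode as UTF-8 without any
    // malformed or unmappable sequences.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Usage: java Utf8Check.java path/to/charbug.dtd
    public static void main(String[] args) throws Exception {
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(isValidUtf8(data) ? "valid UTF-8" : "not valid UTF-8");
    }
}
```

In UTF-8 the letter æ is the two bytes C3 A6; a lone E6 byte followed by an ASCII letter (as in an ISO-8859-1 file) makes the strict decoder fail.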
Has anyone else encountered this problem? Of
course, an obvious solution is to avoid non-English characters in element
names, but this may require a large amount of filtering of existing texts,
changing of DTDs etc. To the best of my knowledge, non-English characters are
allowed in XML names.
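To check the claim independently of Cocoon, a small standalone test can feed the same kind of declaration to a parser directly. The sketch below is an assumption-laden illustration, not the Cocoon code path: it uses the JAXP API of a modern JDK (which did not ship with JDK 1.1.8), puts the DTD in an internal subset instead of a separate file, and the class and method names are hypothetical. The letter æ is written as `\u00e6` so the result does not depend on the source file encoding.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

// Hypothetical test class, not part of Cocoon or Xerces.
class NonAsciiNameCheck {
    // Builds a tiny UTF-8 document whose child element is called `name`,
    // parses it with validation on, and returns the child's name as parsed.
    static String parseChildName(String name) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<!DOCTYPE simpledoc [\n"
            + "<!ELEMENT simpledoc (" + name + ")>\n"
            + "<!ELEMENT " + name + " (#PCDATA)>\n"
            + "]>\n"
            + "<simpledoc><" + name + ">tekst</" + name + "></simpledoc>";
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Turn every parser complaint into an exception.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) throws SAXParseException { throw e; }
                public void error(SAXParseException e) throws SAXParseException { throw e; }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            byte[] utf8 = xml.getBytes(StandardCharsets.UTF_8);
            Document doc = builder.parse(new ByteArrayInputStream(utf8));
            return doc.getDocumentElement().getFirstChild().getNodeName();
        } catch (Exception e) {
            throw new RuntimeException("parse failed: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseChildName("r\u00e6kke"));
    }
}
```

If a parser accepts this but the Cocoon setup still rejects the DTD file, the difference is more likely in how the bytes are decoded before scanning than in the XML name rules themselves; that is a guess, not a confirmed diagnosis.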
The platform is: Suse Linux 7.1, Apache 1.3.14,
Tomcat 3.2.2, JDK 1.1.8.
The error stack is the following:
org.xml.sax.SAXException: A ')' is required in the declaration of element type "simpledoc". [FATAL ERROR] [File: "file:/var/jakarta-tomcat-3.2.2/webapps/cocoon/diplo/charbug.dtd" Line: 3 Column: 24] (nested exception: org.xml.sax.SAXParseException: A ')' is required in the declaration of element type "simpledoc". )
    at org.apache.cocoon.parser.AbstractParser.fatalError(AbstractParser.java:105)
    at org.apache.xerces.framework.XMLParser.reportError(XMLParser.java:1037)
    at org.apache.xerces.framework.XMLDTDScanner.reportFatalXMLError(XMLDTDScanner.java:654)
    at org.apache.xerces.framework.XMLDTDScanner.scanChildren(XMLDTDScanner.java:1979)
    at org.apache.xerces.framework.XMLDTDScanner.scanElementDecl(XMLDTDScanner.java:1771)
    at org.apache.xerces.framework.XMLDTDScanner.scanDecls(XMLDTDScanner.java:1436)
    at org.apache.xerces.framework.XMLDocumentScanner.scanDoctypeDecl(XMLDocumentScanner.java:2179)
    at org.apache.xerces.framework.XMLDocumentScanner.access$0(XMLDocumentScanner.java:2133)
    at org.apache.xerces.framework.XMLDocumentScanner$PrologDispatcher.dispatch(XMLDocumentScanner.java:882)
    at org.apache.xerces.framework.XMLDocumentScanner.parseSome(XMLDocumentScanner.java:380)
    at org.apache.xerces.framework.XMLParser.parse(XMLParser.java:900)
    at org.apache.cocoon.parser.XercesParser.parse(XercesParser.java:85)
    at org.apache.cocoon.parser.AbstractParser.parse(AbstractParser.java:83)
    at org.apache.cocoon.producer.ProducerFromFile.getDocument(ProducerFromFile.java:78)
    at org.apache.cocoon.Engine.handle(Engine.java:359)
    at org.apache.cocoon.Cocoon.service(Cocoon.java:183)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:853)
    at org.apache.tomcat.core.ServletWrapper.doService(ServletWrapper.java:405)
    at org.apache.tomcat.core.Handler.service(Handler.java:287)
    at org.apache.tomcat.core.ServletWrapper.service(ServletWrapper.java:372)
    at org.apache.tomcat.core.ContextManager.internalService(ContextManager.java:797)
    at org.apache.tomcat.core.ContextManager.service(ContextManager.java:743)
    at org.apache.tomcat.service.connector.Ajp13ConnectionHandler.processConnection(Ajp13ConnectionHandler.java:160)
    at org.apache.tomcat.service.TcpWorkerThread.runIt(PoolTcpEndpoint.java:416)
    at org.apache.tomcat.util.ThreadPool$ControlRunnable.run(ThreadPool.java:501)
    at java.lang.Thread.run(Thread.java)
The DTD in question looks like this:
    <?xml version="1.0" encoding="UTF-8"?>
    <!-- edited with XML Spy v3.5 NT (http://www.xmlspy.com) by Anders Conrad (DSL) -->
    <!ELEMENT simpledoc (række)>
    <!ELEMENT række (#PCDATA)>
The workaround would be to change it to the following (with the corresponding change in the test document):
    <?xml version="1.0" encoding="UTF-8"?>
    <!-- edited with XML Spy v3.5 NT (http://www.xmlspy.com) by Anders Conrad (DSL) -->
    <!ELEMENT simpledoc (raekke)>
    <!ELEMENT raekke (#PCDATA)>
I have the complete reproducible test case available in case
someone is interested.
Any suggestions or commentary would be
welcome!
Anders
Anders Conrad
Det Danske Sprog- og Litteraturselskab
IT editor, cand.mag.
Christians Brygge 1
1219 København K
E-mail: [EMAIL PROTECTED]
Tel. 33 13 06 60