Thorsten Scherler wrote:
> Ross Gardler wrote:
> > David Crossley wrote:
> > > I don't know much about encodings, but why are the documents output
> > > by our xml serializer as ISO-8859-1 rather than UTF-8?
> >
> > I'm confused, I thought we were outputing UTF-8, certainly recent
> > messages on the user list say we are.
>
> Forrest is using UTF-8 in both skins and the dispatcher as *HTML*
> serializer.
>
> main/webapp/sitemap.xmap
> ...
> <map:serializer name="html" mime-type="text/html"
> src="org.apache.cocoon.serialization.HTMLSerializer">
> <doctype-public>-//W3C//DTD HTML 4.01 Transitional//EN</doctype-public>
> <doctype-system>http://www.w3.org/TR/html4/loose.dtd</doctype-system>
> <encoding>UTF-8</encoding>
> </map:serializer>
>
> dispatcher/internal.xmap
> ...
> <map:serializer logger="sitemap.serializer.xhtml" mime-type="text/html"
> name="xhtml" pool-grow="2" pool-max="64" pool-min="2"
> src="org.apache.cocoon.serialization.XMLSerializer">
> <doctype-public> -//W3C//DTD XHTML 1.0 Strict//EN </doctype-public>
> <doctype-system> http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd
> </doctype-system>
> <encoding>UTF-8</encoding>
> </map:serializer>
>
> David did not ask for this format (html, on which the user threads are
> about) but for *xml*.

Yeah, our answers to Ross crossed in mid-air.

> in main/webapp/sitemap.xmap you find:
> <map:serializer name="xml" mime-type="text/xml"
> src="org.apache.cocoon.serialization.XMLSerializer"/>
>
> Unlike the other examples I gave this one do not have set any encoding.
>
>
> http://cocoon.apache.org/2.1/userdocs/xml-serializer.html
> states:
> "The XML Serializer accepts following configuration parameters. These
> configurations are not Xalan specific.
>
> Name - Xalan Default Value
> ...
> encoding - none
> ..."
>
> Looking on http://localhost:8888/index.xml I find
> <?xml version="1.0" encoding="UTF-8"?>
>
> We state in our FAQ for the question: Does Forrest handle accents for
> non-English languages?
> "This is because sources for Forrest docs are XML documents, which can
> include any of these, provided the encoding declared by the XML doc
> matches the actual encoding used in the file."
>
> David, why do you think we would use ISO-8859-1 for xml?

I am presuming that we just forgot to set the UTF-8 encoding parameter.

> Is it because of:
> <map:serializer name="links"
> src="org.apache.cocoon.serialization.LinkSerializer">
> <encoding>ISO-8859-1</encoding>
> </map:serializer>

That was going to be my next question. That should be UTF-8 too.

> http://www.blooberry.com/indexdot/html/topics/urlencoding.htm
> gives the answer for this:
> "...HTML, on the other hand, allows the entire range of the ISO-8859-1
> (ISO-Latin) character set to be used in documents..."
>
> Doing a grep on forrest-trunk brings some hits on ISO-8859-1. Some of
> them like the i18n stuff are needed for german, french, spanish, ...

Are you sure?

> The cap.xml is the only file that declared to need ISO-8859-1 as well.

I reckon that is an accident from the original author's text editor.
The rest of our xml source docs should be UTF-8. I asked this once
long ago on cocoon-dev and the answer was emphatic to use UTF-8
across the board.

> Some xsl have as well use this ISO.
>
> Anyway IMO the answer to the subject of this thread is, that forrest
> *is* using UTF-8 on xml documents that is using this encoding.

I think that we actually have inconsistency problems.

-David

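For illustration, a minimal sketch of what setting the missing
parameter might look like in main/webapp/sitemap.xmap. It assumes the
"xml" and "links" serializer declarations accept the same <encoding>
child element shown in the quoted html and xhtml examples. It is not a
tested or committed change.

```xml
<!-- Sketch only: the "xml" serializer quoted above, with an explicit
     encoding added (assumed fix, not a committed change) -->
<map:serializer name="xml" mime-type="text/xml"
    src="org.apache.cocoon.serialization.XMLSerializer">
  <encoding>UTF-8</encoding>
</map:serializer>

<!-- Sketch only: the "links" serializer quoted above, switched from
     ISO-8859-1 to UTF-8 -->
<map:serializer name="links"
    src="org.apache.cocoon.serialization.LinkSerializer">
  <encoding>UTF-8</encoding>
</map:serializer>
```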