DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG-RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=36341>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=36341

           Summary: Invalid UTF-8 error for non-ASCII meta data
           Product: Lenya
           Version: Trunk
          Platform: Other
        OS/Version: other
            Status: NEW
          Severity: critical
          Priority: P2
         Component: Miscellaneous
        AssignedTo: [email protected]
        ReportedBy: [EMAIL PROTECTED]


Josias Thoeny:

I have seen this error. I tried to debug this some time ago, but I didn't find out much.

To reproduce it, go to the site area of the default publication. Select e.g. the "Document Type Examples" page, copy it, and insert it somewhere.

Here is the relevant part of the stacktrace that I'm getting:

java.io.UTFDataFormatException: Invalid byte 2 of 3-byte UTF-8 sequence.
        at org.apache.lenya.cms.metadata.MetaDataImpl.loadValues(MetaDataImpl.java:169)
        at org.apache.lenya.cms.metadata.MetaDataImpl.<init>(MetaDataImpl.java:82)
        at org.apache.lenya.cms.metadata.LenyaMetaData.<init>(LenyaMetaData.java:74)
        at org.apache.lenya.cms.metadata.MetaDataManager.getLenyaMetaData(MetaDataManager.java:80)
        at org.apache.lenya.cms.metadata.MetaDataManager.replaceMetaData(MetaDataManager.java:151)
        at org.apache.lenya.cms.repository.RepositoryManagerImpl.copy(RepositoryManagerImpl.java:40)
        ... 100 more
Caused by: java.io.UTFDataFormatException: Invalid byte 2 of 3-byte UTF-8 sequence.
        at org.apache.xerces.impl.io.UTF8Reader.invalidByte(Unknown Source)
        at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
        at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
        at org.apache.xerces.impl.XMLEntityScanner.scanContent(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
        at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
        at org.apache.lenya.xml.DocumentHelper.readDocument(DocumentHelper.java:173)
        at org.apache.lenya.cms.cocoon.source.SourceUtil.readDOM(SourceUtil.java:161)
        at org.apache.lenya.cms.metadata.MetaDataImpl.getDocument(MetaDataImpl.java:260)
        at org.apache.lenya.cms.metadata.MetaDataImpl.loadValues(MetaDataImpl.java:139)
        ... 105 more

Here is another way to get the same problem:

- Create a document with an umlaut in the navigation title (the umlaut will be written into the sitetree).
- Perform an operation which changes the sitetree (e.g. create another document)

I get the following stacktrace:

<snip/>
Caused by: java.io.UTFDataFormatException: Invalid byte 2 of 3-byte UTF-8 sequence.
        at org.apache.xerces.impl.io.UTF8Reader.invalidByte(Unknown Source)
        at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
        at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
        at org.apache.xerces.impl.XMLEntityScanner.scanContent(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
        at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
        at org.apache.lenya.xml.DocumentHelper.readDocument(DocumentHelper.java:173)
        at org.apache.lenya.cms.cocoon.source.SourceUtil.readDOM(SourceUtil.java:161)
        at org.apache.lenya.cms.site.tree.DefaultSiteTree.<init>(DefaultSiteTree.java:83)
        ... 51 more

It seems the problem occurs when Lenya reads a non-ASCII character from a file and saves it again, using DocumentHelper/SourceUtil. If the special character comes from a web form, it seems to be saved correctly.

Can anyone else reproduce this?

I wonder if it might have something to do with the following code in SourceUtil.java, around line 195:

    ....
    OutputStream oStream = source.getOutputStream();
    Writer writer = new OutputStreamWriter(oStream);
    DocumentHelper.writeDocument(document, writer);
    ....

An OutputStreamWriter constructed without a charset uses the platform default encoding, so it does not respect the encoding of the source. But that's just a guess; I'm not sure whether it's a reading or a writing problem.
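
To illustrate the guess above, here is a minimal, self-contained sketch. It is not Lenya code: the class EncodingDemo, its helper methods and the sample text are made up for illustration, and ISO-8859-1 merely stands in for a non-UTF-8 platform default. It writes the same string once with a mismatched charset and once with UTF-8, then checks whether the bytes would survive a strict UTF-8 read. In ISO-8859-1 the character "ä" is the single byte 0xE4, which a UTF-8 reader takes as the start of a 3-byte sequence, so the following ASCII byte is rejected. That would be consistent with the "Invalid byte 2 of 3-byte UTF-8 sequence" message, though it does not prove this is what happens in SourceUtil.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Lenya.
class EncodingDemo {

    /** Writes the text through an OutputStreamWriter that uses the given charset. */
    static byte[] write(String text, Charset charset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Passing the charset explicitly avoids depending on the platform default.
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(text);
        }
        return out.toByteArray();
    }

    /** Returns true if the bytes decode as strict UTF-8 (malformed input is reported, not replaced). */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Sample text with an umlaut ("ä" written as a Unicode escape).
        String title = "M\u00e4rz";

        // Assumption: ISO-8859-1 plays the role of a non-UTF-8 platform default.
        byte[] mismatched = write(title, StandardCharsets.ISO_8859_1);
        byte[] explicitUtf8 = write(title, StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1 bytes readable as UTF-8: " + isValidUtf8(mismatched));   // false
        System.out.println("UTF-8 bytes readable as UTF-8:      " + isValidUtf8(explicitUtf8)); // true
    }
}
```

If the writing side does turn out to be the cause, one possible fix would be to pass the document's encoding to the OutputStreamWriter constructor, as write() does here, instead of relying on the default.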
