cache.jsp does not recognize encoding conversion from content different to UTF-8
--------------------------------------------------------------------------------
Key: NUTCH-946
URL: https://issues.apache.org/jira/browse/NUTCH-946
Project: Nutch
Issue Type: Bug
Components: web gui
Affects Versions: 1.2
Environment: Server version: Apache Tomcat/6.0.29
Server built: July 19 2010 1458
Server number: 6.0.0.29
OS Name: Linux
OS Version: 2.6.18-128.7.1.el5
Architecture: i386
JVM Version: 1.6.0_22-b04
JVM Vendor: Sun Microsystems Inc.
Reporter: Enrique Berlanga
Priority: Minor
The cache view does not apply the encoding conversion needed to properly
display page content stored in a segment.
The problem is that it looks for the "CharEncodingForConversion" entry in the
content metadata, but that entry is stored in the parse metadata.
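To illustrate the mismatch, here is a minimal, self-contained sketch. It is not the actual cached.jsp code: plain maps stand in for Nutch's Metadata objects, and the class name, the helper names and the fallback to UTF-8 when no encoding is found are all assumptions made for the example.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class CachedEncodingSketch {

    static final String KEY = "CharEncodingForConversion";

    // Lookup as done before the patch: the key is searched in the content metadata.
    static String encodingBeforePatch(Map<String, String> contentMeta,
                                      Map<String, String> parseMeta) {
        return contentMeta.get(KEY);
    }

    // Lookup as done after the patch: the key is searched in the parse metadata.
    static String encodingAfterPatch(Map<String, String> contentMeta,
                                     Map<String, String> parseMeta) {
        return parseMeta.get(KEY);
    }

    // Decode the stored bytes with the given encoding.
    // ASSUMPTION: falls back to UTF-8 when the encoding is missing or unsupported.
    static String decode(byte[] raw, String encoding) {
        if (encoding != null) {
            try {
                return new String(raw, encoding);
            } catch (UnsupportedEncodingException e) {
                // Unknown charset name: fall through to the UTF-8 fallback.
            }
        }
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical segment entry: a page stored as ISO-8859-1 bytes.
        Map<String, String> contentMeta = new HashMap<String, String>();
        contentMeta.put("Content-Type", "text/html");
        Map<String, String> parseMeta = new HashMap<String, String>();
        parseMeta.put(KEY, "ISO-8859-1");

        byte[] raw = new byte[] { 'c', 'a', 'f', (byte) 0xE9 }; // "café" in ISO-8859-1

        String before = decode(raw, encodingBeforePatch(contentMeta, parseMeta));
        String after = decode(raw, encodingAfterPatch(contentMeta, parseMeta));

        System.out.println("before patch decodes correctly: " + before.equals("caf\u00e9"));
        System.out.println("after patch decodes correctly:  " + after.equals("caf\u00e9"));
    }
}
```

In this sketch the pre-patch lookup returns null, so the bytes are decoded with the assumed default and the non-ASCII character is lost. The post-patch lookup finds the encoding and the text is decoded correctly.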
Here is the patch I've generated for the fixed version:
### Eclipse Workspace Patch 1.0
#P branch-1.2
Index: src/web/jsp/cached.jsp
===================================================================
--- src/web/jsp/cached.jsp (revision 1027060)
+++ src/web/jsp/cached.jsp (working copy)
@@ -39,17 +39,18 @@
     ResourceBundle.getBundle("org.nutch.jsp.cached", request.getLocale())
       .getLocale().getLanguage();
-  Metadata metaData = bean.getParseData(details).getContentMeta();
+  Metadata contentMetaData = bean.getParseData(details).getContentMeta();
+  Metadata parseMetaData = bean.getParseData(details).getParseMeta();
   String content = null;
-  String contentType = (String) metaData.get(Metadata.CONTENT_TYPE);
+  String contentType = (String) contentMetaData.get(Metadata.CONTENT_TYPE);
   if (contentType.startsWith("text/html")) {
     // FIXME : it's better to emit the original 'byte' sequence
     // with 'charset' set to the value of 'CharEncoding',
     // but I don't know how to emit 'byte sequence' in JSP.
     // out.getOutputStream().write(bean.getContent(details)) may work,
     // but I'm not sure.
-    String encoding = (String) metaData.get("CharEncodingForConversion");
+    String encoding = (String) parseMetaData.get("CharEncodingForConversion");
     if (encoding != null) {
       try {
         content = new String(bean.getContent(details), encoding);