Non-ascii character broken in dumped content for mixed encoding (utf-8 and multi-byte)
--------------------------------------------------------------------------------------
Key: NUTCH-625
URL: https://issues.apache.org/jira/browse/NUTCH-625
Project: Nutch
Issue Type: Bug
Reporter: Vinci

If the crawl db contains both UTF-8 non-ASCII characters and non-UTF-8 non-ASCII characters (i.e. multi-byte characters), the dumped content shows garbled characters in all of the non-UTF-8 non-ASCII text, and that text cannot be repaired by reloading it with a different encoding. The UTF-8 text is displayed normally; only the non-UTF-8 text is broken.

Is there any way to repair the broken text?

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
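One possible explanation for the symptom described above, offered as an assumption and not as a confirmed diagnosis of this issue, is that bytes in a non-UTF-8 multi-byte encoding are decoded as UTF-8 somewhere in the dump path. The sketch below shows why such text could not be repaired afterwards: invalid byte sequences are replaced with U+FFFD during decoding, so the original bytes are gone. The class and method names (`MixedEncodingDemo`, `roundTripAsUtf8`, `survivesUtf8RoundTrip`) are hypothetical, the GB2312 bytes are only an illustrative choice of encoding, and nothing here is Nutch code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class MixedEncodingDemo {

    // Decode the bytes as UTF-8, then encode the resulting String back to UTF-8.
    // Malformed input is silently replaced with U+FFFD by new String(byte[], Charset).
    static byte[] roundTripAsUtf8(byte[] raw) {
        String decoded = new String(raw, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    // True only if decoding as UTF-8 and re-encoding gives back the same bytes,
    // i.e. no information was lost.
    static boolean survivesUtf8RoundTrip(byte[] raw) {
        return Arrays.equals(raw, roundTripAsUtf8(raw));
    }

    public static void main(String[] args) {
        // The two characters U+4E2D U+6587, encoded as UTF-8.
        byte[] utf8 = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_8);

        // The same two characters as GB2312 bytes, hard-coded so the example
        // does not depend on optional charsets being installed.
        // (Assumption: GB2312 stands in for "some non-UTF-8 multi-byte encoding".)
        byte[] gb2312 = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4};

        System.out.println("UTF-8 bytes survive:  " + survivesUtf8RoundTrip(utf8));
        System.out.println("GB2312 bytes survive: " + survivesUtf8RoundTrip(gb2312));

        // Once the replacement characters are in the String, reloading the
        // dumped text with another encoding cannot bring the original bytes back.
        String damaged = new String(gb2312, StandardCharsets.UTF_8);
        System.out.println("Contains U+FFFD:      " + (damaged.indexOf('\uFFFD') >= 0));
    }
}
```

If this is what happens, the text would have to be recovered from the raw bytes before any decoding step, using the encoding each page was originally served in.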