Julien Massiera created CONNECTORS-1655:
-------------------------------------------

             Summary: Web connector - UnsupportedEncodingException utf-8
                 Key: CONNECTORS-1655
                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1655
             Project: ManifoldCF
          Issue Type: Bug
          Components: Web connector
    Affects Versions: ManifoldCF 2.17
            Reporter: Julien Massiera


When crawling some sites (for instance this one: 
[http://www.antibes-juanlespins.com/] ) the job manages to index some 
documents, but then stops with the following error:
Error: IO error: utf-8; filename=rseventspro_rss20_56.xml

Here is the MCF stack trace:
Exception tossed: IO error: utf-8; filename=rseventspro_rss20_56.xml
org.apache.manifoldcf.core.interfaces.ManifoldCFException: IO error: utf-8; filename=rseventspro_rss20_56.xml
at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.handleXML(WebcrawlerConnector.java:4203) ~[?:?]
at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.extractLinks(WebcrawlerConnector.java:3855) ~[?:?]
at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.processDocuments(WebcrawlerConnector.java:746) ~[?:?]
at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]
Caused by: java.io.UnsupportedEncodingException: utf-8; filename=rseventspro_rss20_56.xml
at sun.nio.cs.StreamDecoder.forInputStreamReader(StreamDecoder.java:71) ~[?:1.8.0_212]
at java.io.InputStreamReader.<init>(InputStreamReader.java:100) ~[?:1.8.0_212]
at org.apache.manifoldcf.connectorcommon.fuzzyml.DecodingByteReceiver.dealWithBytes(DecodingByteReceiver.java:47) ~[?:?]
at org.apache.manifoldcf.connectorcommon.fuzzyml.BOMEncodingDetector.dealWithRemainder(BOMEncodingDetector.java:250) ~[?:?]
at org.apache.manifoldcf.connectorcommon.fuzzyml.SingleByteReceiver.dealWithBytes(SingleByteReceiver.java:52) ~[?:?]
at org.apache.manifoldcf.connectorcommon.fuzzyml.Parser.parseWithCharsetDetection(Parser.java:74) ~[?:?]
at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.handleXML(WebcrawlerConnector.java:4174) ~[?:?]
... 3 more
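
The exception message suggests that the whole string "utf-8; filename=rseventspro_rss20_56.xml" is being handed to InputStreamReader as the charset name, possibly because a trailing header parameter is not split off from the charset value. That is only a reading of the trace, not a confirmed diagnosis. The sketch below reproduces the failure outside ManifoldCF and shows one possible mitigation. The class CharsetParamDemo and its methods stripParameters and canDecode are hypothetical names used for illustration; they are not part of the ManifoldCF code base.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;

public class CharsetParamDemo {

    // Hypothetical helper: keep only the text before the first ';'
    // and trim whitespace, so "utf-8; filename=x.xml" becomes "utf-8".
    static String stripParameters(String rawCharset) {
        int semi = rawCharset.indexOf(';');
        String name = (semi >= 0) ? rawCharset.substring(0, semi) : rawCharset;
        return name.trim();
    }

    // Returns true if InputStreamReader accepts the given charset name.
    static boolean canDecode(String charsetName) {
        try {
            new InputStreamReader(new ByteArrayInputStream(new byte[0]), charsetName);
            return true;
        } catch (UnsupportedEncodingException e) {
            // Same exception type as in the "Caused by" section of the trace.
            return false;
        }
    }

    public static void main(String[] args) {
        String raw = "utf-8; filename=rseventspro_rss20_56.xml";
        System.out.println("raw accepted:      " + canDecode(raw));
        System.out.println("stripped accepted: " + canDecode(stripParameters(raw)));
    }
}
```

If this reading is right, trimming everything from the first ';' before the charset reaches the decoder would let the document be parsed as UTF-8.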



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
