[ https://issues.apache.org/jira/browse/CONNECTORS-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15568408#comment-15568408 ]
Karl Wright commented on CONNECTORS-1325:
-----------------------------------------

Hi Konstantin,

Good release practices, and Apache policy, say that we cannot and will not re-issue releases to include patches. Doing that would require a point release instead. This change will go out as part of the 2.6 release in December.

I would like to understand further exactly how this entity appears in the XML. If you can obtain the actual XML document (redact sensitive content, of course, but preserve formatting etc.), I would greatly appreciate it. If it turns out that the problem is with the Xerces parser, I can create a ticket against that. I suspect, however, that a ticket really should be created against SharePoint, although I also suspect they will be completely unwilling to fix a deprecated feature like this.


> Invalid XML character causing job to abort
> ------------------------------------------
>
>                 Key: CONNECTORS-1325
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1325
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: SharePoint connector
>    Affects Versions: ManifoldCF 2.3
>            Reporter: Phil
>            Assignee: Karl Wright
>            Priority: Blocker
>             Fix For: ManifoldCF 2.5
>
>         Attachments: CONNECTORS-1325-2.patch, CONNECTORS-1325-3.patch, CONNECTORS-1325.patch
>
>
> The following error is causing the Manifold job to abort, and subsequently the job not being able to finish.
> It would be good to have the crawler log this error, but not throw an exception which causes the entire job to stop.
> {code}
> ERROR 2016-06-21 19:01:54,562 (Worker thread '6') system.WorkerThread - Exception tossed: XML parsing error: Character reference "�" is an invalid XML character.
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: XML parsing error: Character reference "�" is an invalid XML character.
>       at org.apache.manifoldcf.core.common.XMLDoc.init(XMLDoc.java:390)
>       at org.apache.manifoldcf.core.common.XMLDoc.<init>(XMLDoc.java:286)
>       at org.apache.manifoldcf.crawler.connectors.sharepoint.SPSProxyHelper.getFieldValues(SPSProxyHelper.java:2039)
>       at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.processDocuments(SharePointRepository.java:974)
>       at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
> Caused by: org.xml.sax.SAXParseException; lineNumber: 18; columnNumber: 64; Character reference "�" is an invalid XML character.
>       at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
>       at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
>       at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
>       at org.apache.manifoldcf.core.common.XMLDoc.init(XMLDoc.java:359)
>       ... 4 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)