[ 
https://issues.apache.org/jira/browse/CONNECTORS-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571237#comment-15571237
 ] 

Konstantin Avdeev edited comment on CONNECTORS-1325 at 10/13/16 8:43 AM:
-------------------------------------------------------------------------

The Stack Overflow thread you mentioned in the second message here describes 
the problem quite well:
this character encoding was introduced in XML 1.1: 
https://www.w3.org/TR/xml11/#sec-xml11
and a possible solution is to set the correct header: {code}<?xml 
version="1.1"?>{code}
I'm afraid it would take ages to get this fixed by MS.

P.S. The correct XML prologue won't help with emojis, but at least it would 
solve the issue with our "record separator" :)
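
To illustrate the difference, here is a minimal sketch that runs the same 
character references through the default JAXP parser with both prologues. The 
class name XmlVersionCheck, the helper parses() and the <a> test documents are 
made up for this example and are not ManifoldCF code; it also assumes the 
parser in use implements XML 1.1, as Xerces does.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration only; not part of ManifoldCF.
class XmlVersionCheck {

    /** Returns true if the default JAXP parser accepts the document as well-formed. */
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Well-formedness violation, e.g. an invalid character reference.
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O problems are not what we are testing.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // U+001E is the "record separator" control character.
        String recordSeparator = "<a>&#x1E;</a>";
        // U+D83D is a lone surrogate, i.e. one half of an emoji.
        String surrogate = "<a>&#xD83D;</a>";

        System.out.println("1.0 + record separator: "
            + parses("<?xml version=\"1.0\"?>" + recordSeparator));
        System.out.println("1.1 + record separator: "
            + parses("<?xml version=\"1.1\"?>" + recordSeparator));
        System.out.println("1.1 + surrogate:        "
            + parses("<?xml version=\"1.1\"?>" + surrogate));
    }
}
```

As far as I understand the two specs, only the second document should parse: 
XML 1.1 permits character references to the control characters U+0001 to 
U+001F, while surrogate code points stay invalid in both versions.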

To be honest, I'm not sure what we could do here; I'm not a fan of workarounds. 
We could leave it as it is now, but could you perhaps change the "bad 
character" warnings to WARN level? Currently they are shown at DEBUG level 
only, which could be misleading in a production environment.
Thanks!
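
For what it's worth, the behaviour I have in mind looks roughly like the 
sketch below: a parse failure is reported at warning level and the caller 
gets an empty result instead of an exception. The names LenientXmlParse and 
tryParse are hypothetical, and java.util.logging stands in for whatever 
logger the connector really uses; this is not the actual XMLDoc code.

```java
import java.io.StringReader;
import java.util.Optional;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch; names and logging framework are assumptions.
class LenientXmlParse {
    private static final Logger LOG =
        Logger.getLogger(LenientXmlParse.class.getName());

    /** Parses the XML, or logs a warning and returns empty if it is malformed. */
    static Optional<Document> tryParse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Rethrow fatal errors rather than printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            return Optional.of(builder.parse(new InputSource(new StringReader(xml))));
        } catch (SAXException e) {
            // Visible at the default production log level, unlike a debug message.
            LOG.log(Level.WARNING, "Bad character in XML, skipping: " + e.getMessage());
            return Optional.empty();
        } catch (Exception e) {
            // Configuration or I/O failures are still treated as real errors.
            throw new IllegalStateException(e);
        }
    }
}
```

The caller can then skip the affected document and carry on with the rest of 
the job.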


was (Author: kavdeev):
The Stack Overflow thread you mentioned in the second message here describes 
the problem quite well:
this character encoding was introduced in XML 1.1: 
https://www.w3.org/TR/xml11/#sec-xml11
and the solution is to set the correct header: {code}<?xml 
version="1.1"?>{code}
I'm afraid it would take ages to get this fixed by MS.

> Invalid XML character causing job to abort
> ------------------------------------------
>
>                 Key: CONNECTORS-1325
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1325
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: SharePoint connector
>    Affects Versions: ManifoldCF 2.3
>            Reporter: Phil
>            Assignee: Karl Wright
>            Priority: Blocker
>             Fix For: ManifoldCF 2.5
>
>         Attachments: CONNECTORS-1325-2.patch, CONNECTORS-1325-3.patch, 
> CONNECTORS-1325.patch, mcf-bad-ms-char.xml
>
>
> The following error causes the Manifold job to abort, so the job is never 
> able to finish.
> It would be good to have the crawler log this error rather than throw an 
> exception that stops the entire job.
> {code}
> ERROR 2016-06-21 19:01:54,562 (Worker thread '6') system.WorkerThread - 
> Exception tossed: XML parsing error: Character reference "&#xD83D" is an 
> invalid XML character.
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: XML parsing error: 
> Character reference "&#xD83D" is an invalid XML character.
>         at org.apache.manifoldcf.core.common.XMLDoc.init(XMLDoc.java:390)
>         at org.apache.manifoldcf.core.common.XMLDoc.<init>(XMLDoc.java:286)
>         at 
> org.apache.manifoldcf.crawler.connectors.sharepoint.SPSProxyHelper.getFieldValues(SPSProxyHelper.java:2039)
>         at 
> org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.processDocuments(SharePointRepository.java:974)
>         at 
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
> Caused by: org.xml.sax.SAXParseException; lineNumber: 18; columnNumber: 64; 
> Character reference "&#xD83D" is an invalid XML character.
>         at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
>         at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
>         at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
>         at org.apache.manifoldcf.core.common.XMLDoc.init(XMLDoc.java:359)
>         ... 4 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
